yogeshg commented 6 years ago

Motivation

Developers need to run several tools and setting these up could be a hassle. These tools include but are not limited to setting up the environment, running style checks, running tests etc. For different projects, these tools may have different default values.

Documentation is needed for these default values to allow users to run these tools. But we never read documentation, so an interactive script with default values and options to run these tools in a limited scope might be useful. Limited scope might mean only the files changed, different python or spark versions etc.

Tools like pylint, prospector etc. do a great job of giving users flexibility, but they are mostly run with the default values. They also do not give options to run on only currently touched files -- this defintion is also broad.

This is why we might want a script to help the user do everything they need and also guide them through the process through "--help" and autocomplete options. It should also be easy for us to maintain such a script and users to read it to see what goes on under the hood. Luckily for us python argh package uses parseargs and argscomplete packages to expose python functions on bash with great help and autocompletion funcitonality.

Demo

$ pip install -r python/dev-requirements.txt
## installs all requirements for development

$ ./python/run.py -h
## shows help for project
# positional arguments:
#   {pylint,prospector,unittest,nose,envsetup}
#     pylint              calls pylint with a limited set of keywords. run
#                         `pylint --help` for more details.
#     prospector          calls prospector with a limited set of keywords. run
#                         `prospector --help` for more details.
#     unittest            calls unittest with a limited set of keywords. run
#                         `python -m unittest --help` for more details.
#     nose                calls nosettest with a limited set of keywords. run
#                         `python -m nose --help` for more details.
#     envsetup            Prints out shell commands that can be used in terminal
#                         to setup the environment. This tool inspects the
#                         current environment, adds default values, and/or
#                         interactively asks user for values and prints all or
#                         the missing variables that need to be set. :param
#                         default: if default values should be set in this
#                         script or not :param interactive: if user should be
#                         prompted for values or not :param missing_only: if
#                         only missing variable should be printed :param
#                         verbose: if user should be guided or not
# 
# optional arguments:
#   -h, --help            show this help message and exit

$ ./python/run.py envsetup -d -i
## interactively setup or default the required environment variables
## this can be copied by user into their shell script or,

$ eval $(./python/run.py envsetup -d -i)
## directly used in the current bash

$ ./python/run.py pylint
## runs pylint with project specific rc-files on full python code path or "currently touched files"

$ ./python/run.py test
## runs all required tests or those for "currently touched files"

Tools

pylint is configured for variable names as both snake_case and camelCase
we can also write pylint plugins that check for public APIs and allow them to be camelCase while keeping everything else snake_case
prospector has pylint and some other tests like code complexity, we can start using these later
we currently use nose tests in run-tests.sh after setting some environment variables and specifying defaults
- we can run tests on a subtree of code
- but it is currently not possible to run tests on more than one subtree
- eg: we an run tests on RFormula or VectorAssembler or ml.feautres but to run both RFormula and VectorAssembler we need more than one commands
- we may also want to run tests on "currently touched files", "currently touched submodules" and so on to speed up development and run CI in stages
we currently setup the environment in run-tests.sh and some requirements for that are also requirements of pylint, all such can be included in envsetup

Misc:

We can specify commands and default arguments in run.py in a very pythonic way and this gets auto exposed while running the shell script without any more work.
we no longer need to read complicated bash files, if bash gets too complicated, it's easier to write that in python and understand that
argh generates help files from arguments
argcomplete generates autocomplete -- some config required.

codecov-io commented 6 years ago

Codecov Report

Merging #118 into master will not change coverage. The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #118   +/-   ##
=======================================
  Coverage   85.38%   85.38%           
=======================================
  Files          33       33           
  Lines        1923     1923           
  Branches       44       44           
=======================================
  Hits         1642     1642           
  Misses        281      281

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9f05fc2...c6bdd8a. Read the comment docs.

yogeshg commented 6 years ago

This pr was split into #121 and #124

databricks / spark-deep-learning

Dev scripts #118

Motivation

Demo

Tools

Misc:

Codecov Report