
cross-platform compatible command line scripts #616

Closed dschneiderch closed 1 year ago

dschneiderch commented 4 years ago

Is your feature request related to a problem? Please describe. I'm always frustrated when I move between Windows and Linux for running workflows because Windows doesn't recognize .py files as executable. My current solution is to figure out what system I'm on:

if [ "$(uname -o)" == 'Msys' ]
then
    bindir=$CONDA_PREFIX/Scripts
else
    bindir=$CONDA_PREFIX/bin
fi

and then execute python $bindir/plantcv-workflow.py. I have this saved to a shell file (which you can now run on Windows!)

On Unix-based systems you can simply execute plantcv-workflow.py, but the full-path version above works too.

Describe the solution you'd like It turns out setuptools has a much smoother solution for this in setup.py: you can create platform-specific executable files. Currently, setuptools.setup() looks like

setuptools.setup(...,
    scripts=["plantcv-train.py", "plantcv-utils.py", "plantcv-workflow.py"]
)

which copies the Python scripts to your path. However, if we used entry_points instead of scripts, setuptools would make proper executables specific to the OS. This is the recommended approach on packaging.python.org.

I'd propose creating a new subpackage called commandline in the top-level plantcv folder and storing the scripts there as modules. Then, instead of scripts=, we could use

    entry_points={
        'console_scripts': [
            'plantcv-workflow=plantcv.commandline.plantcvworkflow:main',
            'plantcv-train=plantcv.commandline.plantcvtrain:main',
            'plantcv-utils=plantcv.commandline.plantcvutils:main'
        ],
    },
One small issue is that module names (at least when they are referenced in entry_points) can't contain -, so the modules have to be renamed to plantcvworkflow.py, etc. However, we can still create executable commands with -.
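
To make the layout concrete, here is a minimal sketch of what a hypothetical plantcv/commandline/plantcvworkflow.py could look like, assuming the existing script's logic is wrapped in options() and main(); the --config argument is only a placeholder, not the script's real interface.

# Hypothetical plantcv/commandline/plantcvworkflow.py
import argparse


def options():
    # Placeholder parser; the real plantcv-workflow.py defines many more arguments.
    parser = argparse.ArgumentParser(description="Parallel image analysis with PlantCV")
    parser.add_argument("--config", help="Workflow configuration file")
    return parser.parse_args()


def main():
    # Referenced by the entry point 'plantcv-workflow=plantcv.commandline.plantcvworkflow:main'.
    args = options()
    # The existing workflow logic would run here.


if __name__ == "__main__":
    main()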

On Unix-based systems there will be no practical change: you can still call plantcv-workflow or python $CONDA_PREFIX/bin/plantcv-workflow.py. On Windows you can use plantcv-workflow or plantcv-workflow.exe. However, we break backwards compatibility because python %CONDA_PREFIX%/Scripts/plantcv-workflow.py doesn't work anymore. Overall it's a better experience, especially since plantcv-workflow autocompletes on Windows.

Describe alternatives you've considered Maybe we could additionally include scripts= to install the raw Python scripts too? I don't understand this well enough yet. One caveat here is that we'd have to have both plantcv-workflow.py and plantcvworkflow.py because of the - issue described above.

Additional context If we can't make this backwards compatible, should this wait for PCV4? Either way, I suggest we include plantcv-workflow.py on Windows only for PCV3.*, since this is a bit of a pain, and phase it out in PCV4.

dschneiderch commented 4 years ago

Follow-up: it seems scripts is shadowed by entry_points, so in order to keep plantcv-workflow.py on Windows we'd have to rename the entry points from plantcv-workflow to something like pcv-workflow.
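
For illustration, a setup.py sketch of that arrangement; the pcv-* names are just examples and the metadata is trimmed to the relevant keywords.

# Hypothetical setup.py sketch: keep the legacy .py scripts on the PATH
# while exposing renamed console-script executables alongside them.
import setuptools

setuptools.setup(
    name="plantcv",
    # Other metadata omitted for brevity.
    scripts=["plantcv-train.py", "plantcv-utils.py", "plantcv-workflow.py"],
    entry_points={
        'console_scripts': [
            'pcv-workflow=plantcv.commandline.plantcvworkflow:main',
            'pcv-train=plantcv.commandline.plantcvtrain:main',
            'pcv-utils=plantcv.commandline.plantcvutils:main'
        ],
    },
)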

nfahlgren commented 4 years ago

I think this sounds good overall. One consideration: we will have to include tests for each script module since they would be part of the package. There's some work to add the tests, but of course it's good to make sure they are working.

We could potentially shorten the name of the subpackage to cli (command-line interface).

Within the subpackage the file names could just be workflow.py, utils.py, train.py, etc. It looks like the module names don't really matter since the name of each executable is defined independently.
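
So the entry_points argument might end up looking roughly like this (a sketch only; names illustrative):

    entry_points={
        'console_scripts': [
            'plantcv-workflow=plantcv.cli.workflow:main',
            'plantcv-train=plantcv.cli.train:main',
            'plantcv-utils=plantcv.cli.utils:main'
        ],
    },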

One thought I have been saving for later (v4) is having just one script. Maybe either plantcv or plantcv-cli or something like that. Then have workflow, train, and utils be subcommands. You would have commands like this:

plantcv workflow --config my_config.json

plantcv utils json2csv -j pcv_output.json -c pcv_output

plantcv train naive_bayes_multiclass -f pixel_samples.txt -o model.txt -p

Etc. I don't have a strong preference for multiple scripts versus one with subcommands, so I'm happy to go with what folks prefer.
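
For what it's worth, a rough sketch of how a single entry point with subcommands could be wired using argparse subparsers; the subcommand names come from the examples above, but the arguments and dispatch are placeholders.

# Hypothetical plantcv/cli module exposing one console script with subcommands.
import argparse


def main():
    parser = argparse.ArgumentParser(prog="plantcv")
    subparsers = parser.add_subparsers(dest="command", required=True)

    workflow = subparsers.add_parser("workflow", help="Run image analysis workflows in parallel")
    workflow.add_argument("--config", help="Workflow configuration file (JSON)")

    subparsers.add_parser("utils", help="Utility tools such as json2csv")
    subparsers.add_parser("train", help="Train classifiers such as naive_bayes_multiclass")

    args = parser.parse_args()
    # Dispatch to the existing plantcv.parallel, plantcv.utils, and plantcv.learn code here.
    print(f"Selected subcommand: {args.command}")


if __name__ == "__main__":
    main()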

dschneiderch commented 4 years ago

The benefit of the subcommands is that you don't have to remember three different commands. If you just type plantcv you can get help, so I'd be in favor.

nfahlgren commented 4 years ago

plantcv is shorter, but I'm not sure if it creates confusion because it's the name of the package too, so plantcv-cli could avoid that issue. What do you think?

dschneiderch commented 4 years ago

What elements of these scripts should get tested?

dschneiderch commented 4 years ago

Potentially? plantcv-cli is fine since it will autocomplete.

dschneiderch commented 4 years ago

Subcommands also match the way utils works, so it'd be consistent.

nfahlgren commented 4 years ago

I might have to do a little research/thinking on testing. I think, unlike our normal modules, we can't (completely) test the individual components of each program (i.e., we can't really test options() since it requires parsing command-line arguments, and main() relies on it, and so on). But I did find a pytest plugin that might work: https://pypi.org/project/pytest-console-scripts/
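
A rough sketch of how such a test might look, assuming the plugin's script_runner fixture and an installed plantcv-workflow entry point (newer plugin releases accept the command as a list); the test name and assertions are illustrative.

# Hypothetical tests/test_console_scripts.py using the pytest-console-scripts plugin.
def test_plantcv_workflow_help(script_runner):
    # Run the installed console script and check that it exits cleanly.
    result = script_runner.run(["plantcv-workflow", "--help"])
    assert result.success
    assert "usage" in result.stdout.lower()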

nfahlgren commented 4 years ago

By combining these into one program we will reduce a little bit of overlapping code between them. And luckily, all the heavy-lifting code is already being tested in the parallel, learn, and utils subpackages.

nfahlgren commented 1 year ago

Implemented in #1099 (PlantCV v4)