Confirmation of parameters used in fw_queue.py

tahorst commented 6 years ago

I think it would be good to show a confirmation of which environmental variables were used in setting parameters in fw_queue.py. Right now if there is a misspelled or unrecognized variable, it will silently not get set. We might also want to move to using an arg parser like Jerry set up for the manual runscripts to simplify things a bit.

jmason42 commented 6 years ago

I think argparse is definitely the way to go. There's really no good way to detect typos in environmental variables. Additionally, large sets of arguments could be stored in and passed as JSON or YAML files.

1fish2 commented 6 years ago

I'm inclined to:

restructure the functionality into classes, decoupling details like task dependency links from filename construction
use argparse to put a new command line interpreter on it
make fw_queue.py do its environment variable command line parsing for compatibility with existing scripts, then call on the new classes
other code could translate JSON or YAML or Python literals into calls on the new classes.

If fw_queue echoed the environment settings it read, would it be perspicuous to tell what values got ignored because they were misspelled?

Trying to print unrecognized environment variable names would need a stop list of variables to ignore. I have ≈100 environment variables set on Sherlock, nearly as many on macOS, and little overlap.

jmason42 commented 6 years ago

@1fish2: That all sounds good, but I really see no reason to maintain a mixture of CLI and environmental variable parsing (unless I'm drastically underestimating the amount of code dependent on that interface). If a hard break isn't practical or desirable, then I'd be in favor of a short deprecation window (at most two weeks) where everyone gets a chance to move to the new interface.

More towards @tahorst's original goal: I suppose you could pull the names of all environmental variables, strip out those used by the model, strip out a system-specific list, and then measure the edit distance between all unrecognized environmental variables and all unused (but valid) fw_queue.py variables. Then, if the edit distance is short e.g. a couple characters, throw an exception/print a warning/ask for confirmation. If there's a serviceable Python library for measuring edit distance, this wouldn't be too difficult. (If there isn't, I have some Cython-compiled code for dynamic time warping that could be rewritten to operate on strings. Wouldn't be too hard.)

1fish2 commented 6 years ago

A future PR that implements a new CLI (command line interface) would update scripts in the repo that currently call fw_queue.py.

Do people have more scripts that call fw_queue.py which they don't want to break?

If people are OK with updating those scripts when the CLI is ready, there's no need to even implement a backward-compatibility environment variable style fw_queue.py.

jmason42 commented 6 years ago

There are a couple scripts that call fw_queue.py. Otherwise I think most of us tend to use the terminal history or some uncommitted reference file to store execution parameters - obviously, these aren't great solutions and suggest an unaddressed need. Anyway, there would be some time needed to adapt to the new interface but like the analysis script CLIs I think it would be a welcome change.

tahorst commented 6 years ago

I think the simplest thing would be checking to see if any of the environmental variables we care about are in the environment and printing those. We usually only pass a handful of variables so it would be easy to see if something isn't on that list instead of checking all of the environmental variables.

@jmason42 are there any scripts other than the ones we just made for the paper? I think everyone just uses fw_queue.py from the command line. I don't think we'd need to maintain backwards compatibility.

jmason42 commented 6 years ago

I think the simplest thing would be checking to see if any of the environmental variables we care about are in the environment and printing those. We usually only pass a handful of variables so it would be easy to see if something isn't on that list instead of checking all of the environmental variables.

Printing details about what is getting run is a good idea, but I think this wouldn't scale very well (as far as catching typos, etc.). Above ~5 options I'd probably start to make mistakes.

@jmason42 are there any scripts other than the ones we just made for the paper? I think everyone just uses fw_queue.py from the command line. I don't think we'd need to maintain backwards compatibility.

Good point - any such files are probably not in the repo. For development purposes I tend to write a shell script for execution, but I generally don't commit those. I do think others (e.g. @eagmon, maybe @heejochoi) keep private documents with shell commands stored for easy copy/paste.

CovertLab / wcEcoli

Confirmation of parameters used in fw_queue.py #235