cgat-developers / ruffus

CGAT-ruffus is a lightweight python module for running computational pipelines
MIT License

Circumvent custom required args when ruffus.cmdline args are set #56

Closed mbiokyle29 closed 9 years ago

mbiokyle29 commented 9 years ago

Hello! First off, thank you for all of your hard work with ruffus! It has been an exceptional tool to use and has already saved me hours of time! I have a quick question (if this is not the proper place for it, please let me know where is).

I am using the ruffus.cmdline package to wrap up my pipelines as command-line scripts. I have marked some options as required, like so:

    # Program arguments
    parser.add_argument("--size", help="Fullpath to size file", required=True)
    parser.add_argument("--gtf", help="Fullpath to gtf file", required=True)

I want these to be required options since the pipeline cannot run without them. The issue I then have is that if I only want to use the built-in options like --just_print and --flowchart, I am still forced to specify all the required arguments that I set.
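The behaviour described above can be reproduced with plain argparse. This is only a sketch: the --just_print and --flowchart options are added by hand as stand-ins for what ruffus.cmdline would provide, and build_parser and exits_with_error are hypothetical helper names.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="pipeline")
    # Stand-ins (assumption) for options that ruffus.cmdline would normally add
    parser.add_argument("--just_print", action="store_true")
    parser.add_argument("--flowchart", metavar="FILE")
    # Program arguments from the post above
    parser.add_argument("--size", help="Fullpath to size file", required=True)
    parser.add_argument("--gtf", help="Fullpath to gtf file", required=True)
    return parser


def exits_with_error(argv):
    """Return True if argparse rejects argv by calling sys.exit()."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        return True
    return False


if __name__ == "__main__":
    # --just_print alone is rejected because --size and --gtf are missing
    print(exits_with_error(["--just_print"]))
    # Supplying the required options as well is accepted
    print(exits_with_error(["--just_print", "--size", "a", "--gtf", "b"]))
```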

Is there any way to get around this? (I would be happy to fork, look into this and submit a pull request if that's something you'd be interested in.)

Thanks again! -Kyle

bunbun commented 9 years ago

This is really a limitation of argparse.

What you are asking for is that required arguments be ignored when certain other arguments are present. argparse doesn't support that sort of state-machine / modular cleverness.

I had a quick read through of argparse to see whether a solution could be hacked.

1) The straightforward way would normally be to emulate what the --version or --help options do, i.e. perform some action (display the help message / version string), and then bomb out of further argument processing, ignoring the required command or indeed any other possible errors. However, this doesn't work in our case because --just_print or --flowchart depend on other command line arguments such as --verbose, --target_tasks etc.

2) At the bottom of _parse_known_args() in argparse.py, missing required arguments are reported with

    msg = _('one of the arguments %s is required')
    self.error(msg % ' '.join(names))

which results in a call to

    def error(self, message):
        ...

We can therefore override ArgumentParser.error() in a subclass and ignore errors when message contains "is required" and the namespace contains just_print or flowchart. Unfortunately, the set of parsed arguments (the namespace) is not forwarded from _parse_known_args() to error()...

More hacking is required: perhaps by subclassing _StoreTrueAction for --just_print and --flowchart so that they store their results somewhere accessible from ArgumentParser.error(). Another way is to override ArgumentParser._parse_known_args() (250 lines!).

3) Hack around argparse by first manually unsetting the required flag on all mandatory options, calling parse_args(), and seeing whether --just_print or --flowchart was given. If neither was, we switch all the required flags back on and call parse_args() once more. This last bit belongs outside Ruffus.
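A rough sketch of option 3, living outside Ruffus, might look like the following. It is illustrative only: parse_args_two_pass and demo_parser are hypothetical names, the --just_print and --flowchart options are hand-written stand-ins for the ruffus.cmdline ones, and the code reaches into the private parser._actions list, which argparse does not guarantee to keep stable.

```python
import argparse


def parse_args_two_pass(parser, argv=None, bypass=("just_print", "flowchart")):
    """Parse once with 'required' switched off; re-parse strictly unless a
    bypass option was given. Relies on the private parser._actions list."""
    required_actions = [a for a in parser._actions if a.required]
    for action in required_actions:
        action.required = False
    try:
        relaxed = parser.parse_args(argv)
    finally:
        # Always restore the flags, even if the first pass exits on an error
        for action in required_actions:
            action.required = True
    if any(getattr(relaxed, name, None) for name in bypass):
        return relaxed
    # No bypass option present: enforce the required arguments as usual
    return parser.parse_args(argv)


def demo_parser():
    parser = argparse.ArgumentParser(prog="pipeline")
    # Stand-ins (assumption) for options that ruffus.cmdline would add
    parser.add_argument("--just_print", action="store_true")
    parser.add_argument("--flowchart", metavar="FILE")
    parser.add_argument("--size", help="Fullpath to size file", required=True)
    parser.add_argument("--gtf", help="Fullpath to gtf file", required=True)
    return parser


if __name__ == "__main__":
    args = parse_args_two_pass(demo_parser(), ["--just_print"])
    print(args.just_print, args.size, args.gtf)
```

Required mutually exclusive groups are not covered by this sketch, since their required flag lives on the group rather than on the individual actions.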

Do you have a better idea of how to do this?

If it turns out to be too involved, it might be better to turn off required and do the error checking yourself.
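Doing the check by hand could be as simple as the sketch below. The options are declared without required=True and validated after parsing; parse_and_validate is a hypothetical name, and --just_print and --flowchart are again stand-ins for the ruffus.cmdline options.

```python
import argparse


def parse_and_validate(argv=None):
    parser = argparse.ArgumentParser(prog="pipeline")
    # Stand-ins (assumption) for options that ruffus.cmdline would add
    parser.add_argument("--just_print", action="store_true")
    parser.add_argument("--flowchart", metavar="FILE")
    # Declared without required=True; checked manually below
    parser.add_argument("--size", help="Fullpath to size file")
    parser.add_argument("--gtf", help="Fullpath to gtf file")
    args = parser.parse_args(argv)

    inspecting_only = args.just_print or args.flowchart
    if not inspecting_only:
        missing = [name for name in ("size", "gtf") if getattr(args, name) is None]
        if missing:
            # parser.error() prints the usage message and exits with status 2
            parser.error(
                "the following arguments are required: "
                + ", ".join("--" + name for name in missing)
            )
    return args


if __name__ == "__main__":
    print(parse_and_validate(["--just_print"]))
```

The trade-off is that the generated --help text no longer marks --size and --gtf as required, so the help strings may need to say so explicitly.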

I suspect that no general solution will be found which would fit inside Ruffus but please prove me wrong!

Leo

mbiokyle29 commented 9 years ago

Leo,

Thanks for your very complete response!

I do wonder if we could perhaps sneak our own subclass of ArgumentParser into ruffus.cmdline and add another keyword argument to the add_argument method. Something like necessary, indicating that this argument is necessary if the pipeline is actually going to be run.

We could then just implement some modified checking logic on the necessary values in our subclassed parse_args() method.
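One way this proposal might look is sketched below. Nothing here exists in Ruffus or argparse: the necessary keyword, the PipelineArgumentParser class and make_parser are all hypothetical, and --just_print and --flowchart are hand-written stand-ins for the ruffus.cmdline options.

```python
import argparse


class PipelineArgumentParser(argparse.ArgumentParser):
    """ArgumentParser with a hypothetical 'necessary' keyword: the argument
    is only enforced when the pipeline is actually going to run."""

    # Options that mean "inspect only, do not run" (assumed names)
    bypass_dests = ("just_print", "flowchart")

    def add_argument(self, *args, necessary=False, **kwargs):
        action = super().add_argument(*args, **kwargs)
        if necessary:
            # Created lazily because the base __init__ also calls add_argument
            self.__dict__.setdefault("_necessary_actions", []).append(action)
        return action

    def parse_args(self, args=None, namespace=None):
        ns = super().parse_args(args, namespace)
        if any(getattr(ns, dest, None) for dest in self.bypass_dests):
            return ns
        missing = [
            a.option_strings[0] if a.option_strings else a.dest
            for a in self.__dict__.get("_necessary_actions", [])
            if getattr(ns, a.dest, None) is None
        ]
        if missing:
            self.error("the following arguments are necessary: " + ", ".join(missing))
        return ns


def make_parser():
    parser = PipelineArgumentParser(prog="pipeline")
    parser.add_argument("--just_print", action="store_true")
    parser.add_argument("--flowchart", metavar="FILE")
    parser.add_argument("--size", help="Fullpath to size file", necessary=True)
    parser.add_argument("--gtf", help="Fullpath to gtf file", necessary=True)
    return parser


if __name__ == "__main__":
    print(make_parser().parse_args(["--just_print"]))
```

Arguments added through add_argument_group() would bypass this override, because groups have their own add_argument method, so a fuller version would need to handle groups as well.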

Please let me know what your thoughts are on this, I would be happy to work on this! Thanks again for making ruffus!

-Kyle

bunbun commented 9 years ago

I am wondering whether that is indeed a good idea of general use to the wider Ruffus community. Thinking this through, I realise I always use exactly the same command line arguments for --just_print, --flowchart, --recreate_database etc. That seems like good practice: otherwise you have no idea whether the pipeline you are just inspecting or printing out does indeed correspond to the pipeline you run (with a different set of parameters).

I am still open to persuasion but I am not convinced yet.

On top of that, the whole idea of the hands-off ruffus cmdline support is that ruffus users can use their own argparse derivatives and drop them into ruffus. Would you want to try that first? I am sure that if it is useful, it would be of great interest to the wider python community and not just Ruffus users.

Leo