biocore / pyqi

Tools for developing and testing command line interfaces in Python.

initial workflow commit #245

wasade closed 10 years ago

wasade commented 10 years ago

Initial Workflow object to support the refactoring of split libraries. It is likely there needs to be some expansion of the nomenclature and surrounding documentation.

The basic idea is that a workflow consists of a set of individual methods, which can optionally be grouped. Not all of the methods are necessary at runtime, depending on the options specified: only those methods whose requirements are satisfied are executed. Methods in the workflow can be short-circuited if a prior method sets Failed to True.

Method order, or method group order, can be set using the priority decorator.

At this time, requires supports lookup of specific options and values, as well as an arbitrary function that can check whether the data has specific properties (e.g., in the case of split libraries, whether an item has quality scores or not).
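
To make the description above a bit more concrete, here is a minimal sketch of how priority, requires, and the short-circuit flag could fit together. This is not the code in this PR: the decorator signatures, the `failed` attribute, the `DemoWorkflow` class, and the quality threshold of 20 are all assumptions made for illustration.

```python
# Hypothetical sketch only: names and signatures are assumptions,
# not the actual Workflow object from this pull request.
from functools import wraps


def priority(level):
    """Tag a method with a priority; higher values run first."""
    def decorator(func):
        func.priority = level
        return func
    return decorator


def requires(option=None, values=None, is_valid=None):
    """Skip the method unless the option/value and data checks pass."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, item):
            if option is not None:
                if option not in self.options:
                    return
                if values is not None and self.options[option] not in values:
                    return
            if is_valid is not None and not is_valid(item):
                return
            func(self, item)
        return wrapper
    return decorator


class Workflow:
    def __init__(self, options=None):
        self.options = options or {}
        self.failed = False

    def _ordered_methods(self):
        # Collect every method tagged by @priority, highest priority first.
        candidates = (getattr(self, name) for name in dir(self))
        methods = [m for m in candidates
                   if callable(m) and hasattr(m, "priority")]
        return sorted(methods, key=lambda m: m.priority, reverse=True)

    def __call__(self, items):
        methods = self._ordered_methods()
        for item in items:
            item = dict(item)  # work on a shallow copy of each input
            self.failed = False
            for method in methods:
                method(item)
                if self.failed:
                    break  # short circuit the remaining methods
            yield self.failed, item


class DemoWorkflow(Workflow):
    @priority(10)
    def check_length(self, item):
        if len(item["seq"]) < self.options.get("min_len", 0):
            self.failed = True

    @priority(5)
    @requires(option="trim", values=[True],
              is_valid=lambda item: "qual" in item)
    def trim_quality(self, item):
        # Truncate at the first low-quality position (threshold is arbitrary).
        for i, q in enumerate(item["qual"]):
            if q < 20:
                item["seq"] = item["seq"][:i]
                item["qual"] = item["qual"][:i]
                break
```

In this sketch, calling `DemoWorkflow({"min_len": 4, "trim": True})` on an iterable of dicts yields a `(failed, item)` pair per input. Items without a `"qual"` key simply skip `trim_quality`, and items that fail `check_length` never reach it.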

This is perhaps still a bit too meta. I have a very limited split libraries Workflow with a complementary pyqi Command, but those are a few more days out.

The more this gets developed, the more I think this object can fit many scripts in QIIME, including bdiv and OTU picking. This centralizes and formalizes a lot of common logic (do this, then this, then this if condition xyz, then this if condition foo, and so on).

There is also some relatively easy-to-reach data-level parallelism here, whereby the parallelism can be pushed into the data generators used. Task-level parallelism would be possible too, although there would be a lot of overhead and the GIL would likely get in the way. Optionally, this object could be made "parallel" aware pretty easily if the __call__ method respected, say, a process rank. The benefit is that the data generators stay agnostic to the number of processors, but generalizing parallelism here could get touchy.
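
As a rough illustration of the rank idea, the following sketch splits any iterable across processes by stride. The `partition` helper and its `rank` and `size` parameters are hypothetical (they might come from MPI or a similar launcher); nothing like this exists in the PR. It could be applied inside a generator or inside `__call__`, and either way the underlying data source does not need to know how many processes there are.

```python
# Hypothetical sketch: `partition`, `rank`, and `size` are illustrative
# names, not part of this pull request.
from itertools import islice


def partition(items, rank=0, size=1):
    """Yield every `size`-th item of `items`, starting at offset `rank`."""
    if not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return islice(items, rank, None, size)


def read_records():
    # Stand-in data generator; it knows nothing about ranks or processes.
    for i in range(10):
        yield {"id": i}


# Each process would call this with its own rank.
mine = [rec["id"] for rec in partition(read_records(), rank=1, size=3)]
```

Every process still iterates the full stream and discards what it does not own, so this only pays off when the per-item work dominates the cost of reading.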

wasade commented 10 years ago

This was moved to scikit-bio