Open virajbdeshpande opened 6 years ago
The client API docs may help clarify which arguments each method accepts:
http://illumina.github.io/pyflow/WorkflowRunner_API_html_doc/index.html
We could potentially add this for addWorkflowTask, but what semantics are you looking for in this case? Could the same thing be accomplished by calling os.chdir(path) at the top of the added workflow instance?
Thanks.
Here is an example use case. You have a dataset of multiple samples (parent workflow) and you want to run multiple analyses for each sample in a different directory (subworkflow). Each analysis gets its own workflow (subsubworkflow) and its own subdirectory within the sample directory.
Let's say rootdir is the directory where we run the script/parent workflow, and the cwd for the subworkflow is path. Then the semantics for the use case above would be as follows:
1) If I set cwd=path when calling the subworkflow, the working directory for the subsubworkflow should automatically be set to path and not rootdir, unless the subworkflow changes it using the cwd argument. In short, any subworkflow should be oblivious to rootdir and should inherit cwd only from its parent.
2) It is not clear to the user whether os.chdir(rootdir) is required at the end of the subworkflow, or whether the parent workflow will continue to run in rootdir. Encoding the cwd in an argument makes it clear that the user does not need to switch back.
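To make the inheritance rule in (1) concrete, here is a minimal sketch in plain Python. It is not pyflow code: resolve_cwd is a hypothetical helper, the /data/project root is made up, and treating a relative cwd as relative to the parent's directory is my assumption about the desired behavior.

```python
import os

def resolve_cwd(parent_cwd, cwd=None):
    """Return the working directory a child workflow should run in.

    Hypothetical helper, not part of pyflow. A child inherits its
    parent's directory unless it is given its own cwd; a relative cwd
    is interpreted relative to the parent's directory (an assumption).
    """
    if cwd is None:
        return parent_cwd
    # os.path.join keeps an absolute cwd as-is and anchors a relative one.
    return os.path.normpath(os.path.join(parent_cwd, cwd))

rootdir = os.path.normpath("/data/project")          # where the parent workflow runs
sample_dir = resolve_cwd(rootdir, "2015-2802")        # subworkflow called with cwd=path
analysis_dir = resolve_cwd(sample_dir, "fastq_cat")   # subsubworkflow sets its own cwd
inherited_dir = resolve_cwd(sample_dir)               # subsubworkflow without a cwd argument
```

With this rule the subsubworkflow never sees rootdir directly: it ends up either in the sample directory or in a subdirectory of it.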
For point (1), in the current version, the subsubworkflow still runs in rootdir even if I call os.chdir(path) within the subworkflow.
For point (2), I confirmed that the tasks hit a race condition when using os.chdir on a local run. For example, here are two directory structures that were created by running the pyflow scripts twice:
RUN1: Correct structure
./2015-2802/fastq_cat
./2015-2802
./2015-2799/fastq_cat
./2015-2799
RUN2: Incorrect structure
./2015-2802/fastq_cat
./2015-2802/2015-2799/fastq_cat
./2015-2802/2015-2799
./2015-2802
./2015-2799
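The nesting in RUN2 is what you would expect if the sub-workflows run as threads inside one process, since os.chdir changes the working directory for the whole process rather than for the calling thread. Whether pyflow actually runs them this way is my assumption, not something I have verified. A minimal sketch of the underlying behavior, using only the standard library (chdir_in_thread is an illustrative name):

```python
import os
import tempfile
import threading

def chdir_in_thread(path):
    # os.chdir changes the working directory of the whole process,
    # not just of the thread that calls it.
    os.chdir(path)

start_dir = os.getcwd()
with tempfile.TemporaryDirectory() as sample_dir:
    worker = threading.Thread(target=chdir_in_thread, args=(sample_dir,))
    worker.start()
    worker.join()
    # The main thread never called os.chdir, yet its cwd has moved.
    main_thread_cwd = os.getcwd()
    moved = os.path.realpath(main_thread_cwd) == os.path.realpath(sample_dir)
    os.chdir(start_dir)  # restore before the temporary directory is removed

print("main thread cwd followed the worker:", moved)
```

If one sample's workflow has already changed into ./2015-2802 when another creates ./2015-2799 with a relative path, the second directory would land inside the first, which matches the incorrect structure above.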
Do you think this will be fixed any time soon? Alternatively, I can switch to using absolute paths everywhere within the script and only run shell commands for external tools through addTask(cwd). It would be easier to write a bash script that deploys the Pyflow script separately for each sample, but that defeats the purpose of using Pyflow.
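For reference, the absolute-path workaround could look roughly like the sketch below. plan_sample_tasks is a hypothetical helper and the sample and analysis names are just the ones from my runs above. The idea is to compute every working directory up front so that nothing depends on the process-wide cwd.

```python
import os

def plan_sample_tasks(rootdir, samples, analyses):
    """Map each (sample, analysis) pair to an absolute working directory.

    Hypothetical helper: names and directory layout are illustrative only.
    """
    rootdir = os.path.abspath(rootdir)
    plan = {}
    for sample in samples:
        for analysis in analyses:
            plan[(sample, analysis)] = os.path.join(rootdir, sample, analysis)
    return plan

plan = plan_sample_tasks(".", ["2015-2802", "2015-2799"], ["fastq_cat"])
for (sample, analysis), workdir in sorted(plan.items()):
    # Inside a workflow, each directory would be created first and then
    # passed as the cwd argument of addTask instead of calling os.chdir.
    print(sample, analysis, workdir)
```

Since every path is absolute, the result does not depend on which workflow last changed directory, at the cost of threading rootdir through every workflow by hand.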
Currently, I can use the "cwd" argument with "addTask", as shown in the cwd demo, but I get an "unexpected keyword argument" error if I use it with "addWorkflowTask".