common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

Output files don't end up in current working directory when run from python #1195

Closed EricBoix closed 5 years ago

EricBoix commented 5 years ago

Expected Behavior

Output files should end up in the invocation directory when a (Docker-based) workflow is run from Python as a module, just as they do when the same workflow is run with cwl-runner.

Actual Behavior

No output files appear in the invocation directory (where do they actually end up?).

Workflow Code

For convenience, all the files required to reproduce this issue can be found in the attached outdir_in_python.zip archive (start with the enclosed Readme.md).

The following is an extract from those files:

First, install the dependencies:

virtualenv -p python3 venv
source venv/bin/activate
pip3 install cwltool
pip3 install cwlref-runner     # for cwl-runner
pip3 install cwltool[deps]
pip3 install pyyaml            # for the python script

Prepare the input content (this is part of the user guide):

touch hello.txt && tar -cvf hello.tar hello.txt

Now, when running tar.cwl with the cwl-runner engine, an output file is produced in the current working directory (in accordance with the workflow):

rm -f hello.txt      # easier to check existence of output than its time
cwl-runner tar.cwl tar-job.yml
ls -al hello.txt     # Yep it is here

But when invoked from python, the output file does not "appear" in the current working directory:

rm -f hello.txt
python tar.py
ls -al hello.txt     # ls: hello.txt: No such file or directory

Full Traceback

(venv): python tar.py
Resolved 'tar.cwl' to 'file:///private/tmp/outdir_in_python/tar.cwl'
[job tar.cwl] /private/var/folders/4f/l5svz0t119l_qhq9l7qnqstm0000gn/T/vnpvpywi$ tar \
    --extract \
    --file \
    /private/var/folders/4f/l5svz0t119l_qhq9l7qnqstm0000gn/T/tmpzimdaobv/stg7f6f4d45-e8f4-4886-8015-680129b0b4be/hello.tar
[job tar.cwl] completed success
{
    "example_out": {
        "basename": "hello.txt",
        "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "class": "File",
        "http://commonwl.org/cwltool#generation": 0,
        "location": "file:///private/var/folders/4f/l5svz0t119l_qhq9l7qnqstm0000gn/T/vnpvpywi/hello.txt",
        "nameext": ".txt",
        "nameroot": "hello",
        "size": 0
    }
}
Where did the output file hello.txt go?

A clue might be the location value of the result that differs from the one displayed when running with cwl-runner.
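
The `location` value in the result above is a `file://` URI pointing into a temporary directory, which suggests the file was written there rather than in the invocation directory. As a stopgap, a minimal sketch using only the standard library could copy the file from that location into the current directory. The helper names `file_uri_to_path` and `copy_output_to` are hypothetical (they are not part of cwltool), and the sketch assumes the temporary directory still exists once the run has finished:

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_uri_to_path(uri: str) -> Path:
    """Convert a file:// URI (as found in a CWL File "location") to a local path."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {uri}")
    # url2pathname also decodes percent-escapes such as %20.
    return Path(url2pathname(parsed.path))


def copy_output_to(result_file: dict, dest_dir: Path) -> Path:
    """Copy the file described by a CWL File object into dest_dir.

    Hypothetical helper: it relies only on the "location" and "basename"
    keys that appear in the result printed above.
    """
    src = file_uri_to_path(result_file["location"])
    dest = Path(dest_dir) / result_file["basename"]
    shutil.copy2(src, dest)
    return dest
```

With the result dictionary bound to a variable, say `result`, calling `copy_output_to(result["example_out"], Path.cwd())` would place the file next to the script.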

Note: I tried to adapt the Python script by explicitly setting an "outdir" directory. Alas, I was unable to find any documentation or example illustrating how to set an outdir when scripting from Python. Damn, no luck on this one...

Your Environment

EricBoix commented 5 years ago

Am I using the wrong site or channel for reporting an issue? Is the community elsewhere? Is something missing from my issue description? Am I misusing the cwltool runner? Are there other practical means of running the same workflow on a set of inputs besides resorting to shell scripting? Are there other Python engines? I really wonder how cwltool users (or more generally CWL users) easily accomplish such a central task (running the same workflow on a set of inputs) from the command line.

mr-c commented 5 years ago

how cwltool users (or more generally CWL users) easily accomplish such a central task (running the same workflow on a set of inputs) from the command line?

You can have a workflow as a step in another workflow and then scatter over arrays of inputs.

EricBoix commented 5 years ago

Thanks a lot @mr-c. Could you recommend an online teaching example?

Concerning the above issue, would you consider it a current limitation of using cwltool from Python? A bug? Is there a way to specify the output directory from Python?

tetron commented 5 years ago

Try this:

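    import cwltool.factory
    from cwltool.context import RuntimeContext

    # Set the output directory on the runtime context, then pass it to the factory
    runtime_context = RuntimeContext()
    runtime_context.outdir = '/destination'
    fac = cwltool.factory.Factory(runtime_context=runtime_context)
    logana = fac.make("tar.cwl")

EricBoix commented 5 years ago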

Thanks a lot @tetron. This does the trick!

For the record (and some details), here is what I had to add to the tar.py script mentioned above in order to get the workflow output placed in the invocation directory:

import os

import cwltool.context
import cwltool.factory

...

# Specify the output directory as being the current working directory:
runtime_context = cwltool.context.RuntimeContext()
runtime_context.outdir = os.getcwd()

# Loading the workflow with the help of an ad-hoc factory:
fac = cwltool.factory.Factory(runtime_context=runtime_context)

...

I consider this issue neatly solved and thus dare to close it (although I would appreciate it if @mr-c could provide an online teaching reference on how to have a workflow as a step in another workflow and then scatter over arrays of inputs ;) ).

Thanks again to both of you for your support.
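
For the original goal of running the same tool on a set of inputs from Python, the `outdir` setting shown above can be combined with a plain loop. This is only a sketch under stated assumptions: `build_jobs` and `run_all` are hypothetical helper names, and the input name `tarfile` is an assumption about what tar.cwl declares (the tool's inputs are not shown in this thread). Outputs with identical names written to the same `outdir` may collide, so a separate directory per run may be safer.

```python
import os
from pathlib import Path


def build_jobs(tar_paths):
    """Build one input object per archive.

    ASSUMPTION: tar.cwl declares a single File input named "tarfile";
    adjust the key to match the actual tool description.
    """
    return [
        {"tarfile": {"class": "File", "location": Path(p).resolve().as_uri()}}
        for p in tar_paths
    ]


def run_all(cwl_file, jobs, outdir=None):
    """Run cwl_file once per job, writing outputs into outdir (default: cwd)."""
    # Imported lazily so build_jobs() can be used without cwltool installed.
    import cwltool.context
    import cwltool.factory

    runtime_context = cwltool.context.RuntimeContext()
    runtime_context.outdir = outdir or os.getcwd()
    factory = cwltool.factory.Factory(runtime_context=runtime_context)
    tool = factory.make(cwl_file)
    # Each call runs the tool with the job's entries as keyword arguments.
    return [tool(**job) for job in jobs]
```

A call such as `run_all("tar.cwl", build_jobs(["hello.tar"]))` would then return one result dictionary per input archive.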

mr-c commented 5 years ago

@EricBoix The short answer is to combine https://www.commonwl.org/user_guide/22-nested-workflows/index.html with https://www.commonwl.org/user_guide/23-scatter-workflow/index.html :-)

EricBoix commented 5 years ago

Wonderful! Thanks @mr-c.