DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0
900 stars 240 forks source link

return Directory as output returns a array of files using cwl #1459

Closed gijzelaerr closed 7 years ago

gijzelaerr commented 7 years ago

hi! I'm trying to write a cwl workflow that outputs a directory:

https://github.com/gijzelaerr/vermeerkat-cwl/blob/master/h5toms.cwl#L16 https://github.com/gijzelaerr/vermeerkat-cwl/blob/dir_output_toil_bug/workflow.cwl#L9

using cwlref-runner this works correctly, it results in a directory in my output folder. But using toil it will return a flat list of files, it loses the nested structure. Not sure if i'm doing it wrong, but the cwlref-runner behavior is what i would expect and want.

thanks!

cket commented 7 years ago

@chapmanb do you have any ideas about this?

chapmanb commented 7 years ago

Sorry about missing this earlier. Currently directory inputs/outputs aren't supported by Toil. Here's a description of things it doesn't support in 1.0:

http://toil.readthedocs.io/en/releases-3.5.x/running.html#running-cwl-workflows

It would be great to add but I don't have the cwltool/Toil expertise to know how to implement it.

humburg commented 7 years ago

Support for directory inputs/outputs would indeed be great.

stefanoberri commented 7 years ago

I would like to give it a go at implementing the Directory support. I am a bit worried that @chapmanb says " I don't have the cwltool/Toil expertise to know how to implement it". I am new to both tools as well, but I am happy to dive in. Help and guidance is welcome.

This issue is also reported in https://github.com/BD2KGenomics/toil/issues/1656

stefanoberri commented 7 years ago

I have started working on these issues. forked and created branch https://github.com/stefanoberri/toil/tree/issues/1459-1656-cwl-directories

stefanoberri commented 7 years ago

I have removed a couple of line that prevented Toil from proceed in case input was a Directory and it seem to be able to read the directory correctly. That seems too easy and I am not sure why the check was introduced in the first place by @chapmanb Any Ideas? Maybe this got automatically fixed byt adding support for InitialWorkDirRequirements #1677 ?

I will now look into creating directories.

chapmanb commented 7 years ago

Stefano; Thanks for looking at this. I added a lot of checks for things I didn't know how to test and ensure worked, so it could have been an necessarily conservative check. I had to reduce the complexity somewhat to get most of it ported, sorry if it was blocking you.

If things work and it passes the CWL test suite then it's no problem with me to remove that.

As an FYI, @tetron is planning to do work on revamping and better integrating cwltoil with cwltool changes and will be tackling directory support as part of this. He might be worth coordinating with as you move forward so you don't duplicate work. Thanks again.

ejacox commented 7 years ago

I believe that @tetron was going to address this issue, too. Any comments @tetron?

stefanoberri commented 7 years ago

Thank you @chapmanb for the reply. The current code in https://github.com/stefanoberri/toil/tree/issues/1459-1656-cwl-directories (commit be7cf8f1) has a test for Directories as input that is passing and one with Directory as output that is failing. I will contact @tetron to cohordinate the work if possible.

The two tests are run with

make test tests=src/toil/test/cwl/cwlTest.py::CWLDirTest TOIL_DOCKER_REGISTRY=""

tetron commented 7 years ago

See https://github.com/BD2KGenomics/toil/pull/1810

tetron commented 7 years ago

1810 is merged, closing this issue.