DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

CWL in toil : selecting a file/dir from directory listing with symlinking #3371

Closed jfouret closed 3 years ago

jfouret commented 3 years ago

Hi,

I had an issue using CWL in Toil.

See https://github.com/common-workflow-language/cwltool/issues/1391 for the problem description.

In short: when I select a file from the listing of an input directory, return it as the output of a first step, and use it as the input of a second step, the expected behavior is that the selected file is symlinked. In Toil, however, the file is copied.
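To make the symlink-versus-copy distinction concrete, here is a minimal sketch in plain Python. It is not how Toil or cwltool stage files; `stage_file` and `demo` are hypothetical names used only for illustration, and the sketch assumes a POSIX filesystem where creating symlinks is allowed.

```python
import os
import shutil
import tempfile


def stage_file(src, dest_dir, symlink=True):
    """Hypothetical helper: place src inside dest_dir by symlink or by copy."""
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    if symlink:
        # The staged entry only points at the original; no data is duplicated.
        os.symlink(os.path.abspath(src), dest)
    else:
        # The staged entry is an independent copy of the file contents.
        shutil.copy2(src, dest)
    return dest


def demo():
    """Stage one file both ways and report which results are symlinks."""
    root = tempfile.mkdtemp()
    src_dir = os.path.join(root, "test_dir")
    os.makedirs(src_dir)
    src = os.path.join(src_dir, "hello1.txt")
    with open(src, "w") as handle:
        handle.write("hello\n")

    linked = stage_file(src, os.path.join(root, "linked"), symlink=True)
    copied = stage_file(src, os.path.join(root, "copied"), symlink=False)
    return os.path.islink(linked), os.path.islink(copied)
```

Calling `demo()` returns a pair of booleans saying whether each staged entry is a symlink. The copied variant is the behavior reported here: it costs extra disk space and time for large inputs.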

I found several issues that seem related, but none of them really covers this case:

Issue is synchronized with this Jira Task. Issue Number: TOIL-746

DailyDreaming commented 3 years ago

Hmmm... this looks like it may have recently been fixed in cwltool, so we might just need to update to the latest fixed cwltool version. I'll pull it in and hopefully that's the case. Thanks for submitting the issue, @jfouret.

jfouret commented 3 years ago

Following the example given in common-workflow-language/cwltool#1391, with Toil built from 4e7d12a65793c4a4c1c4e2b61109c19ad2756f02, we get:

(venv) jfouret@compute-dedicated:/home/users/jfouret/toil_test$ tree cache
cache
└── test
    ├── files
    │   ├── for-job
    │   │   ├── kind-CWLJob
    │   │   │   ├── instance-9xrazzzu
    │   │   │   │   ├── file-39463274ab7c4399ae44a9f697c5e6cf
    │   │   │   │   │   └── ea46a9dcea810a12a6c69b4c46519706cc842b94 -> /home/users/jfouret/toil_test/outdir/ea46a9dcea810a12a6c69b4c46519706cc842b94
    │   │   │   │   └── file-e935d45165de427da033422b59244d9b
    │   │   │   │       └── hello1.txt
    │   │   │   └── instance-psppefhp
    │   │   ├── kind-CWLWorkflow
    │   │   │   └── instance-ovvw6aew
    │   │   └── kind-ResolveIndirect
    │   │       └── instance-6z24u13c
    │   ├── no-job
    │   │   ├── file-3e6b1dc321e74298b1a9582d4558b0e9
    │   │   ├── file-5f008298a2b54b749a19790648c5ead8
    │   │   ├── file-990522e59daa4c4085e626981384f6f7
    │   │   ├── file-a41ca6c42bea488fad3f968bd9c32a71
    │   │   │   └── stream
    │   │   ├── file-b11b54c41d304b5990e64566c18cec26
    │   │   │   └── stream
    │   │   └── file-efec003a1c3a4e098009a4cc3436af47
    │   └── shared
    │       ├── config.pickle
    │       ├── environment.pickle
    │       ├── pid.log
    │       ├── rootJobReturnValue
    │       ├── rootJobStoreID
    │       └── succeeded.log
    ├── jobs
    │   ├── kind-CWLJob
    │   ├── kind-CWLWorkflow
    │   └── kind-ResolveIndirect
    └── stats
        ├── stats39f7f231ae1b4777bd580481b2bb5efc.new
        ├── stats5937c7696e3240d7abd86c22d83028a0.new
        ├── stats5ec248fae2cb4aff9f8ed91315cbc248.new
        ├── stats9e2d31b092044d0fb703e7db1d9e723b.new
        └── statsb3ab7d54f5b746c4a2b8f0b3ddd9e47e

Before the fix, the problem was that we were also getting the following:

cache
[...]
├── test_dir
│   ├── hello1.txt
│   └── hello2.txt
[...]
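The manual `tree` inspection above could also be scripted. The following is a rough sketch, not part of Toil: `classify_entries` is a hypothetical helper that walks a directory and separates symlinks from regular files, so copied inputs stand out.

```python
import os


def classify_entries(root):
    """Hypothetical helper: split entries under root into symlinks and regular files.

    Returns two sorted lists of paths relative to root.
    """
    symlinks, regular = [], []
    # os.walk does not follow symlinked directories by default, so a symlink
    # to a directory is reported in dirnames and is not descended into.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                symlinks.append(rel)
            elif os.path.isfile(path):
                regular.append(rel)
    return sorted(symlinks), sorted(regular)
```

For example, `classify_entries("cache/test/files")` on the job store shown above would list the symlinked output under the symlinks and any copied inputs under the regular files.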

Thanks!

mr-c commented 3 years ago

Glad to hear it, @jfouret!