common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

CWLProv: cannot reproduce workflow runs with Directory input #1693

Open simleo opened 2 years ago

simleo commented 2 years ago

cwltool version: 3.1.20220628170238

I ran this workflow by @RenskeW with:

$ cwltool --provenance ro wf.cwl wf_job.yml

Then I tried to reproduce the run from the generated research object (RO) and got this error:

$ cd ro/workflow/
$ cwltool --debug packed.cwl primary-job.json 
INFO /home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/bin/cwltool 3.1.20220628170238
INFO Resolved 'packed.cwl' to 'file:///home/simleo/repos/cwlprov-provenance/prov_data_annotations/example2/ro/workflow/packed.cwl'
ERROR Input object failed validation:
Anonymous directory object must have 'listing' and 'basename' fields.
Traceback (most recent call last):
  File "/home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/lib/python3.8/site-packages/cwltool/main.py", line 1307, in main
    initialized_job_order_object = init_job_order(
  File "/home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/lib/python3.8/site-packages/cwltool/main.py", line 507, in init_job_order
    normalizeFilesDirs(job_order_object)
  File "/home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/lib/python3.8/site-packages/cwltool/utils.py", line 474, in normalizeFilesDirs
    visit_class(job, ("File", "Directory"), addLocation)
  File "/home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/lib/python3.8/site-packages/cwltool/utils.py", line 216, in visit_class
    visit_class(rec[d], cls, op)
  File "/home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/lib/python3.8/site-packages/cwltool/utils.py", line 214, in visit_class
    op(rec)
  File "/home/simleo/git/workflow-run-crate/tools/cwlprov_to_crate/venv/lib/python3.8/site-packages/cwltool/utils.py", line 434, in addLocation
    raise ValidationException(
schema_salad.exceptions.ValidationException: Anonymous directory object must have 'listing' and 'basename' fields.
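
To make the failing condition easier to spot in a job file, here is a minimal Python sketch of the kind of check the error message describes. It is not cwltool's actual code: the function name `find_anonymous_directories`, the path notation, and the choice to accept either `location` or `path` as sufficient are assumptions made for illustration. It walks a loaded job object and reports every `Directory` entry that has neither a location nor both `listing` and `basename`.

```python
from typing import Any, List


def find_anonymous_directories(obj: Any, path: str = "$") -> List[str]:
    """Return the paths of Directory objects that would likely trigger the error.

    Hypothetical helper: a Directory is flagged when it has no 'location'
    (or 'path', an assumption of this sketch) and is missing 'listing'
    or 'basename'.
    """
    problems: List[str] = []
    if isinstance(obj, dict):
        is_directory = obj.get("class") == "Directory"
        has_location = "location" in obj or "path" in obj
        if is_directory and not has_location:
            if "listing" not in obj or "basename" not in obj:
                problems.append(path)
        # Recurse into nested values (for example, a Directory's listing).
        for key, value in obj.items():
            problems.extend(find_anonymous_directories(value, f"{path}.{key}"))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            problems.extend(find_anonymous_directories(item, f"{path}[{index}]"))
    return problems


if __name__ == "__main__":
    # Hypothetical job object; the input name "pdb_dir" is made up.
    job = {"pdb_dir": {"class": "Directory", "basename": "pdb_directory"}}
    print(find_anonymous_directories(job))
```

Running it against the result of `json.load()` on `primary-job.json` should show which inputs lack the fields named in the error message.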

I also had problems with this other workflow. For each file in the input directory, it greps for the given pattern, converts the matching lines to upper case, and writes the result to a file in the output directory.

$ cwltool --provenance ro grepucase.cwl grepucase-job.yml
$ head grepucase_in/*
==> grepucase_in/bar <==
the fat
brown bear
jumped over
the lazy whale

==> grepucase_in/foo <==
the quick
brown fox
jumped over
the lazy dog
$ head ucase_out/*
==> ucase_out/bar.out.out <==
THE LAZY WHALE

==> ucase_out/foo.out.out <==
THE LAZY DOG

In this case, running the workflow from the RO does not cause any errors, but the output directory is empty.
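
For reference, the expected behaviour of that workflow can be sketched in plain Python, which may help when comparing against the empty output directory. This is only an approximation under stated assumptions: the function name `grep_ucase` is made up, the pattern `lazy` is a guess consistent with the output shown above (the actual value in the job file is not shown), and the `.out.out` suffix is copied from the listed output file names.

```python
import re
from pathlib import Path


def grep_ucase(in_dir: Path, out_dir: Path, pattern: str) -> None:
    """For each file in in_dir, keep lines matching pattern, upper-case them,
    and write them to out_dir/<name>.out.out (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    regex = re.compile(pattern)
    for src in sorted(in_dir.iterdir()):
        if not src.is_file():
            continue
        # "grep" step: keep only the lines that match the pattern.
        matches = [line for line in src.read_text().splitlines() if regex.search(line)]
        # "ucase" step: upper-case each matching line.
        text = "".join(line.upper() + "\n" for line in matches)
        (out_dir / f"{src.name}.out.out").write_text(text)


if __name__ == "__main__":
    # The pattern "lazy" is an assumption, not taken from the job file.
    grep_ucase(Path("grepucase_in"), Path("ucase_out"), "lazy")
```

A rerun from the RO that behaves like the original run would be expected to leave one such file per input file in the output directory.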

mr-c commented 2 years ago

@simleo Did you make any local changes? I wasn't able to reproduce this with either cwltool 3.1.20220720142255 (unreleased dev version) or 3.1.20220628170238.

simleo commented 2 years ago

@mr-c No changes. To double-check, I've redone everything from scratch. Here is the log:

[simleo@neuron:tmp]$ git clone git@github.com:RenskeW/cwlprov-provenance.git
Cloning into 'cwlprov-provenance'...
remote: Enumerating objects: 960, done.
remote: Counting objects: 100% (960/960), done.
remote: Compressing objects: 100% (687/687), done.
remote: Total 960 (delta 349), reused 827 (delta 222), pack-reused 0
Receiving objects: 100% (960/960), 35.03 MiB | 5.58 MiB/s, done.
Resolving deltas: 100% (349/349), done.
[simleo@neuron:tmp]$ cd cwlprov-provenance/
[simleo@neuron:cwlprov-provenance (main)]$ git checkout f5dd87a950eeaf7f96bd39dc218164832ff3cbea
Note: switching to 'f5dd87a950eeaf7f96bd39dc218164832ff3cbea'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at f5dd87a Create LICENSE.md
[simleo@neuron:cwlprov-provenance ((f5dd87a...))]$ python3 -m venv venv
[simleo@neuron:cwlprov-provenance ((f5dd87a...))]$ source venv/bin/activate
(venv) [simleo@neuron:cwlprov-provenance ((f5dd87a...))]$ pip install --upgrade pip
Collecting pip
  Downloading pip-22.2-py3-none-any.whl (2.0 MB)
     |████████████████████████████████| 2.0 MB 1.9 MB/s 
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
Successfully installed pip-22.2
(venv) [simleo@neuron:cwlprov-provenance ((f5dd87a...))]$ pip install cwltool==3.1.20220628170238
Collecting cwltool==3.1.20220628170238
  Using cached cwltool-3.1.20220628170238-py3-none-any.whl (1.3 MB)
Collecting ruamel.yaml<0.17.22,>=0.15
  Using cached ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting psutil>=5.6.6
  Using cached psutil-5.9.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284 kB)
Collecting shellescape<3.9,>=3.4.1
  Using cached shellescape-3.8.1-py2.py3-none-any.whl (3.1 kB)
Requirement already satisfied: setuptools in ./venv/lib/python3.8/site-packages (from cwltool==3.1.20220628170238) (44.0.0)
Collecting coloredlogs
  Using cached coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)
Collecting prov==1.5.1
  Using cached prov-1.5.1-py2.py3-none-any.whl (426 kB)
Collecting rdflib<6.2.0,>=4.2.2
  Using cached rdflib-6.1.1-py3-none-any.whl (482 kB)
Collecting schema-salad<9,>=8.2.20211104054942
  Downloading schema_salad-8.3.20220721194857-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 8.9 MB/s eta 0:00:00
Collecting argcomplete
  Using cached argcomplete-2.0.0-py2.py3-none-any.whl (37 kB)
Collecting pydot>=1.4.1
  Using cached pydot-1.4.2-py2.py3-none-any.whl (21 kB)
Collecting requests>=2.6.1
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 10.1 MB/s eta 0:00:00
Collecting pyparsing!=3.0.2
  Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB)
Collecting typing-extensions
  Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Collecting bagit>=1.6.4
  Using cached bagit-1.8.1-py2.py3-none-any.whl (35 kB)
Collecting mypy-extensions
  Using cached mypy_extensions-0.4.3-py2.py3-none-any.whl (4.5 kB)
Collecting python-dateutil
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting six>=1.9.0
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting lxml
  Downloading lxml-4.9.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.9/6.9 MB 21.1 MB/s eta 0:00:00
Collecting networkx
  Downloading networkx-2.8.5-py3-none-any.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 21.0 MB/s eta 0:00:00
Collecting isodate
  Using cached isodate-0.6.1-py2.py3-none-any.whl (41 kB)
Collecting charset-normalizer<3,>=2
  Downloading charset_normalizer-2.1.0-py3-none-any.whl (39 kB)
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.10-py2.py3-none-any.whl (139 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 139.2/139.2 kB 18.0 MB/s eta 0:00:00
Collecting idna<4,>=2.5
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2022.6.15-py3-none-any.whl (160 kB)
Collecting ruamel.yaml.clib>=0.2.6
  Using cached ruamel.yaml.clib-0.2.6-cp38-cp38-manylinux1_x86_64.whl (570 kB)
Collecting lockfile>=0.9
  Using cached lockfile-0.12.2-py2.py3-none-any.whl (13 kB)
Collecting mistune<0.9,>=0.8.1
  Using cached mistune-0.8.4-py2.py3-none-any.whl (16 kB)
Collecting CacheControl<0.13,>=0.11.7
  Using cached CacheControl-0.12.11-py2.py3-none-any.whl (21 kB)
Collecting humanfriendly>=9.1
  Using cached humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting msgpack>=0.5.2
  Using cached msgpack-1.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (322 kB)
Installing collected packages: shellescape, mypy-extensions, msgpack, mistune, lockfile, bagit, urllib3, typing-extensions, six, ruamel.yaml.clib, pyparsing, psutil, networkx, lxml, idna, humanfriendly, charset-normalizer, certifi, argcomplete, ruamel.yaml, requests, python-dateutil, pydot, isodate, coloredlogs, rdflib, CacheControl, schema-salad, prov, cwltool
Successfully installed CacheControl-0.12.11 argcomplete-2.0.0 bagit-1.8.1 certifi-2022.6.15 charset-normalizer-2.1.0 coloredlogs-15.0.1 cwltool-3.1.20220628170238 humanfriendly-10.0 idna-3.3 isodate-0.6.1 lockfile-0.12.2 lxml-4.9.1 mistune-0.8.4 msgpack-1.0.4 mypy-extensions-0.4.3 networkx-2.8.5 prov-1.5.1 psutil-5.9.1 pydot-1.4.2 pyparsing-3.0.9 python-dateutil-2.8.2 rdflib-6.1.1 requests-2.28.1 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.6 schema-salad-8.3.20220721194857 shellescape-3.8.1 six-1.16.0 typing-extensions-4.3.0 urllib3-1.26.10
(venv) [simleo@neuron:cwlprov-provenance ((f5dd87a...))]$ cd prov_data_annotations/example2/
(venv) [simleo@neuron:example2 ((f5dd87a...))]$ cwltool --provenance ro wf.cwl wf_job.yml
INFO /tmp/cwlprov-provenance/venv/bin/cwltool 3.1.20220628170238
INFO [cwltool] /tmp/cwlprov-provenance/venv/bin/cwltool --provenance ro wf.cwl wf_job.yml
INFO Resolved 'wf.cwl' to 'file:///tmp/cwlprov-provenance/prov_data_annotations/example2/wf.cwl'
INFO [provenance] Adding to RO file:///tmp/cwlprov-provenance/prov_data_annotations/example2/data/sabdab_summary_all_20220527.tsv
INFO [provenance] Adding to RO file:///tmp/cwlprov-provenance/prov_data_annotations/example2/data/7mb7.cif
INFO [provenance] Adding to RO file:///tmp/cwlprov-provenance/prov_data_annotations/example2/data/7zxf.cif
INFO [provenance] Adding to RO file:///tmp/cwlprov-provenance/prov_data_annotations/example2/data/merged.csv
INFO [workflow ] start
INFO [workflow ] starting step date2_step
INFO [step date2_step] start
INFO [job date2_step] /tmp/l0wnubc6$ date \
    -r \
    /tmp/qn8jjvjk/stg9a0f33dc-6a6d-4bfa-be86-ffc0558b1cbf/7mb7.cif
Mon Jul 25 10:56:10 CEST 2022
INFO [job date2_step] completed success
INFO [step date2_step] start
INFO [job date2_step_2] /tmp/mdnug539$ date \
    -r \
    /tmp/pmqc6ss4/stg72fbd931-ed8f-4729-8b2d-389ab4a47ef2/7zxf.cif
Mon Jul 25 10:56:10 CEST 2022
INFO [job date2_step_2] completed success
INFO [step date2_step] completed success
INFO [workflow ] starting step date_step
INFO [step date_step] start
INFO [job date_step] /tmp/1tn0zuji$ date \
    -r \
    /tmp/9r_8x4nb/stgc6bf97ac-5cd2-4378-9f99-108dd3155f64/sabdab_summary_all_20220527.tsv
Mon Jul 25 10:56:10 CEST 2022
INFO [job date_step] completed success
INFO [step date_step] completed success
INFO [workflow ] starting step echo_step
INFO [step echo_step] start
INFO [job echo_step] /tmp/fldeko_h$ echo \
    /tmp/zjg9cx6q/stg444694ce-f643-4147-b1e8-c852bf4b87cb/sabdab_summary_all_20220527.tsv \
    /tmp/zjg9cx6q/stg36c44a14-25d4-46af-a190-94f4d2b8dced/pdb_directory
/tmp/zjg9cx6q/stg444694ce-f643-4147-b1e8-c852bf4b87cb/sabdab_summary_all_20220527.tsv /tmp/zjg9cx6q/stg36c44a14-25d4-46af-a190-94f4d2b8dced/pdb_directory
INFO [job echo_step] completed success
INFO [step echo_step] completed success
INFO [workflow ] completed success
/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/rdflib/plugins/serializers/nt.py:36: UserWarning: NTSerializer always uses UTF-8 encoding. Given encoding was: None
  warnings.warn(
{}
INFO Final process status is success
INFO [provenance] Finalizing Research Object
INFO [provenance] Research Object saved to /tmp/cwlprov-provenance/prov_data_annotations/example2/ro
(venv) [simleo@neuron:example2 ((f5dd87a...))]$ cd ro/workflow/
(venv) [simleo@neuron:workflow ((f5dd87a...))]$ cwltool --debug packed.cwl primary-job.json
INFO /tmp/cwlprov-provenance/venv/bin/cwltool 3.1.20220628170238
INFO Resolved 'packed.cwl' to 'file:///tmp/cwlprov-provenance/prov_data_annotations/example2/ro/workflow/packed.cwl'
ERROR Input object failed validation:
Anonymous directory object must have 'listing' and 'basename' fields.
Traceback (most recent call last):
  File "/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/cwltool/main.py", line 1307, in main
    initialized_job_order_object = init_job_order(
  File "/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/cwltool/main.py", line 507, in init_job_order
    normalizeFilesDirs(job_order_object)
  File "/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/cwltool/utils.py", line 474, in normalizeFilesDirs
    visit_class(job, ("File", "Directory"), addLocation)
  File "/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/cwltool/utils.py", line 216, in visit_class
    visit_class(rec[d], cls, op)
  File "/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/cwltool/utils.py", line 214, in visit_class
    op(rec)
  File "/tmp/cwlprov-provenance/venv/lib/python3.8/site-packages/cwltool/utils.py", line 434, in addLocation
    raise ValidationException(
schema_salad.exceptions.ValidationException: Anonymous directory object must have 'listing' and 'basename' fields.