DataBiosphere / toil

A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
http://toil.ucsc-cgl.org/.
Apache License 2.0

Can't pickle cwltool:ProcessGenerator #4404

Open jfennick opened 1 year ago

jfennick commented 1 year ago

https://github.com/common-workflow-language/cwltool/blob/main/docs/processgen.rst

Toil cannot pickle a cwltool:ProcessGenerator object. This is presumably a simple fix; it appears that one additional clause needs to be inserted into the remove_pickle_problems function. https://github.com/DataBiosphere/toil/blob/master/src/toil/cwl/cwltoil.py#L2663 (I attempted to do this myself, without success.)
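
For readers unfamiliar with this class of failure, here is a minimal sketch of it and of the "strip the offending attribute before pickling" approach described above. It does not use cwltool or Toil: `make_loader`, `make_tool`, `strip_attribute`, and the attribute name `doc_loader` are hypothetical stand-ins, and this thread does not establish which attribute of a ProcessGenerator holds the problematic object.

```python
import pickle
from types import SimpleNamespace


def make_loader():
    # Hypothetical stand-in for an object whose constructor stores a
    # lambda defined in a local scope; such a lambda cannot be pickled.
    return SimpleNamespace(fetch=lambda url: url)


def make_tool():
    # Hypothetical stand-in for a process object that keeps a reference
    # to the loader. "doc_loader" is an assumed attribute name.
    return SimpleNamespace(name="example", doc_loader=make_loader())


def can_pickle(obj) -> bool:
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return True
    except (AttributeError, pickle.PicklingError, TypeError):
        return False


def strip_attribute(obj, attr_name):
    """Replace one attribute with None so the rest of obj can be pickled."""
    if hasattr(obj, attr_name):
        setattr(obj, attr_name, None)
    return obj


tool = make_tool()
before = can_pickle(tool)                                 # lambda still attached
after = can_pickle(strip_attribute(tool, "doc_loader"))   # lambda removed
print(before, after)
```

Dropping the attribute only helps if nothing needs it after unpickling; otherwise the object has to be rebuilt on the other side.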

Attached is a minimal reproducible example: pytoolgen.txt zing.txt


INFO /Users/jakefennick/miniconda3/envs/base/bin/cwltool 3.1.20221201130942
INFO Resolved 'pytoolgen.txt' to 'file:///Users/jakefennick/pytoolgen.txt'
INFO [job 9017565b-d135-4bde-82b1-4670ffcb89a6] /private/tmp/docker_tmpf4zqaizr$ python \
    inp.py > /private/tmp/docker_tmpf4zqaizr/main.cwl
INFO [job 9017565b-d135-4bde-82b1-4670ffcb89a6] completed success
INFO [job main.cwl] /private/tmp/docker_tmprwmwgvq9$ echo \
    blurf
blurf
INFO [job main.cwl] completed success
{
    "runProcess": {
        "location": "file:///Users/jakefennick/main.cwl",
        "basename": "main.cwl",
        "class": "File",
        "checksum": "sha1$0cbf24b48a78fb6559954b959da950120aeb514f",
        "size": 97,
        "path": "/Users/jakefennick/main.cwl"
    }
}
INFO Final process status is success

toil-cwl-runner --enable-dev --enable-ext pytoolgen.txt zing.txt
[2023-03-08T11:50:32-0700] [MainThread] [I] [cwltool] Resolved 'pytoolgen.txt' to 'file:///Users/jakefennick/pytoolgen.txt'
[2023-03-08T11:50:33-0700] [MainThread] [I] [toil.job] Saving graph of 1 jobs, 1 new
[2023-03-08T11:50:33-0700] [MainThread] [I] [toil.job] Processing job 'CWLJob' pytoolgen.txt kind-CWLJob/instance-brwrvhot v0
Traceback (most recent call last):
  File "/Users/jakefennick/miniconda3/envs/base/bin/toil-cwl-runner", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/jakefennick/miniconda3/envs/base/lib/python3.11/site-packages/toil/cwl/cwltoil.py", line 3794, in main
    outobj = toil.start(wf1)
             ^^^^^^^^^^^^^^^
  File "/Users/jakefennick/miniconda3/envs/base/lib/python3.11/site-packages/toil/common.py", line 1026, in start
    rootJobDescription = rootJob.saveAsRootJob(self._jobStore)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakefennick/miniconda3/envs/base/lib/python3.11/site-packages/toil/job.py", line 2586, in saveAsRootJob
    self._saveJobGraph(jobStore, saveSelf=True)
  File "/Users/jakefennick/miniconda3/envs/base/lib/python3.11/site-packages/toil/job.py", line 2559, in _saveJobGraph
    job.saveBody(jobStore)
  File "/Users/jakefennick/miniconda3/envs/base/lib/python3.11/site-packages/toil/job.py", line 2455, in saveBody
    pickle.dump(self, fileHandle, pickle.HIGHEST_PROTOCOL)
AttributeError: Can't pickle local object 'Loader.__init__.<locals>.<lambda>'

┆Issue is synchronized with this [Jira Story](https://ucsc-cgl.atlassian.net/browse/TOIL-1300)
┆Issue Number: TOIL-1300

mr-c commented 1 year ago

This needs a fix on the cwltool side, sorry!

mr-c commented 1 year ago

Oh, I think this was already fixed in cwltool 3.1.20230302145532; can you try that out and see if you still have that error?
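
One way to confirm which cwltool release an environment actually has, relative to the release named above, is sketched below. It assumes the version string is purely numeric and dot-separated, as the cwltool versions quoted in this thread are; `version_key`, `has_fix`, and `installed_cwltool_version` are hypothetical helper names. It only compares version numbers and says nothing about whether the error is actually gone.

```python
from importlib import metadata

FIXED_IN = "3.1.20230302145532"  # release named in this thread


def version_key(version: str) -> tuple:
    """Turn a numeric dotted version such as '3.1.20230302145532' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_fix(installed: str, fixed_in: str = FIXED_IN) -> bool:
    """True if the installed version is at least the release named in the thread."""
    return version_key(installed) >= version_key(fixed_in)


def installed_cwltool_version():
    """Return the installed cwltool version string, or None if it is not installed."""
    try:
        return metadata.version("cwltool")
    except metadata.PackageNotFoundError:
        return None


print(has_fix("3.1.20221201130942"))  # version from the first log: False
print(has_fix("3.1.20230302145532"))  # the release named above: True

current = installed_cwltool_version()
if current is None:
    print("cwltool is not installed in this environment")
else:
    try:
        print(current, has_fix(current))
    except ValueError:
        # Assumption violated: a component such as 'dev0' is not an integer.
        print(current, "has a non-numeric component; compare by hand")
```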

jfennick commented 1 year ago

I see. I was installing Toil with pip install toil[cwl], and the latest release's CWL requirements do not include this fix:

https://github.com/DataBiosphere/toil/blob/releases/5.9.2/requirements-cwl.txt

However, in a brand new conda environment, installing from source using

pip install "git+https://github.com/DataBiosphere/toil.git#egg=toil[cwl]"

still gives the same error.

(pytoolgen) jakefennick$ cwltool --version
/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/bin/cwltool 3.1.20230302145532
(pytoolgen) jakefennick$ toil --version
5.10.0a1
(pytoolgen) jakefennick$ toil-cwl-runner --enable-dev --enable-ext pytoolgen.txt zing.txt
[2023-03-15T15:51:18-0600] [MainThread] [I] [cwltool] Resolved 'pytoolgen.txt' to 'file:///Users/jakefennick/pytoolgen.txt'
[2023-03-15T15:51:19-0600] [MainThread] [I] [toil.job] Saving graph of 1 jobs, 1 new
[2023-03-15T15:51:19-0600] [MainThread] [I] [toil.job] Processing job 'CWLJob' pytoolgen.txt kind-CWLJob/instance-xu9x5q0k v0
Traceback (most recent call last):
  File "/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/bin/toil-cwl-runner", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/lib/python3.11/site-packages/toil/cwl/cwltoil.py", line 3780, in main
    outobj = toil.start(wf1)
             ^^^^^^^^^^^^^^^
  File "/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/lib/python3.11/site-packages/toil/common.py", line 1040, in start
    rootJobDescription = rootJob.saveAsRootJob(self._jobStore)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/lib/python3.11/site-packages/toil/job.py", line 2568, in saveAsRootJob
    self._saveJobGraph(jobStore, saveSelf=True)
  File "/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/lib/python3.11/site-packages/toil/job.py", line 2541, in _saveJobGraph
    job.saveBody(jobStore)
  File "/Users/jakefennick/mambaforge-pypy3/envs/pytoolgen/lib/python3.11/site-packages/toil/job.py", line 2438, in saveBody
    pickle.dump(self, fileHandle, pickle.HIGHEST_PROTOCOL)
AttributeError: Can't pickle local object 'Loader.__init__.<locals>.<lambda>'
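
The traceback names the lambda but not the attribute path that leads to it. A generic diagnostic along the following lines can narrow that down; this is a sketch using only the standard library, with `find_unpicklable` and `build_example` as hypothetical names, and it has not been run against a real cwltool or Toil object.

```python
import pickle
from types import SimpleNamespace


def find_unpicklable(obj, path="root", seen=None):
    """Return attribute paths of the innermost objects that fail to pickle."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # guard against reference cycles
        return []
    seen.add(id(obj))
    try:
        pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
        return []                # this subtree is fine
    except Exception:
        pass                     # something at or below obj is the problem

    # Enumerate children so the search can descend toward the culprit.
    if isinstance(obj, dict):
        children = [(f"{path}[{key!r}]", value) for key, value in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", value) for i, value in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{name}", value) for name, value in vars(obj).items()]
    else:
        children = []

    culprits = []
    for child_path, child in children:
        culprits.extend(find_unpicklable(child, child_path, seen))
    # If no child is to blame, obj itself is the unpicklable object.
    return culprits or [path]


def build_example():
    # Hypothetical object graph with a locally defined lambda buried inside.
    loader = SimpleNamespace(fetch=lambda url: url)
    step = SimpleNamespace(loader=loader)
    return SimpleNamespace(name="example", steps=[step])


paths = find_unpicklable(build_example())
print(paths)
```

Run against the job object just before `pickle.dump`, a helper like this would report which attribute chain reaches the `Loader` instance, which is the information needed to decide what to strip or rebuild.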