common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

Error "RuntimeError: close method on generator classes unimplemented" is reported when "--parallel" is used #1658

Closed kongxiangya closed 2 years ago

kongxiangya commented 2 years ago

workflow:

class: Workflow
cwlVersion: v1.0
id: test
label: test
$namespaces:
  sbg: 'https://www.sevenbridges.com/'
inputs:
  - id: sample
    type: 'string[]?'
    default:
      - sample1
    'sbg:x': 0
    'sbg:y': 532.4375
  - id: fastq1
    type: 'File[]'
    default:
      - class: File
        path: ./sample1.raw_1.fq.gz
    'sbg:x': 0
    'sbg:y': 853.015625
  - id: fastq2
    type: 'File[]?'
    default:
      - class: File
        path: ./sample1.raw_2.fq.gz
    'sbg:x': 0
    'sbg:y': 746.15625
outputs: []
steps:
  - id: cutadaptparallel_v2
    in:
      - id: fastq1
        source:
          - fastq1
      - id: fastq2
        source:
          - fastq2
      - id: sample
        source:
          - sample
    out:
      - id: outfastq1
      - id: outfastq2
    run:
      class: Workflow
      cwlVersion: v1.0
      id: cutadaptparallel_v2
      label: cutadaptparallel_v2
      $namespaces:
        sbg: 'https://www.sevenbridges.com/'
      inputs:
        - id: fastq1
          type: 'File[]?'
          'sbg:x': -323.37908935546875
          'sbg:y': -315.423583984375
        - id: fastq2
          type: 'File[]?'
          'sbg:x': -323.4378662109375
          'sbg:y': -445.9402160644531
        - id: sample
          type: 'string[]?'
          'sbg:x': -324.03839111328125
          'sbg:y': -556.1343383789062
      outputs:
        - id: outfastq1
          outputSource:
            - cutadapt_v2/outfastq1
          type: 'File[]?'
          'sbg:x': 90.7265625
          'sbg:y': -422
        - id: outfastq2
          outputSource:
            - cutadapt_v2/outfastq2
          type: 'File[]?'
          'sbg:x': 36
          'sbg:y': -564
      steps:
        - id: cutadapt_v2
          in:
            - id: fastq1
              source: fastq1
            - id: fastq2
              source: fastq2
            - id: sample
              source: sample
          out:
            - id: outfastq1
            - id: outfastq2
          run:
            class: CommandLineTool
            cwlVersion: v1.0
            $namespaces:
              sbg: 'https://www.sevenbridges.com/'
            id: cutadapt_v2
            baseCommand:
              - cutadapt
            inputs:
              - id: fastq1
                type: File
                inputBinding:
                  position: 4
              - default: AGATCGGAAGAGC
                id: adapter
                type: string?
                inputBinding:
                  position: 1
                  prefix: '-b'
              - default: $(inputs.sample).cutadapt.fq1.gz
                id: output
                type: string?
                inputBinding:
                  position: 2
                  prefix: '-o'
                  valueFrom: $(inputs.sample).cutadapt.fq1.gz
              - default: $(inputs.sample).cutadapt.fq2.gz
                id: paired_output
                type: string?
                inputBinding:
                  position: 3
                  prefix: '-p'
                  valueFrom: $(inputs.sample).cutadapt.fq2.gz
              - id: fastq2
                type: File?
                inputBinding:
                  position: 5
              - id: sample
                type: string?
            outputs:
              - id: outfastq1
                type: File?
                outputBinding:
                  glob: |
                    $(inputs.sample).cutadapt.fq1.gz
              - id: outfastq2
                type: File?
                outputBinding:
                  glob: |
                    $(inputs.sample).cutadapt.fq2.gz
            doc: >-
              cutadapt  -b AGATCGGAAGAGC  -o 00.rawdata/sample1.raw_1.fq -p
              00.rawdata/sample1.raw_2.fq 
              /gluster/home/micro/Pipeline/16S_pipeline_V3.0/example/00.data/sample1.raw_1.fq.gz 
              /gluster/home/micro/Pipeline/16S_pipeline_V3.0/example/00.data/sample1.raw_2.fq.gz
              >00.rawdata/cutadapt.log
            label: cutadapt-v2
            hints:
              - class: DockerRequirement
                dockerPull: 'cutadapt:1.16'
            requirements:
              - class: InlineJavascriptRequirement
          label: cutadapt-v2
          scatter:
            - fastq1
            - fastq2
            - sample
          scatterMethod: dotproduct
          'sbg:x': -88.69266510009766
          'sbg:y': -470.6273193359375
      requirements:
        - class: ScatterFeatureRequirement
    label: cutadaptparallel_v2
    'sbg:x': 540.9274291992188
    'sbg:y': 677.3812255859375
requirements:
  - class: SubworkflowFeatureRequirement

If the image is created using "cwltool.Dockerfile", the following error message is displayed:

INFO [job cutadapt_v2] Max memory used: 6MiB
INFO [job cutadapt_v2] completed success
INFO [step cutadapt_v2] completed success
INFO [workflow cutadaptparallel_v2] completed success
INFO [step cutadaptparallel_v2] completed success
ERROR Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/cwltool/workflow_job.py", line 777, in try_make_job
    yield from jobs
  File "/usr/local/lib/python3.10/site-packages/cwltool/workflow_job.py", line 76, in job
    yield from self.step.job(joborder, output_callback, runtimeContext)
RuntimeError: close method on generator classes unimplemented

However, cwltool installed directly with Conda runs the same workflow without error.

The file used for the test: test.zip

kinow commented 2 years ago

Hi @kongxiangya

I have cwltool installed via pip (pip install cwltool==3.1.20220406080846), which I think is the same version you used with Conda.

If the image is created using "cwltool.Dockerfile", the following error message is displayed:

Could you share the exact steps you are using to run the workflow? Both of the commands below work for me with your attached files.

(venv) kinow@ranma:/tmp/cwl$ cwltool test.cwl
...
...
INFO [job cutadapt_v2] Max memory used: 0MiB
INFO [job cutadapt_v2] completed success
INFO [step cutadapt_v2] completed success
INFO [workflow cutadaptparallel_v2] completed success
INFO [step cutadaptparallel_v2] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success
(venv) kinow@ranma:/tmp/cwl$ cwltool --parallel test.cwl
...
...

INFO [job cutadapt_v2] Max memory used: 0MiB
INFO [job cutadapt_v2] completed success
INFO [step cutadapt_v2] completed success
INFO [workflow cutadaptparallel_v2] completed success
INFO [step cutadaptparallel_v2] completed success
INFO [workflow ] completed success
{}
INFO Final process status is success

kongxiangya commented 2 years ago

Hi @kinow
Thanks for your answer.
My run command is the same: cwltool --parallel test.cwl

I also tried installing cwltool with pip (pip install cwltool==3.1.20220406080846) and it worked fine.

So the problem seems to be specific to the image: the error only occurs when I build the image using cwltool.Dockerfile.

My Dockerfile is:

FROM python:3.8-alpine as builder
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
RUN apk add --no-cache git gcc python3-dev libxml2-dev libxslt-dev libc-dev linux-headers

WORKDIR /cwltool
COPY . .

RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple toml -rmypy-requirements.txt "$(grep ruamel requirements.txt)" \
    "$(grep schema.salad requirements.txt)"
# schema-salad is needed to be installed (this time as pure Python) for
# cwltool + mypyc
RUN CWLTOOL_USE_MYPYC=1 MYPYPATH=typeshed pip wheel -i https://pypi.tuna.tsinghua.edu.cn/simple --no-binary schema-salad --wheel-dir=/wheels .[deps]
RUN rm /wheels/schema_salad*
RUN pip install black
RUN SCHEMA_SALAD_USE_MYPYC=1 MYPYPATH=typeshed pip wheel -i https://pypi.tuna.tsinghua.edu.cn/simple --no-binary schema-salad \
    $(grep schema.salad requirements.txt) black --wheel-dir=/wheels
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --force-reinstall --no-index --no-warn-script-location --root=/pythonroot/ /wheels/*.whl
# --force-reinstall to install our new mypyc compiled schema-salad package

FROM python:3.8-alpine as module
LABEL maintainer peter.amstutz@curri.com
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories

RUN apk add --no-cache docker nodejs graphviz libxml2 libxslt
COPY --from=builder /pythonroot/ /

FROM python:3.8-alpine
LABEL maintainer peter.amstutz@curri.com
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories

RUN apk add --no-cache docker nodejs graphviz libxml2 libxslt
COPY --from=builder /pythonroot/ /
COPY cwltool-in-docker.sh /cwltool-in-docker.sh

WORKDIR /error

ENTRYPOINT ["/cwltool-in-docker.sh"]

The relevant files were downloaded from https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20220406080846

The image built from this Dockerfile is tagged cwltool:3.1.20220406080846

mr-c commented 2 years ago

This might be a mypyc failure. Can you try again with CWLTOOL_USE_MYPYC=0?

https://github.com/mypyc/mypyc/issues/900
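
For context on the error message, here is a minimal sketch of what closing a generator that delegates with `yield from` does in ordinary interpreted Python. The names `inner`, `outer`, and `events` are made up for illustration and are not cwltool code; the sketch only shows the standard `close()` semantics that the message says are "unimplemented" in the compiled build.

```python
# Illustrative only: plain-Python semantics of close() on a delegating generator.
events = []

def inner():
    try:
        yield 1
        yield 2
    finally:
        # Runs when the generator is closed before it is exhausted.
        events.append("inner closed")

def outer():
    try:
        # Delegation, similar in shape to the "yield from" lines in the traceback.
        yield from inner()
    finally:
        events.append("outer closed")

gen = outer()
first = next(gen)   # advance to the first yield inside inner()
gen.close()         # close() is forwarded to inner() first, then outer() exits
print(first, events)
```

In the interpreter this finishes cleanly and both `finally` blocks run. A compiled generator that does not support `close()` would instead fail at the `gen.close()` step, which matches the shape of the reported traceback.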

kongxiangya commented 2 years ago

Hi @mr-c
Thanks for your help. Setting CWLTOOL_USE_MYPYC=0 solved my problem.

mr-c commented 2 years ago

Glad to hear it, @kongxiangya. Did you build from the latest development code of cwltool (which uses mypy 0.950), or from the latest cwltool PyPI release (which uses mypy 0.942)?

I'm curious whether the latest version of mypy still has this issue. If so, I will stop publishing binary wheels of cwltool so that others don't run into this problem.
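
One rough way to tell whether an installed module is a compiled extension (as in a mypyc-built wheel) or plain Python source is to look at the file it would be loaded from. This is a sketch, not an official cwltool check: the helper name `is_compiled` is hypothetical, and it assumes that a compiled module is loaded from a file with one of the interpreter's extension-module suffixes.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES

def is_compiled(module_name):
    """Return True if the module loads from a C extension file,
    False if it loads from another kind of file, None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name is not installed.
        return None
    if spec is None or spec.origin is None:
        return None
    return spec.origin.endswith(tuple(EXTENSION_SUFFIXES))

# "json" is a pure-Python package in the standard library.
print(is_compiled("json"))
# Prints None if cwltool is not installed in this environment.
print(is_compiled("cwltool.workflow_job"))
```

Running this inside the image and inside the pip or Conda environment would show whether the two installs differ in this respect.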

mr-c commented 2 years ago

I was able to reproduce this problem with mypy 0.942, and it is fixed with mypy 0.950. I'll make a new cwltool release so that others aren't affected. Thanks!

kongxiangya commented 2 years ago

I have tested version 3.1.20220502060230 with CWLTOOL_USE_MYPYC=1 set and it runs without any error. Thanks.