common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0
335 stars 231 forks source link

TypeError: unhashable type: 'CommentedMap' #766

Open SooLee opened 6 years ago

SooLee commented 6 years ago

I converted a working draft-3 CWL file to a version 1.0 CWL file by changing the following.

  1. "cwlVersion": "draft-3", -> "cwlVersion": "v1.0",
  2. "inputs" -> "in" for all STEP inputs (but not workflow inputs)
  3. "outputs" -> "out" for all STEP outputs (but not workflow outputs)
  4. "source" -> "outputSource" for all workflow outputs (but not step inputs)

(My file doesn't have a field 'description')

I'm getting the following error when running the latest master of cwltool.

Traceback (most recent call last):
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/main.py", line 492, in main
    makeTool, make_tool_kwds)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 341, in make_tool
    tool = makeTool(processobj, **kwargs)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/workflow.py", line 44, in defaultMakeTool
    return Workflow(toolpath_object, **kwargs)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/workflow.py", line 432, in __init__
    self.steps.append(WorkflowStep(step, n, **kwargs))
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/workflow.py", line 550, in __init__
    + "\n" + SourceLine(self.embedded_tool.tool, "outputs").makeError(
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/process.py", line 171, in shortname
    d = urllib.parse.urlparse(inputid)
  File "/usr/lib64/python2.7/urlparse.py", line 143, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib64/python2.7/urlparse.py", line 176, in urlsplit
    cached = _parse_cache.get(key, None)
TypeError: unhashable type: 'CommentedMap'

I'm not sure if I can get what exactly is wrong with this CWL file. Some help would be appreciated. Thanks.

mr-c commented 6 years ago

Hello @SooLee . We should generate a better error in this instance. Can you share your file?

You may also find a solution via cwl-upgrader @ https://github.com/common-workflow-language/cwl-upgrader which upgrades draft-3 to v1.0.

SooLee commented 6 years ago

Yes, I know. That tool doesn't work. It does not produce a proper json or yaml format file. The output of cwl-upgrader is not a cwl.

This resolved the error.

  1. The format of an id: #prefix.name -> #prefix/name for all STEP inputs and outputs and the source of all WORKFLOW outputs and all STEP inputs
SooLee commented 6 years ago

It would be very useful to document all the differences between draft-3 and version 1.0, since cwltool doesn't support draft-3 any more. We're forced to convert and there is really no easy way. :(

mr-c commented 6 years ago

@SooLee I am very interested in cwl-upgrader having correct functionality. If you could provide me with the draft-3 workflow that was causing problems then I will fix cwl-upgrader. If needed, you can email me directly: mrc@commonwl.org

mr-c commented 6 years ago

We want upgrading from draft-3 to be easy :-)

SooLee commented 6 years ago

I think the main problem is that it is not JSON. That is probably an easy fix. Below is an example output. It should first have 1) double quotes around field names and values and 2) outermost braces removed. It should also include change #5 that I mentioned above (i.e. '.' -> '/' in the id and source strings).

{class: Workflow, inputs: [{id: '#input_file', type: ['null', File]}], steps: [{id: '#md5',
      out: [{id: '#md5.report'}], run: md5.cwl, in: [{source: '#input_file', id: '#md5.input_file'}]},
    {id: '#validatefiles', out: [{id: '#validatefiles.report'}], run: validate.cwl,
      in: [{source: '#input_file', id: '#validatefiles.input_file'}, {id: '#validatefiles.type'}]}],
  outputs: [{type: ['null', File], id: '#validatefiles_report', outputSource: '#validatefiles.report'},
    {type: ['null', File], id: '#md5_report', outputSource: '#md5.report'}], cwlVersion: v1.0}
mr-c commented 6 years ago

The output of cwl-upgrader is in YAML, the syntax of CWL since before draft-3 🙂

There may be a bug with the processing, but it will be much easier for me to fix it if I can examine the initial CWL draft-3 document

SooLee commented 6 years ago

I see. I didn't know there was a one-liner version for YAML. The original cwl looks as below.

{
  "class": "Workflow",
  "inputs": [
    {
      "id": "#input_file",
      "type": [
        "null",
        "File"
      ]
    }
  ],
  "steps": [
    {
      "id": "#md5",
      "outputs": [
        {
          "id": "#md5.report"
        }
      ],
      "run": "md5.cwl",
      "inputs": [
        {
          "source": "#input_file",
          "id": "#md5.input_file"
        }
      ]
    },
    {
      "id": "#validatefiles",
      "outputs": [
        {
          "id": "#validatefiles.report"
        }
      ],
      "run": "validate.cwl",
      "inputs": [
        {
          "source": "#input_file",
          "id": "#validatefiles.input_file"
        },
        {
          "id": "#validatefiles.type"
        }
      ]
    }
  ],
  "outputs": [
    {
      "type": [
        "null",
        "File"
      ],
      "id": "#validatefiles_report",
      "source": "#validatefiles.report"
    },
    {
      "type": [
        "null",
        "File"
      ],
      "id": "#md5_report",
      "source": "#md5.report"
    }
  ],
  "cwlVersion": "draft-3",
  "requirements": [
    {
      "class": "InlineJavascriptRequirement"
    }
  ]
}
mr-c commented 6 years ago

Ah, the strange output was because it wasn't recognized as a valid draft-3 document for some reason.

Skipping non draft-3 CWL document

It may be wrong. Was this written by hand, or is it from Seven Bridges or another source?

mr-c commented 6 years ago

I see two deviations from the draft-3 syntax that is tripping up cwl-upgrader:

  1. "cwlVersion": "draft-3" is missing the cwl prefix from the version number; should be "cwlVersion": "cwl:draft-3"
  2. An extra input with no source in the validatefiles step: validatefiles.type
      "run": "validate.cwl",
      "inputs": [
        {
          "source": "#input_file",
          "id": "#validatefiles.input_file"
        },
        {
          "id": "#validatefiles.type"
        }

    Using a new branch of cwl-upgrader I now get the following output

    class: Workflow
    cwlVersion: v1.0
    inputs:
    - {id: input_file, type: 'File?'}
    outputs:
    - {id: validatefiles_report, outputSource: validatefiles.report, type: 'File?'}
    - {id: md5_report, outputSource: md5.report, type: 'File?'}
    requirements:
    - {class: InlineJavascriptRequirement}
    steps:
    md5:
    in:
      input_file: {source: input_file}
    out: [report]
    validatefiles:
    in:
      input_file: {source: input_file}
    out: [report]

    Thanks for providing your workflow source. The cwl-upgrader was mainly focused on draft-3 CommandLineTool conversion, so I'm glad to have the opportunity to improve its Workflow conversion.

SooLee commented 6 years ago

It is a valid draft-3 cwl and we have been running it with an older version of cwltool with no problem. I noticed cwl-upgrader sometimes converts it to v1.0 as I posted above and sometimes it doesn't convert and gives you that 'skipping non draft-3 cwl document' message, on the same file. Not sure what I did differently.

I wrote a converter myself in the mean time that implements changes in above #1~#5 + 'description'->'doc') but probably won't work comprehensively on all draft-3 cwl files. Works for our cwl files.

SooLee commented 6 years ago

I just saw your last message. Let me test it.

mr-c commented 6 years ago

Yes, I'm loosening the requirement about cwl:draft-3 in https://github.com/common-workflow-language/cwl-upgrader/pull/16 (when I ran it earlier I manually added the cwl prefix in)

SooLee commented 6 years ago

The conversion was successful (looks much nicer too :)) but when I ran it using the latest version of cwltool I get the following error.

Traceback (most recent call last):
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/main.py", line 469, in main
    do_validate=args.do_validate)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 259, in validate_document
    "\n{}".format("\n".join(versions)))
ValidationException: The CWL reference runner no longer supports pre CWL v1.0 documents. Supported versions are: 
v1.0
v1.1.0-dev1 (with --enable-dev flag only)
SooLee commented 6 years ago

Actually, it didn't really convert. The output is still draft-3

class: Workflow
cwlVersion: draft-3
inputs:
- id: '#input_file'
  type: ['null', File]
outputs:
- id: '#validatefiles_report'
  source: '#validatefiles.report'
  type: ['null', File]
- id: '#md5_report'
  source: '#md5.report'
  type: ['null', File]
requirements:
- {class: InlineJavascriptRequirement}
steps:
- id: '#md5'
  inputs:
  - {id: '#md5.input_file', source: '#input_file'}
  outputs:
  - {id: '#md5.report'}
  run: md5.cwl
- id: '#validatefiles'
  inputs:
  - {id: '#validatefiles.input_file', source: '#input_file'}
  - {id: '#validatefiles.type'}
  outputs:
  - {id: '#validatefiles.report'}
  run: validate.cwl
mr-c commented 6 years ago

Okay, I see the error

mr-c commented 6 years ago

Can you test https://github.com/common-workflow-language/cwl-upgrader/pull/19 ?

SooLee commented 6 years ago
Traceback (most recent call last):
  File "/home/ec2-user/venv/cwl/bin/cwl-upgrader", line 11, in <module>
    sys.exit(main())
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwlupgrader/main.py", line 17, in main
    draft3_to_v1_0(document)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwlupgrader/main.py", line 24, in draft3_to_v1_0
    _draft3_to_v1_0(document)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwlupgrader/main.py", line 45, in _draft3_to_v1_0
    inp["source"] = inp["source"].lstrip('#')
KeyError: 'source'

I will test more later.

mr-c commented 6 years ago

That error is due to the extra input with no source in the validatefiles step: validatefiles.type

      "run": "validate.cwl",
      "inputs": [
        {
          "source": "#input_file",
          "id": "#validatefiles.input_file"
        },
        {
          "id": "#validatefiles.type"
        }

You'll need to remove

,
        {
          "id": "#validatefiles.type"
        }

by hand to get it to work.

SooLee commented 6 years ago

I thought you loosened it, just before this change (what happened?). It works for most other cwl files we have, except the ones with items without source.

SooLee commented 6 years ago

This is with a different workflow, containing scatter. The converted cwl had a null scatter. old:

      "scatter": "#pairsam-parse-sort.bam"

new:

    scatter: ''

As a result, when I ran the cwl, I get the following error.

Traceback (most recent call last):
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/main.py", line 469, in main
    do_validate=args.do_validate)
  File "/home/ec2-user/venv/cwl/local/lib/python2.7/site-packages/cwltool/load_tool.py", line 276, in validate_document
    workflowobj, fileuri, checklinks=do_validate)
  File "/home/ec2-user/venv/cwl/src/schema-salad/schema_salad/ref_resolver.py", line 900, in resolve_all
    self.validate_links(document, u"", all_doc_ids)
  File "/home/ec2-user/venv/cwl/src/schema-salad/schema_salad/ref_resolver.py", line 1060, in validate_links
    raise errors[0]
SooLee commented 6 years ago

Actually, about the above case, it may be due to having '.' instead of '/' (though I think scatter also needs to be fixed)

    in:
      input_pairsams: {fdn_format: pairsam, source: pairsam-parse-sort.sorted_pairsam}
ValidationException: hi-c-processing-bam.cwl:41:1: checking field `steps`
hi-c-processing-bam.cwl:42:3:   checking object `hi-c-processing-bam.cwl#pairsam-filter`
hi-c-processing-bam.cwl:47:5:     checking field `in`
hi-c-processing-bam.cwl:49:7:       checking object `hi-c-processing-bam.cwl#pairsam-filter/input_pairsam`
hi-c-processing-bam.cwl:49:44:         Field `source` references unknown identifier `pairsam-markasdup.dupmarked_pairsam`, tried file:///home/ec2-user/pipelines-cwl/cwl_awsem_v1/hi-c-processing-bam.cwl#pairsam-markasdup.dupmarked_pairsam
hi-c-processing-bam.cwl:52:3:   checking object `hi-c-processing-bam.cwl#pairsam-markasdup`
hi-c-processing-bam.cwl:57:5:     checking field `in`
hi-c-processing-bam.cwl:58:7:       checking object `hi-c-processing-bam.cwl#pairsam-markasdup/input_pairsam`
hi-c-processing-bam.cwl:58:44:         Field `source` references unknown identifier `pairsam-merge.merged_pairsam`, tried file:///home/ec2-user/pipelines-cwl/cwl_awsem_v1/hi-c-processing-bam.cwl#pairsam-merge.merged_pairsam
hi-c-processing-bam.cwl:61:3:   checking object `hi-c-processing-bam.cwl#pairsam-merge`
hi-c-processing-bam.cwl:66:5:     checking field `in`
hi-c-processing-bam.cwl:67:7:       checking object `hi-c-processing-bam.cwl#pairsam-merge/input_pairsams`
hi-c-processing-bam.cwl:67:45:         Field `source` references unknown identifier `pairsam-parse-sort.sorted_pairsam`, tried file:///home/ec2-user/pipelines-cwl/cwl_awsem_v1/hi-c-processing-bam.cwl#pairsam-parse-sort.sorted_pairsam
mr-c commented 3 years ago

Sorry for losing track of this thread @SooLee

This is with a different workflow, containing scatter. The converted cwl had a null scatter.

Do you happen to still have the original file around?