common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

Format checking broken #723

Open wilke opened 6 years ago

wilke commented 6 years ago

Related to #716

Expected Behavior

a) The value of the format field should not be modified.
b) "Formats may be tested by exact match."
c) format: [gz] in the job file and format: [gz] in the tool/workflow should be treated as identical.

Actual Behavior

a) Values in the format field are always transformed into absolute URIs, with the file name as prefix.
b) Only a very limited number of schema names are supported.

Getting:

Incompatible file format, expected format(s) [u'file:///amplicon/CWL/Tools/decompress.tool.cwl#file/gz'] but file object is: { "class": "File", "format": "file:///amplicon/CWL/Tools/gz", "location": "file:///amplicon/CWL/Data/Inputs/seq.fastq.gz", "size": 1522311, "basename": "seq.fastq.gz", "nameroot": "seq.fastq", "nameext": ".gz" }

Workflow Code

Job:

file:
  class: File
  path: sequences.fastq.gz
  format: "file://gz"
#format: gz

Tool:

cwlVersion: v1.0
class: CommandLineTool

label: decompress

requirements:
  InlineJavascriptRequirement: {}    

stdout: |
    ${return inputs.file.basename.replace(/\.gz$/ , '') ;   }

stderr: decompress.error

inputs:
  file:
    type: File
    format:
      - "file://gz"
#      - gz
    inputBinding:
      position: 3  
  output:
    type: string?
    default: ''        

baseCommand: [gzip]

arguments:
  - -c
  - -d  

outputs:

  decompressed:
    type: stdout
  error: 
    type: stderr

tetron commented 6 years ago

It is intended that format must be a URI. However, unqualified strings get interpreted as relative URIs, which as you point out is totally unhelpful. I think the correct change in behavior is for unqualified paths to raise an error (something like "gz" is not a fully qualified URI).

Either way, you need to use URIs (which don't have to be resolvable and can be made up).
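
As a minimal sketch of that advice (not taken from the thread), the job and tool fragments from the report could both use the same fully qualified URI. The URI `http://example.org/formats/gz` is made up for illustration and does not need to resolve; what matters is that both documents carry the identical string, so an exact match can succeed.

```yaml
# Job file (fragment): format is a fully qualified, made-up URI.
file:
  class: File
  path: sequences.fastq.gz
  format: "http://example.org/formats/gz"   # hypothetical URI, need not resolve
---
# Tool (fragment): the same URI, so the two values compare equal as strings.
inputs:
  file:
    type: File
    format:
      - "http://example.org/formats/gz"     # must match the job file exactly
    inputBinding:
      position: 3
```

Because an absolute URI is not resolved against the location of the document that contains it, the value should stay the same regardless of where the job file and the tool live.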

mr-c commented 6 years ago

To clarify: additional schema names are supported but have to be declared. See the numerous examples using edam:
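
A declaration of that kind might look like the sketch below (an illustration, not one of the examples the comment refers to). It assumes the `edam` prefix is mapped to `http://edamontology.org/` via `$namespaces`, and that `format_3989` is EDAM's identifier for the GZIP format; check the identifier against the ontology before relying on it.

```yaml
# Tool (fragment): declare the prefix once, then use it in format.
$namespaces:
  edam: http://edamontology.org/

inputs:
  file:
    type: File
    format: edam:format_3989   # assumed to be EDAM's GZIP format identifier
    inputBinding:
      position: 3
---
# Job file (fragment): spell out the expanded URI so it matches exactly.
file:
  class: File
  path: sequences.fastq.gz
  format: http://edamontology.org/format_3989
```

Writing the expanded URI in the job file is a conservative choice: it avoids depending on whether prefixes are also expanded when the job file is loaded.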