common-workflow-language / schema_salad

Semantic Annotations for Linked Avro Data
https://www.commonwl.org/v1.2/SchemaSalad.html
Apache License 2.0
73 stars 62 forks source link

Roundtrip (load/save) failures #187

Open ghost opened 6 years ago

ghost commented 6 years ago

Attempting to roundtrip the following workflow:

topmed-alignment.cwl.txt

Fails validation

Roundtrip code

import cwlpy.cwlmodels as cwlmodels
import ruamel.yaml as yaml
import sys

W1 = cwlmodels.load_document('topmed-alignment.cwl')
yaml.dump(cwlmodels.save(W1), sys.stdout, Dumper=yaml.RoundTripDumper)
$ python roundtrip.py > test.cwl
$ cwltool --validate test.cwl 
/Users/kghose/miniconda2/envs/kaggle/bin/cwltool 1.0.20180709065250
Resolved 'test.cwl' to 'file:///Users/kghose/Code/cwlpy/examples/003.sbg.topmed/test.cwl'
test.cwl:205:3:   checking item
test.cwl:205:3:     Field `class` contains undefined reference to `https://sevenbridges.comAWSInstanceType`
Tool definition failed validation:
test.cwl:191:1: checking field `outputs`
test.cwl:192:3:   checking object `topmed-alignment.cwl#output`
test.cwl:197:3:     Field `outputSource` references unknown identifier `file:///Users/kghose/Code/cwlpy/examples/003.sbg.topmed/topmed-alignment.cwl#output/topmed_post_align/output`, tried file:///Users/kghose/Code/cwlpy/examples/003.sbg.topmed/topmed-alignment.cwl#output/file:///Users/kghose/Code/cwlpy/examples/003.sbg.topmed/topmed-alignment.cwl#output/topmed_post_align/output, file:///Users/kghose/Code/cwlpy/examples/003.sbg.topmed/topmed-alignment.cwl#file:///Users/kghose/Code/cwlpy/examples/003.sbg.topmed/topmed-alignment.cwl#output/topmed_post_align/output

Generated CWL is: test.cwl.txt

schema-salad-tool --codegen python CommonWorkflowLanguage.yml > ../cwlpy/cwlmodels.py
Current version: 2.7.20180701171628
cwltool --version
1.0.20180709065250
ruamel.yaml              0.13.13 

@tetron

ghost commented 6 years ago

@tetron of note is not the https://sevenbridges.comAWSInstanceType issue but the general messiness with ID URIs.

For cwlpy I'm facing workflow serialization issues that I first thought were due to me not understanding URIs. However now I think that, in addition to that :) , there are some issues with IDs and codegen itself.

From my ignorant perspective the IDs should always remain unchanging regardless of base_uri. The base_uri should only affect things like the run field.

ghost commented 6 years ago

Adding a simple(r) example as follows:

simple_workflow.cwl

class: Workflow
cwlVersion: v1.0
id: simple_workflow
label: simple_workflow
inputs:
  - id: input
    type: File?
outputs:
  - id: output
    outputSource:
      - simple_commandline_1/output
    type: File?
steps:
  - id: simple_commandline
    in:
      - id: input
        source: input
    out:
      - id: output
    run: ./simple_commandline.cwl
    label: simple_commandline
  - id: simple_commandline_1
    in:
      - id: input
        source: simple_commandline/output
    out:
      - id: output
    run: ./simple_commandline.cwl
    label: simple_commandline
requirements: []

simple_commandline.cwl:

class: CommandLineTool
cwlVersion: v1.0
id: simple_commandline
baseCommand:
  - echo
inputs:
  - id: input
    type: File?
    inputBinding:
      position: 0
outputs:
  - id: output
    type: File?
    outputBinding:
      glob: '*.txt'
label: simple_commandline
requirements:
  - class: DockerRequirement
    dockerPull: alpine
stdout: output.txt

The workflow validates

cwltool --validate simple_workflow.cwl
/Users/kghose/miniconda3/envs/cwl/bin/cwltool 1.0.20181012180214
-> simple_workflow.cwl is valid CWL.

I use code gen to recursively load this document and when I export it via doc.save() I get the following dict

{'class': 'Workflow',
 'id': 'file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow',
 'inputs': [{'id': 'input', 'type': ['null', 'File']}],
 'outputs': [{'id': 'output',
   'outputSource': ['simple_commandline_1/output'],
   'type': ['null', 'File']}],
 'requirements': [],
 'label': 'simple_workflow',
 'cwlVersion': 'v1.0',
 'steps': [{'id': 'simple_commandline',
   'in': [{'id': 'input', 'source': 'simple_workflow/input'}],
   'out': [{'id': 'output'}],
   'label': 'simple_commandline',
   'run': {'class': 'CommandLineTool',
    'id': 'simple_commandline.cwl#simple_commandline',
    'inputs': [{'id': 'input',
      'inputBinding': {'position': 0},
      'type': ['null', 'File']}],
    'outputs': [{'id': 'output',
      'outputBinding': {'glob': '*.txt'},
      'type': ['null', 'File']}],
    'requirements': [{'class': 'DockerRequirement', 'dockerPull': 'alpine'}],
    'label': 'simple_commandline',
    'cwlVersion': 'v1.0',
    'baseCommand': ['echo'],
    'stdout': 'output.txt'}},
  {'id': 'simple_commandline_1',
   'in': [{'id': 'input',
     'source': 'simple_workflow/simple_commandline/output'}],
   'out': [{'id': 'output'}],
   'label': 'simple_commandline',
   'run': {'class': 'CommandLineTool',
    'id': 'simple_commandline.cwl#simple_commandline',
    'inputs': [{'id': 'input',
      'inputBinding': {'position': 0},
      'type': ['null', 'File']}],
    'outputs': [{'id': 'output',
      'outputBinding': {'glob': '*.txt'},
      'type': ['null', 'File']}],
    'requirements': [{'class': 'DockerRequirement', 'dockerPull': 'alpine'}],
    'label': 'simple_commandline',
    'cwlVersion': 'v1.0',
    'baseCommand': ['echo'],
    'stdout': 'output.txt'}}]}

which saves to

from ruamel.yaml import YAML
yaml = YAML()
yaml.dump(doc.save(), open('roundtrip.cwl', 'w'))
class: Workflow
id: file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow
inputs:
- id: input
  type:
  - 'null'
  - File
outputs:
- id: output
  outputSource:
  - simple_commandline_1/output
  type:
  - 'null'
  - File
requirements: []
label: simple_workflow
cwlVersion: v1.0
steps:
- id: simple_commandline
  in:
  - id: input
    source: simple_workflow/input
  out:
  - id: output
  label: simple_commandline
  run:
    class: CommandLineTool
    id: simple_commandline.cwl#simple_commandline
    inputs:
    - id: input
      inputBinding:
        position: 0
      type:
      - 'null'
      - File
    outputs:
    - id: output
      outputBinding:
        glob: '*.txt'
      type:
      - 'null'
      - File
    requirements:
    - class: DockerRequirement
      dockerPull: alpine
    label: simple_commandline
    cwlVersion: v1.0
    baseCommand:
    - echo
    stdout: output.txt
- id: simple_commandline_1
  in:
  - id: input
    source: simple_workflow/simple_commandline/output
  out:
  - id: output
  label: simple_commandline
  run:
    class: CommandLineTool
    id: simple_commandline.cwl#simple_commandline
    inputs:
    - id: input
      inputBinding:
        position: 0
      type:
      - 'null'
      - File
    outputs:
    - id: output
      outputBinding:
        glob: '*.txt'
      type:
      - 'null'
      - File
    requirements:
    - class: DockerRequirement
      dockerPull: alpine
    label: simple_commandline
    cwlVersion: v1.0
    baseCommand:
    - echo
    stdout: output.txt

which fails validation:

Tool definition failed validation:
roundtrip.cwl:18:1: checking field `steps`
roundtrip.cwl:19:3:   checking object `roundtrip.cwl#simple_commandline`
roundtrip.cwl:20:3:     checking field `in`
roundtrip.cwl:21:5:       checking object `roundtrip.cwl#simple_commandline/input`
roundtrip.cwl:22:5:         Field `source` references unknown identifier `simple_workflow/input`, tried file:///Users/kghose/Downloads/roundtrip.cwl#simple_workflow/input
roundtrip.cwl:51:3:   checking object `roundtrip.cwl#simple_commandline_1`
roundtrip.cwl:52:3:     checking field `in`
roundtrip.cwl:53:5:       checking object `roundtrip.cwl#simple_commandline_1/input`
roundtrip.cwl:54:5:         Field `source` references unknown identifier `simple_workflow/simple_commandline/output`, tried file:///Users/kghose/Downloads/roundtrip.cwl#simple_workflow/simple_commandline/output

@tetron happy to try and fix this if you can give a few pointers. It seems this is very close, since if I edit the two source fields, removing simple_workflow from them the workflow becomes correctly connected.

22: simple_workflow/input -> input
54: simple_workflow/simple_commandline/output -> simple_commandline/output

Though some grumbling remains

Resolved 'roundtrip.cwl' to 'file:///Users/kghose/Downloads/roundtrip.cwl'
roundtrip.cwl:28:5: object id `simple_commandline.cwl#simple_commandline` previously defined
roundtrip.cwl:30:7: object id `simple_commandline.cwl#simple_commandline/input` previously defined
roundtrip.cwl:37:7: object id `simple_commandline.cwl#simple_commandline/output` previously defined
roundtrip.cwl:28:5: object id `simple_commandline.cwl#simple_commandline` previously defined
roundtrip.cwl:30:7: object id `simple_commandline.cwl#simple_commandline/input` previously defined
roundtrip.cwl:37:7: object id `simple_commandline.cwl#simple_commandline/output` previously defined
roundtrip.cwl is valid CWL.
mr-c commented 6 years ago

@kaushik-work if the top level id is file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow then either the other internal references to the top level workflow inputs or steps have to start with file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow or just #simple_workflow but not like simple_workflow/simple_commandline/output