Open ghost opened 6 years ago
@tetron of note is not the https://sevenbridges.comAWSInstanceType
issue but the general messiness with ID URIs.
For cwlpy
I'm facing workflow serialization issues that I first thought were due to me not understanding URIs. However now I think that, in addition to that :) , there are some issues with IDs and codegen itself.
From my ignorant perspective the IDs should always remain unchanging regardless of base_uri
. The base_uri
should only affect things like the run
field.
Adding a simple(r) example as follows:
simple_workflow.cwl
class: Workflow
cwlVersion: v1.0
id: simple_workflow
label: simple_workflow
inputs:
- id: input
type: File?
outputs:
- id: output
outputSource:
- simple_commandline_1/output
type: File?
steps:
- id: simple_commandline
in:
- id: input
source: input
out:
- id: output
run: ./simple_commandline.cwl
label: simple_commandline
- id: simple_commandline_1
in:
- id: input
source: simple_commandline/output
out:
- id: output
run: ./simple_commandline.cwl
label: simple_commandline
requirements: []
simple_commandline.cwl
:
class: CommandLineTool
cwlVersion: v1.0
id: simple_commandline
baseCommand:
- echo
inputs:
- id: input
type: File?
inputBinding:
position: 0
outputs:
- id: output
type: File?
outputBinding:
glob: '*.txt'
label: simple_commandline
requirements:
- class: DockerRequirement
dockerPull: alpine
stdout: output.txt
The workflow validates
cwltool --validate simple_workflow.cwl
/Users/kghose/miniconda3/envs/cwl/bin/cwltool 1.0.20181012180214
-> simple_workflow.cwl is valid CWL.
I use code gen to recursively load this document and when I export it via doc.save()
I get the following dict
{'class': 'Workflow',
'id': 'file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow',
'inputs': [{'id': 'input', 'type': ['null', 'File']}],
'outputs': [{'id': 'output',
'outputSource': ['simple_commandline_1/output'],
'type': ['null', 'File']}],
'requirements': [],
'label': 'simple_workflow',
'cwlVersion': 'v1.0',
'steps': [{'id': 'simple_commandline',
'in': [{'id': 'input', 'source': 'simple_workflow/input'}],
'out': [{'id': 'output'}],
'label': 'simple_commandline',
'run': {'class': 'CommandLineTool',
'id': 'simple_commandline.cwl#simple_commandline',
'inputs': [{'id': 'input',
'inputBinding': {'position': 0},
'type': ['null', 'File']}],
'outputs': [{'id': 'output',
'outputBinding': {'glob': '*.txt'},
'type': ['null', 'File']}],
'requirements': [{'class': 'DockerRequirement', 'dockerPull': 'alpine'}],
'label': 'simple_commandline',
'cwlVersion': 'v1.0',
'baseCommand': ['echo'],
'stdout': 'output.txt'}},
{'id': 'simple_commandline_1',
'in': [{'id': 'input',
'source': 'simple_workflow/simple_commandline/output'}],
'out': [{'id': 'output'}],
'label': 'simple_commandline',
'run': {'class': 'CommandLineTool',
'id': 'simple_commandline.cwl#simple_commandline',
'inputs': [{'id': 'input',
'inputBinding': {'position': 0},
'type': ['null', 'File']}],
'outputs': [{'id': 'output',
'outputBinding': {'glob': '*.txt'},
'type': ['null', 'File']}],
'requirements': [{'class': 'DockerRequirement', 'dockerPull': 'alpine'}],
'label': 'simple_commandline',
'cwlVersion': 'v1.0',
'baseCommand': ['echo'],
'stdout': 'output.txt'}}]}
which saves to
from ruamel.yaml import YAML
yaml = YAML()
yaml.dump(doc.save(), open('roundtrip.cwl', 'w'))
class: Workflow
id: file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow
inputs:
- id: input
type:
- 'null'
- File
outputs:
- id: output
outputSource:
- simple_commandline_1/output
type:
- 'null'
- File
requirements: []
label: simple_workflow
cwlVersion: v1.0
steps:
- id: simple_commandline
in:
- id: input
source: simple_workflow/input
out:
- id: output
label: simple_commandline
run:
class: CommandLineTool
id: simple_commandline.cwl#simple_commandline
inputs:
- id: input
inputBinding:
position: 0
type:
- 'null'
- File
outputs:
- id: output
outputBinding:
glob: '*.txt'
type:
- 'null'
- File
requirements:
- class: DockerRequirement
dockerPull: alpine
label: simple_commandline
cwlVersion: v1.0
baseCommand:
- echo
stdout: output.txt
- id: simple_commandline_1
in:
- id: input
source: simple_workflow/simple_commandline/output
out:
- id: output
label: simple_commandline
run:
class: CommandLineTool
id: simple_commandline.cwl#simple_commandline
inputs:
- id: input
inputBinding:
position: 0
type:
- 'null'
- File
outputs:
- id: output
outputBinding:
glob: '*.txt'
type:
- 'null'
- File
requirements:
- class: DockerRequirement
dockerPull: alpine
label: simple_commandline
cwlVersion: v1.0
baseCommand:
- echo
stdout: output.txt
which fails validation:
Tool definition failed validation:
roundtrip.cwl:18:1: checking field `steps`
roundtrip.cwl:19:3: checking object `roundtrip.cwl#simple_commandline`
roundtrip.cwl:20:3: checking field `in`
roundtrip.cwl:21:5: checking object `roundtrip.cwl#simple_commandline/input`
roundtrip.cwl:22:5: Field `source` references unknown identifier `simple_workflow/input`, tried file:///Users/kghose/Downloads/roundtrip.cwl#simple_workflow/input
roundtrip.cwl:51:3: checking object `roundtrip.cwl#simple_commandline_1`
roundtrip.cwl:52:3: checking field `in`
roundtrip.cwl:53:5: checking object `roundtrip.cwl#simple_commandline_1/input`
roundtrip.cwl:54:5: Field `source` references unknown identifier `simple_workflow/simple_commandline/output`, tried file:///Users/kghose/Downloads/roundtrip.cwl#simple_workflow/simple_commandline/output
@tetron happy to try and fix this if you can give a few pointers. It seems this is very close, since if I edit the two source fields, removing simple_workflow
from them the workflow becomes correctly connected.
22: simple_workflow/input -> input
54: simple_workflow/simple_commandline/output -> simple_commandline/output
Though some grumbling remains
Resolved 'roundtrip.cwl' to 'file:///Users/kghose/Downloads/roundtrip.cwl'
roundtrip.cwl:28:5: object id `simple_commandline.cwl#simple_commandline` previously defined
roundtrip.cwl:30:7: object id `simple_commandline.cwl#simple_commandline/input` previously defined
roundtrip.cwl:37:7: object id `simple_commandline.cwl#simple_commandline/output` previously defined
roundtrip.cwl:28:5: object id `simple_commandline.cwl#simple_commandline` previously defined
roundtrip.cwl:30:7: object id `simple_commandline.cwl#simple_commandline/input` previously defined
roundtrip.cwl:37:7: object id `simple_commandline.cwl#simple_commandline/output` previously defined
roundtrip.cwl is valid CWL.
@kaushik-work if the top level id
is file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow
then either the other internal references to the top level workflow inputs or steps have to start with file:///Users/kghose/Downloads/simple_workflow.cwl#simple_workflow
or just #simple_workflow
but not like simple_workflow/simple_commandline/output
Attempting to roundtrip the following workflow:
topmed-alignment.cwl.txt
Fails validation
Roundtrip code
Generated CWL is: test.cwl.txt
@tetron