common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

--target fails with workflow schema in packed $graph format #991

Closed roksys closed 5 years ago

roksys commented 5 years ago

Hi,

I was working on integrating cwltool into REANA and ran into a problem when passing --target with a workflow schema in JSON format.

workflow.json

{
   "cwlVersion":"v1.0",
   "$graph":[
      {
         "inputs":[
            {
               "type":"File",
               "id":"#fitdata.cwl/data"
            },
            {
               "type":"File",
               "id":"#fitdata.cwl/fitdata"
            },
            {
               "default":"plot.png",
               "type":"string",
               "id":"#fitdata.cwl/outfile"
            }
         ],
         "requirements":[
            {
               "dockerPull":"reanahub/reana-env-root6",
               "class":"DockerRequirement"
            },
            {
               "class":"InitialWorkDirRequirement",
               "listing":[
                  "$(inputs.fitdata)",
                  "$(inputs.data)"
               ]
            }
         ],
         "stdout":"fitdata.log",
         "outputs":[
            {
               "type":"stdout",
               "id":"#fitdata.cwl/fitdata.log"
            },
            {
               "outputBinding":{
                  "glob":"$(inputs.outfile)"
               },
               "type":"File",
               "id":"#fitdata.cwl/result"
            }
         ],
         "baseCommand":"/bin/sh",
         "class":"CommandLineTool",
         "arguments":[
            {
               "prefix":"-c",
               "valueFrom":"root -b -q '$(inputs.fitdata.basename)(\"$(inputs.data.basename)\",\"$(runtime.outdir)/$(inputs.outfile)\")'\n"
            }
         ],
         "id":"#fitdata.cwl"
      },
      {
         "inputs":[
            {
               "type":"int",
               "id":"#gendata.cwl/events"
            },
            {
               "type":"File",
               "id":"#gendata.cwl/gendata_tool"
            },
            {
               "default":"data.root",
               "type":"string",
               "id":"#gendata.cwl/outfilename"
            }
         ],
         "requirements":[
            {
               "dockerPull":"reanahub/reana-env-root6",
               "class":"DockerRequirement"
            },
            {
               "class":"InitialWorkDirRequirement",
               "listing":[
                  "$(inputs.gendata_tool)"
               ]
            }
         ],
         "stdout":"gendata.log",
         "outputs":[
            {
               "outputBinding":{
                  "glob":"$(inputs.outfilename)"
               },
               "type":"File",
               "id":"#gendata.cwl/data"
            },
            {
               "type":"stdout",
               "id":"#gendata.cwl/gendata.log"
            }
         ],
         "baseCommand":"/bin/sh",
         "class":"CommandLineTool",
         "arguments":[
            {
               "prefix":"-c",
               "valueFrom":"root -b -q '$(inputs.gendata_tool.basename)($(inputs.events),\"$(runtime.outdir)/$(inputs.outfilename)\")'\n"
            }
         ],
         "id":"#gendata.cwl"
      },
      {
         "inputs":[
            {
               "type":"int",
               "id":"#main/events"
            },
            {
               "type":"File",
               "id":"#main/fitdata_tool"
            },
            {
               "type":"File",
               "id":"#main/gendata_tool"
            }
         ],
         "steps":[
            {
               "out":[
                  "#main/fitdata/result",
                  "#main/fitdata/fitdata.log"
               ],
               "run":"#fitdata.cwl",
               "id":"#main/fitdata",
               "in":[
                  {
                     "source":"#main/gendata/data",
                     "id":"#main/fitdata/data"
                  },
                  {
                     "source":"#main/fitdata_tool",
                     "id":"#main/fitdata/fitdata"
                  }
               ]
            },
            {
               "out":[
                  "#main/gendata/data",
                  "#main/gendata/gendata.log"
               ],
               "run":"#gendata.cwl",
               "id":"#main/gendata",
               "in":[
                  {
                     "source":"#main/events",
                     "id":"#main/gendata/events"
                  },
                  {
                     "source":"#main/gendata_tool",
                     "id":"#main/gendata/gendata_tool"
                  }
               ]
            }
         ],
         "class":"Workflow",
         "outputs":[
            {
               "type":"File",
               "outputSource":"#main/fitdata/fitdata.log",
               "id":"#main/fitdata.log"
            },
            {
               "type":"File",
               "outputSource":"#main/gendata/gendata.log",
               "id":"#main/gendata.log"
            },
            {
               "type":"File",
               "outputSource":"#main/fitdata/result",
               "id":"#main/plot"
            }
         ],
         "id":"#main"
      }
   ]
}

inputs.json

{
   "fitdata_tool":{
      "path":"code/fitdata.C",
      "class":"File"
   },
   "events":20000,
   "gendata_tool":{
      "path":"code/gendata.C",
      "class":"File"
   }
}

Reproducing the error

$ cwltool --debug  --target gendata.log workflow.json inputs.json 
/Users/rokas/.virtualenvs/cwl/bin/cwltool 1.0.20181102182747
Resolved 'workflow.json' to 'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json'
I'm sorry, I couldn't load this CWL file.
The error was: 
Traceback (most recent call last):
  File "/Users/rokas/.virtualenvs/cwl/lib/python2.7/site-packages/cwltool/main.py", line 707, in main
    tool)
  File "/Users/rokas/.virtualenvs/cwl/lib/python2.7/site-packages/cwltool/subgraph.py", line 86, in get_subgraph
    if nodes[r].type == OUTPUT:
KeyError: u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#gendata.log'

It works fine with the workflow in CWL format.

$ cwltool --quiet  --target gendata.log ../workflow/cwl/workflow.cwl  input.yml 
{
    "gendata.log": {
        "checksum": "sha1$dee4a5e8520d40d915e958a98be515b7355a6791", 
        "basename": "gendata.log", 
        "location": "file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/gendata.log", 
        "path": "/Users/rokas/reana-demo-root6-roofit/cwl-local-run/gendata.log", 
        "class": "File", 
        "size": 2140
    }
}

Example code - https://github.com/reanahub/reana-demo-root6-roofit

roksys commented 5 years ago

https://github.com/common-workflow-language/cwltool/blob/fe6b2ea8f8e51d26c4c6d8bdb7ebbf0837b0870b/cwltool/subgraph.py#L86

r = u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#gendata.log'

nodes.keys()
[
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/gendata_tool', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/gendata.log', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/fitdata.log', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/gendata/data', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/fitdata/result', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/plot', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/gendata/gendata.log', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/fitdata', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/fitdata/fitdata.log', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/events', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/fitdata_tool', 
  u'file:///Users/rokas/reana-demo-root6-roofit/cwl-local-run/workflow.json#main/gendata'
]

roksys commented 5 years ago

Just want to add that the workflow in $graph format was generated by running $ cwltool --pack ../workflow/cwl/workflow.cwl

urljoin can't combine the two URL fragments #main and #gendata.log: the second fragment replaces the first, so main gets removed, which causes the KeyError in get_subgraph()

https://github.com/common-workflow-language/cwltool/blob/047e69bb169e79fad6a7285ee798c4ecec3b218b/cwltool/main.py#L705
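
The fragment behaviour described above can be reproduced in isolation. This is a minimal sketch, not cwltool's code: it uses Python 3's urllib.parse.urljoin (the traceback in this issue comes from Python 2.7), and the base path is a shortened, hypothetical stand-in for the one in the report.

```python
from urllib.parse import urljoin

# Hypothetical, shortened id of the packed workflow; note the "#main" fragment.
packed_id = "file:///work/workflow.json#main"

# Joining a fragment-only reference replaces the base fragment entirely.
joined = urljoin(packed_id, "#gendata.log")
print(joined)  # file:///work/workflow.json#gendata.log

# The key that actually exists in nodes keeps "main" as a path inside the fragment.
wanted = packed_id + "/gendata.log"
print(wanted)  # file:///work/workflow.json#main/gendata.log
```

Since the first form is not among nodes.keys(), the lookup nodes[r] raises the KeyError shown in the traceback.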

mr-c commented 5 years ago

@roksys Good find! I suggest writing code to:

  1. detect this situation
  2. instead of appending "#"+r append "/"+r

roksys commented 5 years ago

Hi @mr-c,

I think appending "/"+r instead of "#"+r will only work with workflows in $graph format, but not with ones in yaml.

mr-c commented 5 years ago

@roksys I agree (though the $graph format is also available in YAML; it is just not often seen that way). That is what I meant by

  1. detect this situation

namely, checking whether this is a $graph-based document.
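
A rough sketch of the two steps discussed in this thread, under stated assumptions: resolve_target is a hypothetical helper (not a cwltool function), and "the tool id already carries a fragment" is used here as a stand-in for "this is a $graph-based document". The actual fix in the linked pull request may differ.

```python
from urllib.parse import urljoin, urlparse


def resolve_target(tool_id: str, target: str) -> str:
    """Build the node id for a --target name (hypothetical helper)."""
    if urlparse(tool_id).fragment:
        # Assumed to be a packed ($graph) workflow, e.g. ".../workflow.json#main":
        # extend the existing fragment instead of replacing it.
        return tool_id + "/" + target
    # Plain document without a fragment: add the target as the fragment.
    return urljoin(tool_id, "#" + target)


# Paths are illustrative only.
packed = resolve_target("file:///work/workflow.json#main", "gendata.log")
plain = resolve_target("file:///work/workflow.cwl", "gendata.log")
print(packed)  # file:///work/workflow.json#main/gendata.log
print(plain)   # file:///work/workflow.cwl#gendata.log
```

Both results have the same shape as the entries listed in nodes.keys() earlier in the thread, which is what get_subgraph() needs in order to find the target.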

mr-c commented 5 years ago

Fixed in https://github.com/common-workflow-language/cwltool/pull/995