common-workflow-library / legacy

Deprecated
https://github.com/common-workflow-library/bio-cwl-tools
Apache License 2.0

Simple example from a Makefile. #1

Closed lindenb closed 9 years ago

lindenb commented 9 years ago

Hi

I'd like to know if I can easily convert a Makefile-based workflow to CWL. For example, for the simple Makefile below, what would be the equivalent CWL file, please?


database.dna :  seq1.rna seq2.rna seq3.rna
    cat seq1.rna seq2.rna seq3.rna | tr "U" "T" > database.dna && echo "Done"

seq3.rna : 
    echo "AUGAAGACUGACUCGAUCGAUCG" > seq3.rna

seq2.rna : 
    echo "AUGAAGACUGACUCGAUCGAUCG" > seq2.rna

seq1.rna : 
    echo "AUGAAGACUGACUCGAUCGAUCG" > seq1.rna
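
For reference, what these rules compute can be sketched in Python: three targets each write one RNA sequence, and the final target concatenates them and replaces U with T. This only illustrates the data flow, using the file names from the Makefile; the helper name build_database is hypothetical.

```python
import tempfile
from pathlib import Path

RNA = "AUGAAGACUGACUCGAUCGAUCG"  # sequence written by all three seqN.rna rules


def build_database(workdir: Path) -> Path:
    """Mimic the Makefile: build seq1..3.rna, then database.dna."""
    prerequisites = []
    for name in ("seq1.rna", "seq2.rna", "seq3.rna"):
        target = workdir / name
        # Like make, only build a prerequisite that does not exist yet.
        if not target.exists():
            target.write_text(RNA + "\n")  # echo "..." > seqN.rna
        prerequisites.append(target)

    # cat seq1.rna seq2.rna seq3.rna | tr "U" "T" > database.dna
    combined = "".join(p.read_text() for p in prerequisites)
    database = workdir / "database.dna"
    database.write_text(combined.replace("U", "T"))
    return database


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_database(Path(tmp)).read_text(), end="")
```
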
tetron commented 9 years ago

Hi @lindenb, sorry for the delay.

At 111 lines compared to 11 lines, the CWL solution is more complex than the original Makefile, but let me point out a few details.

Also general advantages of CWL over plain Makefiles:

dna.cwl:

#!/usr/bin/env cwl-runner

- id: "#rnaseq"
  class: CommandLineTool
  inputs:
    - id: "#sequence"
      type: string
      inputBinding: {}
  outputs:
    - id: "#seqout"
      type: File
      outputBinding:
        glob: rnaseq
  baseCommand: echo
  stdout: rnaseq

- id: "#cat"
  class: CommandLineTool
  inputs:
    - id: "#sequences"
      type:
        type: array
        items: File
      inputBinding: {}
    - id: "#catfilename"
      type: string
  outputs:
    - id: "#catout"
      type: File
      outputBinding:
        glob:
          engine: cwl:JsonPointer
          script: /job/catfilename
  baseCommand: cat
  stdout:
    engine: cwl:JsonPointer
    script: /job/catfilename

- id: "#tr"
  class: CommandLineTool
  inputs:
    - id: "#trinput"
      type: File
    - id: "#from"
      type: string
      inputBinding:
        position: 1
    - id: "#to"
      type: string
      inputBinding:
        position: 2
    - id: "#filename"
      type: string
  outputs:
    - id: "#trout"
      type: File
      outputBinding:
        glob:
          engine: cwl:JsonPointer
          script: /job/filename
  baseCommand: tr
  stdin:
    engine: cwl:JsonPointer
    script: /job/trinput/path
  stdout:
    engine: cwl:JsonPointer
    script: /job/filename

- id: "#main"
  class: Workflow
  inputs:
    - id: "#rna"
      type:
        type: array
        items: string
  outputs:
    - id: "#outfile"
      type: File
      source: "#combine_sequences.catout"

  requirements:
    - class: ScatterFeatureRequirement

  steps:
    - id: "#get_sequences"
      run: {import: "#rnaseq"}
      scatter: "#get_sequences.sequence"
      inputs:
        - { id: "#get_sequences.sequence", source: "#rna" }
      outputs:
        - { id: "#get_sequences.seqout" }

    - id: "#translate_sequences"
      run: {import: "#tr"}
      scatter: "#translate_sequences.trinput"
      inputs:
        - { id: "#translate_sequences.trinput", source: "#get_sequences.seqout" }
        - { id: "#translate_sequences.from", default: "U" }
        - { id: "#translate_sequences.to", default: "T" }
        - { id: "#translate_sequences.filename", default: "dna" }
      outputs:
        - { id: "#translate_sequences.trout" }

    - id: "#combine_sequences"
      run: {import: "#cat"}
      inputs:
        - { id: "#combine_sequences.sequences", source: "#translate_sequences.trout" }
        - { id: "#combine_sequences.catfilename", default: "database.dna" }
      outputs:
        - { id: "#combine_sequences.catout" }

dna-job.json:

{
  "rna": ["AUGAAGACUGACUCGAUCGAUCG", "AUGAAGACUGACUCGAUCGAUCG", "AUGAAGACUGACUCGAUCGAUCG"]
}
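
The order of operations in #main can be traced with a small Python sketch. It is not how a CWL runner executes anything; it only mirrors the data flow: #get_sequences and #translate_sequences are scattered over the input array, and #combine_sequences receives the whole resulting list. The functions are hypothetical stand-ins for the three tools.

```python
import json

# Same content as dna-job.json above.
JOB = json.loads(
    '{"rna": ["AUGAAGACUGACUCGAUCGAUCG", '
    '"AUGAAGACUGACUCGAUCGAUCG", "AUGAAGACUGACUCGAUCGAUCG"]}'
)


def rnaseq(sequence: str) -> str:
    """Stand-in for the #rnaseq tool: echo SEQUENCE > rnaseq."""
    return sequence + "\n"


def tr(trinput: str, from_: str, to: str) -> str:
    """Stand-in for the #tr tool, valid for single-character arguments."""
    return trinput.replace(from_, to)


def cat(sequences: list) -> str:
    """Stand-in for the #cat tool: concatenate the inputs in order."""
    return "".join(sequences)


def main(job: dict) -> str:
    # scatter: one #rnaseq invocation per element of "rna"
    seqout = [rnaseq(s) for s in job["rna"]]
    # scatter: one #tr invocation per file from the previous step
    trout = [tr(f, "U", "T") for f in seqout]
    # no scatter: #cat receives the whole array at once
    return cat(trout)


if __name__ == "__main__":
    print(main(JOB), end="")
```

Unlike the Makefile, which concatenates first and translates once, the workflow translates each sequence and then concatenates; the resulting file content is the same.
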
lindenb commented 9 years ago

many thanks! FYI: I also got an answer from https://www.biostars.org/p/152226/

averagehat commented 8 years ago

How can this workflow be run?

~/cwl$ ./dna.cwl dna-job.json
/home/anaconda/bin/cwl-runner 1.0.20160115132942
Tool file contains graph of multiple objects, must specify one of #rnaseq, #cat, #tr, #main
tetron commented 8 years ago

Hi @averagehat, try this:

~/cwl$ cwl-runner dna.cwl#main dna-job.json 
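
The part after # names one object in the multi-object file. A minimal Python sketch of that selection, using the four ids from dna.cwl, might look like this. It illustrates the idea only and is not cwltool's actual code; select_object is a hypothetical helper.

```python
from urllib.parse import urldefrag

# The four top-level ids defined in dna.cwl.
GRAPH = [
    {"id": "#rnaseq", "class": "CommandLineTool"},
    {"id": "#cat", "class": "CommandLineTool"},
    {"id": "#tr", "class": "CommandLineTool"},
    {"id": "#main", "class": "Workflow"},
]


def select_object(reference: str, graph: list) -> dict:
    """Pick the object named by the fragment of e.g. 'dna.cwl#main'."""
    _path, fragment = urldefrag(reference)
    ids = [obj["id"] for obj in graph]
    if not fragment:
        # Mirrors the message reported above: several objects, no fragment.
        raise ValueError("must specify one of " + ", ".join(ids))
    for obj in graph:
        if obj["id"] == "#" + fragment:
            return obj
    raise ValueError("no object %r; available: %s" % (fragment, ", ".join(ids)))


if __name__ == "__main__":
    print(select_object("dna.cwl#main", GRAPH)["class"])
```
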
averagehat commented 8 years ago

Got a run-time error:

~/cwl$ cwl-runner dna.cwl#main dna-job.json 
/home/anaconda/bin/cwl-runner 1.0.20160115132942
[job 139953823896976] /tmp/tmpLlmbLe$ echo AUGAAGACUGACUCGAUCGAUCG > /tmp/tmpLlmbLe/rnaseq
[job 139953823896400] /tmp/tmppIdu4F$ echo AUGAAGACUGACUCGAUCGAUCG > /tmp/tmppIdu4F/rnaseq
[job 139953823897872] /tmp/tmpcC4I0i$ echo AUGAAGACUGACUCGAUCGAUCG > /tmp/tmpcC4I0i/rnaseq
Unexpected exception
Traceback (most recent call last):
  File "/home/anaconda/lib/python2.7/site-packages/cwltool/workflow.py", line 443, in job
    for t in self.embedded_tool.job(joborder, basedir, functools.partial(self.receive_output, output_callback), **kwargs):
  File "/home/anaconda/lib/python2.7/site-packages/cwltool/draft2tool.py", line 124, in job
    j.stdin = builder.do_eval(self.tool["stdin"])
  File "/home/anaconda/lib/python2.7/site-packages/cwltool/builder.py", line 165, in do_eval
    context=context, pull_image=pull_image)
  File "/home/anaconda/lib/python2.7/site-packages/cwltool/expression.py", line 135, in do_eval
    return sandboxjs.interpolate(ex, jshead(r.get("expressionLib", []), rootvars))
  File "/home/anaconda/lib/python2.7/site-packages/cwltool/sandboxjs.py", line 131, in interpolate
    e = execjs(scan[w[0]+1:w[1]], jslib)
  File "/home/anaconda/lib/python2.7/site-packages/cwltool/sandboxjs.py", line 20, in execjs
    stderr=subprocess.PIPE)
  File "/home/anaconda/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/home/anaconda/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Workflow error:
  [Errno 2] No such file or directory 

Maybe it's looking for an rnaseq tool, which I don't have installed?
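
The traceback ends with subprocess raising ENOENT from sandboxjs.execjs, which most often means the program that function tried to launch was not found on PATH; the three echo jobs before it completed. A quick check can be sketched in Python, assuming the missing program is a JavaScript interpreter called nodejs or node (an assumption: the traceback does not name the executable).

```python
import shutil

# Hypothetical candidate names; the traceback does not say which
# executable cwltool tried to start.
CANDIDATES = ("nodejs", "node")


def find_interpreters(names=CANDIDATES) -> dict:
    """Map each candidate name to its location on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_interpreters().items():
        print(name, "->", path or "not found on PATH")
```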

mr-c commented 8 years ago

Hello @averagehat, this example now lives in https://github.com/common-workflow-language/workflows/tree/master/workflows/make-to-cwl

https://github.com/common-workflow-language/workflows/blob/master/workflows/make-to-cwl/README.md shows that the command to run is now cwl-runner dna.cwl#main rna.json

Does that work for you?

sjackman commented 7 years ago

Tool file contains graph of multiple objects, must specify one of #tool1, #tool2, #main

@mr-c Is it possible to declare in the CWL file that the default target object is #main, so that one can simply run cwl-runner pipeline.cwl rather than cwl-runner 'pipeline.cwl#main'?

sjackman commented 7 years ago

I see that it's possible to create a separate JSON file:

{ "cwl:tool": "pipeline.cwl#main" }

Is it possible to put that declaration in the CWL file?
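
The job-file approach can be sketched in Python as follows: the cwl:tool key names the tool, and the remaining keys are the job inputs. This illustrates the convention only and is not cwltool's implementation; split_job is a hypothetical helper, and the rna input is borrowed from the earlier example.

```python
import json


def split_job(job_text: str):
    """Separate the "cwl:tool" reference from the actual job inputs."""
    job = json.loads(job_text)
    # None means the tool has to be given on the command line instead.
    tool = job.pop("cwl:tool", None)
    return tool, job


if __name__ == "__main__":
    text = '{ "cwl:tool": "pipeline.cwl#main", "rna": ["AUG"] }'
    tool, inputs = split_job(text)
    print(tool)
    print(inputs)
```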

tetron commented 7 years ago

cwltool used to do that, but we took it out because it wasn't supported consistently across implementations and wasn't written down in the spec. It missed the cut for v1.0.1, but maybe it could be added in v1.0.2.

sjackman commented 7 years ago

That'd be great. Thanks, Peter.