common-workflow-lab / galaxy

Fork of Galaxy (http://galaxyproject.org/) attempting to implement the CWL spec.
https://www.commonwl.org

Implement a subset of the Common Workflow Language. #47

Closed jmchilton closed 2 years ago

jmchilton commented 7 years ago

This should support a subset of draft-3 and v1.0 tools.

What is holding us back from merging the progress so far?

CWL Support (Tools):

CWL Support (Workflows):

Remaining Work

The work remaining is vast and will be tracked at https://github.com/common-workflow-language/galaxy/issues for the time being.

Implementation Notes:

Tools:

Workflows:

Implementation Description:

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with .json or .cwl and proxy objects are created to adapt these tools to Galaxy representations. In particular input and output descriptions are loaded from the tool.
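As a rough illustration of the proxy idea, the sketch below wraps an already-parsed CWL tool document (a plain dict here, not an object loaded through the reference library) and exposes its input and output descriptions in a flattened form. The ToolProxySketch class, its method names and the sample document are hypothetical and are not taken from this branch.

```python
from pathlib import Path

# Extensions that the description above says are treated as CWL tool files.
CWL_EXTENSIONS = {".json", ".cwl"}


def looks_like_cwl_tool(path):
    """Return True if the file name ends with .json or .cwl."""
    return Path(path).suffix.lower() in CWL_EXTENSIONS


class ToolProxySketch:
    """Hypothetical adapter around a parsed CWL CommandLineTool document."""

    def __init__(self, tool_doc):
        self._doc = tool_doc

    def _parameters(self, key):
        params = self._doc.get(key, [])
        # CWL v1.0 accepts parameters as a list of records or as a map
        # from id to either a record or a bare type string.
        if isinstance(params, dict):
            params = [
                dict(body, id=name) if isinstance(body, dict) else {"id": name, "type": body}
                for name, body in params.items()
            ]
        return [(p["id"], p.get("type")) for p in params]

    def input_descriptions(self):
        return self._parameters("inputs")

    def output_descriptions(self):
        return self._parameters("outputs")


tool_doc = {
    "class": "CommandLineTool",
    "baseCommand": "cat",
    "inputs": {"file1": {"type": "File"}, "numbering": "boolean"},
    "outputs": [{"id": "output_file", "type": "File"}],
}
proxy = ToolProxySketch(tool_doc)
print(proxy.input_descriptions())
print(proxy.output_descriptions())
```

The real proxies would sit on top of whatever objects the reference library returns; the dict is used here only to keep the sketch self-contained.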

When the tool is submitted, a specialized tool class builds a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation uses that description to generate a Job object. A command line is generated from this Job object.

As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.
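To make the translation step concrete, here is a minimal sketch of turning Galaxy-side inputs into a CWL-style input object. The build_cwl_job function and the "dataset_path" key are assumptions made for illustration; the branch's actual tool class and the cwltool calls it makes are not shown.

```python
import json


def build_cwl_job(galaxy_inputs):
    """Translate {name: value} pairs into a CWL-style input object.

    Values that are dicts with a "dataset_path" key stand in for Galaxy
    datasets (a hypothetical shape) and become CWL File objects; all
    other values are passed through unchanged.
    """
    job = {}
    for name, value in galaxy_inputs.items():
        if isinstance(value, dict) and "dataset_path" in value:
            job[name] = {"class": "File", "path": value["dataset_path"]}
        else:
            job[name] = value
    return job


galaxy_inputs = {
    "file1": {"dataset_path": "/galaxy/files/dataset_1.dat"},
    "numbering": True,
}
cwl_job = build_cwl_job(galaxy_inputs)
print(json.dumps(cwl_job, indent=2, sort_keys=True))
```

Everything after this point (binding inputs to command-line arguments, evaluating expressions) is left to the reference implementation, which is why Galaxy can stay unaware of those details.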

Galaxy writes a description of the CWL job that it can reload to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container) this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.
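The post-processing step can be pictured roughly as follows. This sketch assumes the reloaded description already lists where each File output ended up, which skips the output discovery that cwltool performs; the file name cwl_job.json, the collect_outputs function and both dict shapes are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def collect_outputs(working_dir, cwl_outputs, galaxy_targets):
    """Move each CWL File output to the fixed path chosen for it.

    cwl_outputs maps output ids to CWL File objects and galaxy_targets
    maps the same ids to destination paths (both shapes are assumptions).
    """
    moved = {}
    for output_id, file_obj in cwl_outputs.items():
        if not isinstance(file_obj, dict) or file_obj.get("class") != "File":
            continue  # only plain File outputs are handled in this sketch
        source = Path(working_dir) / file_obj["path"]
        target = Path(galaxy_targets[output_id])
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        moved[output_id] = str(target)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "working"
    work.mkdir()
    # A tool output written under a name Galaxy could not predict.
    (work / "result-2016.txt").write_text("hello\n")
    # Description written to the working directory and reloaded afterwards.
    (work / "cwl_job.json").write_text(json.dumps(
        {"outputs": {"output_file": {"class": "File", "path": "result-2016.txt"}}}
    ))
    description = json.loads((work / "cwl_job.json").read_text())
    targets = {"output_file": str(Path(tmp) / "files" / "dataset_2.dat")}
    moved = collect_outputs(work, description["outputs"], targets)
    final_text = Path(moved["output_file"]).read_text()

print(final_text, end="")
```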

Currently all File outputs are sniffed to determine a Galaxy datatype. CWL allows refinement of this, and that remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.

2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)
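A sketch of how such a mapping might short-circuit sniffing is shown below. The table is illustrative and covers only three EDAM format IRIs; the sniff placeholder and the galaxy_datatype function are hypothetical and do not reflect Galaxy's sniffing code.

```python
# Illustrative mapping from EDAM format IRIs to Galaxy datatype extensions.
EDAM_TO_GALAXY = {
    "http://edamontology.org/format_2572": "bam",
    "http://edamontology.org/format_1930": "fastq",
    "http://edamontology.org/format_1932": "fastqsanger",
}


def sniff(path):
    """Placeholder for content sniffing; always reports generic data."""
    return "data"


def galaxy_datatype(file_obj):
    """Use the declared format when it is mapped, otherwise fall back to sniffing."""
    fmt = file_obj.get("format")
    if fmt in EDAM_TO_GALAXY:
        return EDAM_TO_GALAXY[fmt]
    return sniff(file_obj["path"])


declared = {"class": "File", "path": "reads.fq", "format": "http://edamontology.org/format_1932"}
undeclared = {"class": "File", "path": "unknown.bin"}
print(galaxy_datatype(declared))
print(galaxy_datatype(undeclared))
```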

Implementation Links:

Hundreds of commits have been rebased into this one, so the details of individual parts of the implementation and how they built on each other are not entirely clear. To see the original ideas behind individual features, here are some relevant links:

Testing:

% git clone https://github.com/common-workflow-language/galaxy.git
% cd galaxy
% git checkout cwl-1.0

Start Galaxy.

% GALAXY_RUN_WITH_TEST_TOOLS=1 sh run.sh

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to config/job_conf.xml. (Adjust the docker_sudo parameter based on how you execute Docker).

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following commands.

./run_tests.sh -api test/api/test_tools_cwl.py
./run_tests.sh -api test/api/test_workflows_cwl.py
./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py

The first two execute various tool and workflow test cases manually crafted during implementation of this work. The third is an auto-generated test case class that contains Python tests for every CWL conformance test found with the reference specification.

An individual conformance test can be run using this pattern:

./run_tests.sh -api test/api/test_cwl_conformance_v1_0.py:CwlConformanceTestCase.test_conformance_v1_0_6

Issues and Contact

Report issues at https://github.com/common-workflow-language/galaxy/issues and feel free to ping jmchilton on the CWL Gitter channel.

mr-c commented 7 years ago

@jmchilton This is fantastic, thank you for the refresh!

Right now the conformance tests are running at https://ci.commonwl.org/job/galaxy-planemo-conformance/ using planemo's main GitHub branch. What's the best way to update the Jenkins job so that I run the conformance tests using this branch of Galaxy? I see your Testing instructions above, but it would be nice to run CWL conformance tests through the same interface the other implementations are tested with.

With regards to

  1. CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.

This is already supported in v1.0: any identifier can be used in the format field (see http://www.commonwl.org/v1.0/CommandLineTool.html#File and http://www.commonwl.org/v1.0/CommandLineTool.html#CommandInputParameter).

Example (the # BAM comments are optional):

format: http://edamontology.org/format_2572 # BAM

or

format: edam:format_2572 # BAM

if elsewhere in the document there is

$namespaces: { edam: "http://edamontology.org/" }
$schemas: [ "http://edamontology.org/EDAM_1.16.owl" ]
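The equivalence between the two spellings can be sketched as a simple prefix expansion. The expand_format helper is hypothetical and only illustrates the idea; the reference implementation has its own resolution logic.

```python
def expand_format(value, namespaces):
    """Expand a prefixed name such as edam:format_2572 to a full IRI."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return value  # already a full IRI, or an unknown prefix


namespaces = {"edam": "http://edamontology.org/"}
short_form = expand_format("edam:format_2572", namespaces)
long_form = expand_format("http://edamontology.org/format_2572", namespaces)
print(short_form)
print(short_form == long_form)
```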

I explicitly demo and promote use of EDAM for bioinformatics tools and workflows (though the example workflow repo still needs updating)

  2. For finer grain control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (Distinction between fastq and fastqsanger in Galaxy is very important for instance.)

Good news: you can do this today without modifying or extending CWL. Any Galaxy output type (a file format, in CWL parlance) that isn't represented in EDAM can be added as an additional format specifier (preferably in a galaxyproject.org namespace): http://www.commonwl.org/v1.0/CommandLineTool.html#CommandInputParameter Then the Galaxy user interface can choose to display only the files that have the Galaxy-specific type(s) when a CWL description specifies both generic and Galaxy-specific formats, thus giving the best user experience.

In the case of Galaxy's fastq and fastqsanger this is represented in EDAM: http://edamontology.org/format_1930 http://edamontology.org/format_1932

Hypothetical example if Galaxy's fastqsanger subtype was not represented in EDAM:

format: [ http://edamontology.org/format_1930, https://galaxyproject.org/fastqsanger ]

or

format: [ edam:format_1930, galaxy:fastqsanger ]

if elsewhere in the document there is

$namespaces: { edam: "http://edamontology.org/", galaxy: "https://galaxyproject.org/" }
$schemas: [ "http://edamontology.org/EDAM_1.16.owl", "https://galaxyproject.org/formats-release_17.01.owl" ]

Obviously it would be best for bioinformatic CWL descriptions to only use EDAM formats, but this approach means that you won't have to wait for EDAM updates to still have the best user experience in Galaxy (though EDAM releases much faster than they used to).
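On the Galaxy side, picking the most specific of several declared formats could look something like this sketch. The preferred_format function is hypothetical, and it assumes the prefixed names have already been expanded to full IRIs.

```python
# Namespace used for the hypothetical Galaxy-specific format identifiers.
GALAXY_NS = "https://galaxyproject.org/"


def preferred_format(formats):
    """Return the Galaxy-specific format if one is listed, else the first one."""
    if isinstance(formats, str):
        formats = [formats]
    for fmt in formats:
        if fmt.startswith(GALAXY_NS):
            return fmt
    return formats[0] if formats else None


both = [
    "http://edamontology.org/format_1930",
    "https://galaxyproject.org/fastqsanger",
]
print(preferred_format(both))
print(preferred_format("http://edamontology.org/format_1930"))
```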

mr-c commented 4 years ago

@jmchilton can we get a refresh of this PR?

nsoranzo commented 3 years ago

I've rebased the cwl-1.0 branch, fixed some API tests (changes rebased in the original commits) and added 3 commits that I will PR to galaxy. Now tests are back to a decent state if anyone wants to fix more API or CWL conformance tests.

nsoranzo commented 3 years ago

Rebased, fixed a TODO by adding cwltest to pyproject.toml.

nsoranzo commented 3 years ago

Rebased for the upcoming BioHackathon Europe 2021.

nsoranzo commented 3 years ago

@jmchilton Now that in the dev branch we have moved test/unit/tools/ to test/unit/app/tools/, should we also move test/unit/tools/cwl_tools/ to test/unit/app/tools/cwl_tools/, or is there a better place?

jmchilton commented 3 years ago

Now that in the dev branch we have moved test/unit/tools/ to test/unit/app/tools/, should we also move test/unit/tools/cwl_tools/ to test/unit/app/tools/cwl_tools/, or is there a better place?

Sounds like you've picked the right place - unless there are tests in tool_util that depend on these and then we'd require more thought.

nsoranzo commented 2 years ago

I've just ticked off "Download and generate (most) conformance tests instead of including everything in the Galaxy repo" from the list in the description. We have gone from 1,245 files changed (241,593 added lines) to 107 files (4,756 added lines).

nsoranzo commented 2 years ago

Closing this with the aim of opening this against galaxyproject/dev ASAP.

mr-c commented 2 years ago

New location is https://github.com/galaxyproject/galaxy/pull/12909 ; thank you @nsoranzo !