genome / analysis-workflows

Open workflow definitions for genomic analysis from MGI at WUSM.
MIT License

Validate Workflow Examples with cwltool #958

Open tmooney opened 3 years ago

tmooney commented 3 years ago

I was working on another issue and discovered that the example input YAML I was using was outdated and the workflow I was running could be run with Cromwell but not cwltool. Before putting out a new release, it'd be good to polish up the examples and make sure they can be run with cwltool. (Our current automated tests are pinned to an older version; should we also test against the latest cwltool for the release?)

This is somewhat related to #645, but with a different objective.

tmooney commented 3 years ago

To clarify, this isn't only about cwltool --validate; some problems only come up at runtime. (A bonus if we get some examples through with cwltool is that they could be useful for #116!)
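As a rough illustration of the two levels of checking, here is a minimal Python sketch that first runs the static `cwltool --validate` check and then, optionally, a real execution. The helper names and the default output directory are hypothetical and not part of this repository; only the `cwltool` command line itself (`--validate`, `--outdir`, workflow, inputs) is assumed to exist.

```python
import subprocess


def validate_cmd(workflow: str) -> list[str]:
    """Static check only: cwltool parses the document but runs no step."""
    return ["cwltool", "--validate", workflow]


def run_cmd(workflow: str, inputs: str, outdir: str = "cwltool_out") -> list[str]:
    """Full execution: this is where runtime-only problems show up."""
    return ["cwltool", "--outdir", outdir, workflow, inputs]


def check_example(workflow: str, inputs: str, execute: bool = False):
    """Run validation, then (optionally) the workflow; stop at the first failure.

    Returns a list of (command, return code) pairs for the commands that ran.
    """
    commands = [validate_cmd(workflow)]
    if execute:
        commands.append(run_cmd(workflow, inputs))

    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((cmd, proc.returncode))
        if proc.returncode != 0:
            break  # no point executing a workflow that fails validation
    return results
```

Calling `check_example("definitions/pipelines/rnaseq.cwl", "example_data/rnaseq/workflow.yaml", execute=True)` from the repository root would exercise both levels, assuming cwltool is installed and the example inputs are available locally.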

gschang commented 3 years ago

An update on testing the somatic_exome workflow examples: I finally got a successful build using the latest analysis-workflows Git repository (as of today, with #1000 merged). My test job ran smoothly, without any issues, and all examples for the somatic_exome workflow work. Below are two points to discuss.

(1) Interpretation of results

I got only one somatic variant from the current example sequencing data: a TP53 missense mutation (ENSP00000269305.4:p.Arg175His). This is the somatic mutation we expect from the example data, because HCC1395 has a known mutation in TP53, p.Arg175His (c.524G>A).

(2) Next plan

This test was done with my updated somatic_exome YAML, but I haven't pushed it yet because I saw that Chris and John are working on this in parallel. I will keep updating them on this.

gschang commented 3 years ago

Here is a progress update on the workflow examples, especially for the somatic_exome CWL workflow. As noted above, I confirmed that the current examples work without any problem with the latest somatic_exome workflow (in the analysis-workflows Git repository). All the issues we discussed have been resolved, for example vcf_sanitize and samtools_merge.

@johnegarza is now working on new examples for the somatic_exome workflow by cutting down to the BRCA genes, to replace the current examples. He is cleaning up huge example files (#645 Clean up example_data directory).

gschang commented 3 years ago

As the next workflow example, I am testing the latest rnaseq.cwl and rnaseq_star_fusion.cwl with the current workflow examples. I have updated the rnaseq YAML example (analysis-workflows/example_data/rnaseq/workflow.yaml) for testing. Most reference inputs in the YAML files need to be updated to the latest ones we're using. My test job with rnaseq.cwl is running now, to find out whether any steps are broken.
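One quick way to spot outdated reference inputs before launching a job is to walk the parsed input object and list local File or Directory entries that no longer exist. The sketch below is only an illustration: the function names are hypothetical, and it assumes the YAML has already been loaded into a plain dict (for example with a YAML parser such as PyYAML's `yaml.safe_load`) and that entries follow the usual CWL input-object shape with `class` plus `path` or `location`.

```python
from pathlib import Path


def iter_file_entries(node):
    """Yield every mapping tagged class: File or class: Directory, at any depth."""
    if isinstance(node, dict):
        if node.get("class") in ("File", "Directory"):
            yield node
        for value in node.values():  # also descends into secondaryFiles
            yield from iter_file_entries(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_file_entries(item)


def missing_paths(job, base_dir="."):
    """Return the local path references in `job` that do not exist on disk.

    Relative paths are resolved against `base_dir` (assumed to be the
    directory containing the YAML file). Remote URIs are skipped.
    """
    missing = []
    for entry in iter_file_entries(job):
        ref = entry.get("path") or entry.get("location")
        if ref is None or "://" in ref:
            continue  # literal content or remote location: nothing to check
        candidate = Path(ref)
        if not candidate.is_absolute():
            candidate = Path(base_dir) / candidate
        if not candidate.exists():
            missing.append(ref)
    return missing
```

This only catches paths that have disappeared; it cannot tell whether an existing reference file is the current one, which still needs a manual check.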

gschang commented 3 years ago

I found that the current definitions/pipelines/rnaseq.cwl is broken after the merge step. Sridhar and I opened a new issue to fix this problem (#1006). In brief, we're adding a new step that sorts the merged RNA-seq BAM, placed between the merge and index_bam steps.
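The reason for the extra step is that BAM indexing requires a coordinate-sorted file, and a merged BAM is not guaranteed to be one. The ordering can be sketched as a small command plan. This is an illustration only: it uses plain samtools commands, the function name and file names are hypothetical, and the tools actually wired into rnaseq.cwl may differ.

```python
def merge_sort_index_plan(inputs, prefix="merged"):
    """Return the commands for merge -> sort -> index, in execution order.

    Nothing is executed here; each command is an argv list that could be
    passed to subprocess.run.
    """
    if not inputs:
        raise ValueError("at least one input BAM is required")

    merged = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        ["samtools", "merge", merged, *inputs],
        # Sort by coordinate (the samtools sort default) before indexing.
        ["samtools", "sort", "-o", sorted_bam, merged],
        # Index the sorted file, not the raw merge output.
        ["samtools", "index", sorted_bam],
    ]
```

The point of the sketch is the data flow: the index step consumes the output of the sort step rather than the output of the merge step.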

gschang commented 3 years ago

I have submitted a new PR to fix the broken rnaseq.cwl. I merged PR #1007 to fix the problem with indexing a merged BAM, and I have created a follow-up PR, #1013, to finalize this work.

I brought the rnaseq.cwl workflow example up to date and fixed the gpbigwig_bamcoverage step, as discussed with @sridhar0605. In sum, I validated the current rnaseq.cwl, and it now works with the updated workflow example.