EBI-Metagenomics / ebi-metagenomics-cwl

This repository contains the CWL description of the EBI Metagenomics pipeline

Rfam library files #78

Open aperz opened 6 years ago

aperz commented 6 years ago

Hello,

First of all - thank you, @mr-c, for the response to my previous problem!

I've been trying to run the emg workflows further, specifically emg-pipeline-v4-paired.cwl, on my machine. I have run into several more issues, most of which seem too trivial to file as separate issues. If there is a preferred channel of communication, please let me know.

Some of these problems may simply reflect my unfamiliarity with CWL, so I apologise in advance; I'm working on it! It seems like a really elegant way to deal with patchworks of wildly different tools, commonly known as pipelines.

Here's what I had problems with so far:

ISSUE:

workflows/emg-pipeline-v4-paired-job.yaml specifies Rfam libraries located in two directories, "other" (e.g. .../CWL/data/libraries/Rfam/other/Archaea_SRP.cm) and "ribosomal" (e.g. .../CWL/data/libraries/Rfam/ribosomal/RF02542.cm). I downloaded the Rfam database from ftp://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.tar.gz, and it does not contain a corresponding directory structure.

SOLUTION: I have not found any so far.
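One possible workaround, if you have the concatenated `Rfam.cm` rather than individual model files, is to split it into one file per model and place each file where the job file expects it. Infernal's own `cmfetch` can extract single models from an indexed file (`cmfetch --index Rfam.cm`, then `cmfetch Rfam.cm RF02542 > RF02542.cm`). The Python sketch below does the whole split in one pass. It assumes the Infernal 1.1 file layout (each model starts with an `INFERNAL1/` line and is followed by its `HMMER3/` filter section). The `groups` mapping of models to `ribosomal`/`other` is hypothetical: the job file does not say which families go where, so you would have to build it from the paths listed in emg-pipeline-v4-paired-job.yaml. Because some paths use a family name (`Archaea_SRP.cm`) and others an accession (`RF02542.cm`), the `key` parameter chooses which header line names the output file.

```python
from pathlib import Path


def model_id(block, key):
    """Return the value of the first `key` line (e.g. ACC or NAME) in a model block."""
    for line in block:
        parts = line.split()
        if len(parts) > 1 and parts[0] == key:
            return parts[1]
    return None


def split_cm_library(cm_path, out_dir, groups=None, key="ACC"):
    """Split a concatenated Infernal 1.1 CM file (such as Rfam.cm) into one file per model.

    Assumes every model starts with a line beginning "INFERNAL1/" and that the
    "HMMER3/" filter section after it belongs to the same model.
    `groups` is a hypothetical {id: subdirectory} mapping, e.g. to "ribosomal"
    or "other"; unmapped models are written directly into out_dir.
    """
    groups = groups or {}
    blocks, current = [], []
    with open(cm_path) as handle:
        for line in handle:
            # A new INFERNAL1 header closes the previous model (CM plus its HMM filter).
            if line.startswith("INFERNAL1/") and current:
                blocks.append(current)
                current = []
            current.append(line)
    if current:
        blocks.append(current)

    written = []
    for block in blocks:
        ident = model_id(block, key)
        if ident is None:
            continue  # e.g. stray text before the first model
        target_dir = Path(out_dir) / groups.get(ident, "")
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / f"{ident}.cm"
        target.write_text("".join(block))
        written.append(target)
    return written
```

For example, `split_cm_library("Rfam.cm", "Rfam_split", groups={"RF02542": "ribosomal"})` would write `Rfam_split/ribosomal/RF02542.cm` and leave all other models in `Rfam_split/`. Run it a second time with `key="NAME"` for the entries the job file refers to by family name.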

ISSUE:

The MGRAST_base.py script used in tools/qc-stats.cwl is missing, and I can't find it online.
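Missing scripts like this tend to surface one at a time, deep into a run. A quick preflight over the tool descriptions can list them up front. This is a rough Python sketch, not something the repository provides: it assumes `baseCommand` sits on a single line (`baseCommand: foo` or `baseCommand: [ foo, bar ]`) and checks only the first word against `PATH` with `shutil.which`. Multi-line `baseCommand` lists are skipped. A helper such as MGRAST_base.py that is referenced some other way (for example, imported by another script) will not be detected.

```python
import re
import shutil
from pathlib import Path

# Matches `baseCommand: foo` and `baseCommand: [ foo, bar ]` written on one line.
BASE_COMMAND = re.compile(r"^\s*baseCommand:\s*\[?\s*([^\s,\]]+)")


def missing_base_commands(tools_dir, which=shutil.which):
    """Return {cwl_file_name: executable} for baseCommands not found on PATH."""
    missing = {}
    for cwl in sorted(Path(tools_dir).glob("*.cwl")):
        for line in cwl.read_text().splitlines():
            match = BASE_COMMAND.match(line)
            if match:
                exe = match.group(1).strip("'\"")
                if which(exe) is None:
                    missing[cwl.name] = exe
                break  # only the first baseCommand in each file is checked
    return missing
```

Running `missing_base_commands("tools")` from the repository root would, on a machine without Java, flag `trimmomatic.cwl` with `java`, for instance.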

ISSUE: At step trim_quality_control when running workflows/emg-qc-paired.cwl:

[job trim_quality_control] /tmp/tmp688nyarp$ /bin/sh \
    -c \
    'java' 'org.usadellab.trimmomatic.Trimmomatic' 'PE' '-trimlog' 'trim.log' '-threads' '8' '-phred33' '/tmp/tmpvkjt1pef/stg827d3176-4f1b-4179-8404-4b46397fff43/merged_with_unmerged_reads' 'merged_with_unmerged_reads.trimmed.fastq' 'LEADING:3' 'TRAILING:3' 'SLIDINGWINDOW:4:15' 'MINLEN:100'
Error: Could not find or load main class org.usadellab.trimmomatic.Trimmomatic

EXPLANATION: On my computer (or with a different version of Trimmomatic), Trimmomatic is installed as a bash wrapper executable, which then calls java.

SOLUTION: change `baseCommand: [ java, org.usadellab.trimmomatic.Trimmomatic ]` to `baseCommand: [ trimmomatic ]`.
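The two `baseCommand` forms suit different installations. The `java` form only works when the Trimmomatic jar is on the Java classpath. The `trimmomatic` form only works when a wrapper script is on `PATH`. The small Python helper below is hypothetical and not part of the repository. It checks which form the current machine supports before you edit tools/trimmomatic.cwl, preferring the wrapper if both are available.

```python
import shutil

MAIN_CLASS = "org.usadellab.trimmomatic.Trimmomatic"


def trimmomatic_base_command(which=shutil.which):
    """Pick a base command for Trimmomatic on this machine.

    Prefers a `trimmomatic` wrapper script on PATH (as some packages install);
    otherwise falls back to running the main class through java, which only
    works if the Trimmomatic jar is on the CLASSPATH.
    """
    if which("trimmomatic"):
        return ["trimmomatic"]
    if which("java"):
        return ["java", MAIN_CLASS]
    raise RuntimeError("neither a trimmomatic wrapper nor java is on PATH")
```

A wrapper script typically forwards its arguments to the same main class, so only `baseCommand` should need to change. The remaining arguments (`PE`, `-trimlog`, `-threads`, ...) can stay as they are.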

ISSUE: Is the Trimmomatic output log file saved in a directory where it can be found?

Error collecting output for parameter 'output_log':
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: Traceback (most recent call last):
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: 
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3:   File "/P/cwl/venv/lib/python3.6/site-packages/cwltool/command_line_tool.py", line 707, in collect_output
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3:     raise WorkflowException("Did not find output file with glob pattern: '{}'".format(globpatterns))
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: 
ebi-metagenomics-cwl/tools/trimmomatic.cwl:221:3: cwltool.errors.WorkflowException: Did not find output file with glob pattern: '['trim.log']'

SOLUTION: For now, I have just commented out the output_log section of the outputs in trimmomatic.cwl (lines 221-233). I guess that might break something; I'm sure there's a better solution.
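A less invasive alternative to commenting the output out is to declare it optional (`type: File?` in the CWL `outputs` section). With that change, a glob that matches nothing yields `null` instead of failing the step. This only works if no downstream step requires the log. It is also worth checking whether Trimmomatic actually ran to completion. `-trimlog trim.log` is a relative path, written in the job's working directory, so the file is usually missing because the tool itself failed. The Python below is a simplified model of that collection rule, written for illustration. It is not cwltool's code.

```python
import glob
import os


def collect_output(outdir, pattern, optional=False):
    """Simplified model of how a CWL runner collects a File output.

    Illustrates only that a required File output whose glob matches nothing
    is an error, while an optional one (declared `File?`) is collected as
    null, represented here as None.
    """
    matches = sorted(glob.glob(os.path.join(outdir, pattern)))
    if matches:
        return matches[0]
    if optional:
        return None
    raise FileNotFoundError(f"Did not find output file with glob pattern: {[pattern]!r}")
```

In this model, `collect_output(workdir, "trim.log", optional=True)` returns `None` when Trimmomatic wrote no log. Without `optional=True`, it raises an error much like the one in the traceback above.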

mr-c commented 6 years ago

@aperz Thanks for your issue. This repository is still evolving and, as you've noticed, not all steps have their containers or scripts uploaded yet.

For Trimmomatic, set the CLASSPATH to the directory containing the jar. See https://github.com/EBI-Metagenomics/ebi-metagenomics-cwl/blob/886df9de6713e06228d2560c40f451155a196383/workflows/ebi-setup.sh#L3 for an example
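One detail to keep in mind: a Java classpath entry must name the jar file itself, or use a `directory/*` wildcard. A bare directory only exposes loose `.class` files, not the classes inside jars stored there. The sketch below is hypothetical. It assumes the jar matches `trimmomatic*.jar`, finds it in a given directory, and returns an environment whose `CLASSPATH` starts with that jar. Also note that cwltool does not hand the full host environment to tools by default. Its `--preserve-environment CLASSPATH` option is one way to pass the variable through.

```python
import os
from pathlib import Path


def classpath_with_trimmomatic(jar_dir, env=None):
    """Return a copy of `env` whose CLASSPATH starts with the Trimmomatic jar.

    Assumes the jar in `jar_dir` matches "trimmomatic*.jar" (hypothetical
    layout). The jar file itself is added, because a plain directory on the
    classpath does not expose classes packed inside the jars it contains.
    """
    env = dict(os.environ if env is None else env)
    # If several jars match, this picks the lexicographically last one.
    jars = sorted(Path(jar_dir).glob("trimmomatic*.jar"))
    if not jars:
        raise FileNotFoundError(f"no trimmomatic*.jar in {jar_dir}")
    entries = [str(jars[-1])]
    if env.get("CLASSPATH"):
        entries.append(env["CLASSPATH"])  # keep any existing entries after it
    env["CLASSPATH"] = os.pathsep.join(entries)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when testing `java org.usadellab.trimmomatic.Trimmomatic` outside CWL.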