EBI-Metagenomics / ebi-metagenomics-cwl

This repository contains the CWL description of the EBI Metagenomics pipeline

V3 missing bits and bobs #63

Closed rdfinn closed 7 years ago

rdfinn commented 7 years ago

All files/directories can be found here /hps/nobackup/production/metagenomics/CWL/data/EMGv3_0/ERP009703/results/ERR770958_MERGED_FASTQ:

  1. ERR770958_MERGED_FASTQ_otu_table_hdf5.biom
  2. ERR770958_MERGED_FASTQ_qiime_assigned_taxonomy.txt
  3. uclust_ref_picked_otus
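
One way to check which of these are still absent from a CWL run is to compare a results directory against the expected names. This is only a minimal sketch: the function name `missing_outputs` and the `{run}` placeholder are made up for illustration, and the run prefix is assumed to be `ERR770958_MERGED_FASTQ`.

```python
from pathlib import Path

# Names taken from the list above; "{run}" stands for the run prefix.
EXPECTED = [
    "{run}_otu_table_hdf5.biom",
    "{run}_qiime_assigned_taxonomy.txt",
    "uclust_ref_picked_otus",
]


def missing_outputs(results_dir, run, expected=EXPECTED):
    """Return the expected file or directory names not present in results_dir."""
    root = Path(results_dir)
    names = [pattern.format(run=run) for pattern in expected]
    return [name for name in names if not (root / name).exists()]
```

Calling `missing_outputs(results_dir, "ERR770958_MERGED_FASTQ")` on the output directory of a run returns the names that still need to be produced.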

mr-c commented 7 years ago

Ah, we had only done the Krona step for the v4 assembly pipeline. Looks to be a bit messy for the v3 python pipeline. I'll pull down the rest of the simple outputs and then work on that.

What do you expect from the 'stepChunkingAndCompression' ?

rdfinn commented 7 years ago

This set of files

ERR770958_MERGED_FASTQ_I5.tsv.chunks
-rw-rw-r-- 1 mitchell interpro 42386269 May 3 19:16 ERR770958_MERGED_FASTQ_I5.tsv.gz
drwxrwxr-x 2 mitchell interpro 4096 May 3 19:17 sequence-categorisation
-rw-rw-r-- 1 mitchell interpro 31 May 3 19:17 ERR770958_MERGED_FASTQ.fasta.chunks
-rw-rw-r-- 1 mitchell interpro 72801081 May 3 19:18 ERR770958_MERGED_FASTQ.fasta.gz
-rw-rw-r-- 1 mitchell interpro 45 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.faa.chunks
-rw-rw-r-- 1 mitchell interpro 26119015 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.faa.gz
-rw-rw-r-- 1 mitchell interpro 45 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.ffn.chunks
-rw-rw-r-- 1 mitchell interpro 36459443 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.ffn.gz
-rw-rw-r-- 1 mitchell interpro 43 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.faa.chunks
-rw-rw-r-- 1 mitchell interpro 32057722 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.faa.gz
-rw-rw-r-- 1 mitchell interpro 43 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.ffn.chunks
-rw-rw-r-- 1 mitchell interpro 43573945 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.ffn.gz
-rw-rw-r-- 1 mitchell interpro 76507999 May 3 19:19 ERR770958_MERGED_FASTQ_RNAFiltered.fasta.gz
-rw-rw-r-- 1 mitchell interpro 0 May 3 19:19 stepChunkingAndCompression-success
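
In the listing, each `.chunks` file with a visible size is exactly as long as the name of its matching `.gz` file (31, 45 and 43 bytes), which suggests it is a small manifest naming the compressed file. The sketch below works on that assumption, which the listing does not confirm. It gzips one file and writes such a manifest, and makes no attempt to split large inputs into several chunks. The function name `compress_with_manifest` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_with_manifest(src):
    """Gzip src next to itself and write a <src>.chunks manifest.

    Assumption: the manifest holds only the name of the .gz file, which
    matches the byte counts in the listing but is not confirmed by it.
    """
    src = Path(src)
    gz_path = src.with_name(src.name + ".gz")
    # Stream the input through gzip so large files are never held in memory.
    with open(src, "rb") as raw, gzip.open(gz_path, "wb") as packed:
        shutil.copyfileobj(raw, packed)
    manifest = src.with_name(src.name + ".chunks")
    manifest.write_text(gz_path.name)  # written without a trailing newline
    return gz_path, manifest
```

The original file is left in place; whether the pipeline removes it after compression is not visible from the listing.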

mr-c commented 7 years ago

I believe all checked boxes are satisfied by https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/pull/64 but that remains to be tested

mr-c commented 7 years ago

@rdfinn Would you like a standalone ExpressionTool that would duplicate the file naming and path hierarchy of the original pipeline as much as possible?

rdfinn commented 7 years ago

Having the file names and paths replicated as much as possible would be useful for showing to others.

mr-c commented 7 years ago

@rdfinn I ran a test and am indeed able to create an arbitrary directory structure (including file names) from the outputs of a previous run.

A variation of https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/pull/64/files#diff-3d8de5ca7e00f58cf4743b9d2d366571R78 produced

./ERR770958_MERGED_FASTQ
./ERR770958_MERGED_FASTQ/ERR770958_MERGED_FASTQ_I5.tsv
./ERR770958_MERGED_FASTQ/ERR770958_MERGED_FASTQ_summary.go
./ERR770958_MERGED_FASTQ/ERR770958_MERGED_FASTQ_RNAFiltered.fasta

using the output object from an existing Toil run.
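
For illustration only, the same idea can be sketched outside CWL as a small Python post-processing step. This is not the ExpressionTool from the pull request. It assumes the output object is the usual JSON mapping of output ids to CWL `File` objects with a `path` or a `file://` `location`. The output ids used in the usage note (`i5_matches`, `go_summary`, `rna_filtered`) are hypothetical, as are the function names.

```python
import shutil
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_path(file_obj):
    """Resolve a CWL File object to a local path via 'path' or a file:// 'location'."""
    if "path" in file_obj:
        return Path(file_obj["path"])
    parsed = urlparse(file_obj["location"])
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"not a local file: {file_obj['location']}")
    return Path(unquote(parsed.path))


def build_layout(outputs, layout, dest, run):
    """Copy selected outputs into dest/<run>/ under the names given in layout.

    outputs: parsed CWL output object (dict of output id -> File object)
    layout:  dict of output id -> target name pattern containing "{run}"
    """
    target_dir = Path(dest) / run
    target_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for key, pattern in layout.items():
        file_obj = outputs.get(key)
        if not file_obj or file_obj.get("class") != "File":
            continue  # output absent or not a single File; skip it
        target = target_dir / pattern.format(run=run)
        shutil.copyfile(local_path(file_obj), target)
        created.append(target)
    return created
```

With a layout such as `{"i5_matches": "{run}_I5.tsv", "go_summary": "{run}_summary.go", "rna_filtered": "{run}_RNAFiltered.fasta"}` and `run="ERR770958_MERGED_FASTQ"`, this would reproduce the three paths shown above.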

As soon as the latest revisions have finished executing, I'll wire them in as well.

mr-c commented 7 years ago

Okay, #64 is much more complete: using workflows/convert-to-v3-layout.cwl, the output is converted to the Python pipeline naming scheme with the following exceptions:

I'm going to work on the taxonomy-summary / krona part next.

@rdfinn Are any of the other missing bits important?