rdfinn closed this issue.
Ah, we had only done the Krona step for the v4 assembly pipeline. It looks to be a bit messy for the v3 Python pipeline. I'll pull down the rest of the simple outputs and then work on that.
What do you expect from 'stepChunkingAndCompression'?
This set of files:
ERR770958_MERGED_FASTQ_I5.tsv.chunks
-rw-rw-r-- 1 mitchell interpro 42386269 May 3 19:16 ERR770958_MERGED_FASTQ_I5.tsv.gz
drwxrwxr-x 2 mitchell interpro 4096 May 3 19:17 sequence-categorisation
-rw-rw-r-- 1 mitchell interpro 31 May 3 19:17 ERR770958_MERGED_FASTQ.fasta.chunks
-rw-rw-r-- 1 mitchell interpro 72801081 May 3 19:18 ERR770958_MERGED_FASTQ.fasta.gz
-rw-rw-r-- 1 mitchell interpro 45 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.faa.chunks
-rw-rw-r-- 1 mitchell interpro 26119015 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.faa.gz
-rw-rw-r-- 1 mitchell interpro 45 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.ffn.chunks
-rw-rw-r-- 1 mitchell interpro 36459443 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_unannotated.ffn.gz
-rw-rw-r-- 1 mitchell interpro 43 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.faa.chunks
-rw-rw-r-- 1 mitchell interpro 32057722 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.faa.gz
-rw-rw-r-- 1 mitchell interpro 43 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.ffn.chunks
-rw-rw-r-- 1 mitchell interpro 43573945 May 3 19:18 ERR770958_MERGED_FASTQ_CDS_annotated.ffn.gz
-rw-rw-r-- 1 mitchell interpro 76507999 May 3 19:19 ERR770958_MERGED_FASTQ_RNAFiltered.fasta.gz
-rw-rw-r-- 1 mitchell interpro 0 May 3 19:19 stepChunkingAndCompression-success
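The sizes in this listing hint at what the small .chunks files hold: 31 bytes is exactly the length of the name ERR770958_MERGED_FASTQ.fasta.gz, 45 matches ERR770958_MERGED_FASTQ_CDS_unannotated.faa.gz (and .ffn.gz), and 43 matches ERR770958_MERGED_FASTQ_CDS_annotated.faa.gz (and .ffn.gz). Assuming that reading is right (each output compressed as a single gzip chunk, and its .chunks file containing only the chunk's name with no trailing newline), a reimplementation of the step might behave like the Python sketch below. chunk_and_compress is a hypothetical helper, not code from the v3 pipeline, and the real step may split very large files into several chunks.

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable


def chunk_and_compress(inputs: Iterable[Path], outdir: Path) -> None:
    """Hypothetical stand-in for stepChunkingAndCompression.

    Assumption: every input becomes exactly one gzip chunk, and
    '<name>.chunks' lists that chunk's file name with no newline.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    for src in inputs:
        gz_path = outdir / (src.name + ".gz")
        # Stream the data through gzip so large FASTA files are not read into memory.
        with src.open("rb") as fin, gzip.open(gz_path, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        # Index of chunks for this file; in this sketch there is always one.
        (outdir / (src.name + ".chunks")).write_text(gz_path.name)
    # Empty flag file marking that the step finished successfully.
    (outdir / "stepChunkingAndCompression-success").touch()
```

Writing the chunk list as a separate index file keeps consumers independent of how many chunks a file was split into; under the single-chunk assumption each index is simply the .gz name.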
(Sent by email in reply to Michael R. Crusoe's comment of 1 Jun 2017, 08:40.)
I believe all the checked boxes are satisfied by https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/pull/64, but that remains to be tested.
@rdfinn Would you like a standalone ExpressionTool that would duplicate the file naming and path hierarchy of the original pipeline as closely as possible?
Having the file names and paths replicated as closely as possible would be useful when showing the results to others.
@rdfinn I ran a test and am indeed able to create an arbitrary directory structure (including file names) from the outputs of a previous run.
Using the output object from an existing Toil run, a variation of https://github.com/ProteinsWebTeam/ebi-metagenomics-cwl/pull/64/files#diff-3d8de5ca7e00f58cf4743b9d2d366571R78 produced:
./ERR770958_MERGED_FASTQ
./ERR770958_MERGED_FASTQ/ERR770958_MERGED_FASTQ_I5.tsv
./ERR770958_MERGED_FASTQ/ERR770958_MERGED_FASTQ_summary.go
./ERR770958_MERGED_FASTQ/ERR770958_MERGED_FASTQ_RNAFiltered.fasta
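The idea behind such an ExpressionTool can be sketched in Python with the CWL object model: workflow outputs are File objects, and returning them inside a Directory object whose listing overrides each File's basename should make the runner stage them under the new names in a new folder (that is my understanding of how CWL treats basename, not something tested here). The output ids, file locations and suffix table below are hypothetical, chosen only to reproduce the three paths above; the real tool would express the same logic as a JavaScript expression in the .cwl file.

```python
from typing import Any, Dict, List

CwlObject = Dict[str, Any]


def make_v3_directory(sample: str,
                      outputs: Dict[str, CwlObject],
                      suffixes: Dict[str, str]) -> CwlObject:
    """Group CWL File objects into a Directory named after the sample.

    outputs:  workflow output id -> CWL File object (ids are hypothetical)
    suffixes: the same ids -> v3 file-name suffix (hypothetical table)
    """
    listing: List[CwlObject] = []
    for output_id, suffix in suffixes.items():
        renamed = dict(outputs[output_id])       # shallow copy; leave the input untouched
        renamed["basename"] = sample + suffix    # name the runner should stage it under
        listing.append(renamed)
    return {"class": "Directory", "basename": sample, "listing": listing}


example = make_v3_directory(
    "ERR770958_MERGED_FASTQ",
    {
        "i5_annotations": {"class": "File", "location": "file:///tmp/toil/out1.tsv"},
        "go_summary": {"class": "File", "location": "file:///tmp/toil/out2.go"},
        "rna_filtered": {"class": "File", "location": "file:///tmp/toil/out3.fasta"},
    },
    {
        "i5_annotations": "_I5.tsv",
        "go_summary": "_summary.go",
        "rna_filtered": "_RNAFiltered.fasta",
    },
)
print([f["basename"] for f in example["listing"]])
# ['ERR770958_MERGED_FASTQ_I5.tsv', 'ERR770958_MERGED_FASTQ_summary.go', 'ERR770958_MERGED_FASTQ_RNAFiltered.fasta']
```

Keeping the renaming in a single suffix table means matching another part of the v3 layout is a data change rather than a new step.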
As soon as the latest revisions have finished executing, I'll wire them in as well.
Okay, #64 is much more complete: using workflows/convert-to-v3-layout.cwl, the output is converted to the Python pipeline naming scheme with the following exceptions:
- taxonomy-summary directory (not done yet)
- ${step}-success flag files
- *.chunks
- charts directory
I'm going to work on the taxonomy-summary / krona part next.
@rdfinn Are any of the other missing bits important?
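One way to answer that systematically is to compare a listing of the v3 results directory with the converted CWL output and set aside anything matching the exceptions listed above. The glob patterns in KNOWN_GAPS are my own translation of that list into relative paths, and compare_layouts is a hypothetical helper; whatever lands in the second returned list is what still needs a decision.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple

# Known exceptions of convert-to-v3-layout.cwl, written as glob patterns over
# paths relative to the sample directory (an assumption about where they live).
KNOWN_GAPS = ("taxonomy-summary/*", "*-success", "*.chunks", "charts/*")


def compare_layouts(v3_files: Iterable[str],
                    cwl_files: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split v3 paths missing from the CWL output into (known gaps, unexpected gaps)."""
    produced = set(cwl_files)
    known: List[str] = []
    unexpected: List[str] = []
    for path in sorted(v3_files):
        if path in produced:
            continue
        # Case-sensitive matching so results do not depend on the OS.
        if any(fnmatchcase(path, pattern) for pattern in KNOWN_GAPS):
            known.append(path)
        else:
            unexpected.append(path)
    return known, unexpected
```

Feeding it the relative paths from the /hps/... results directory and from the converted CWL output would give one list that can be ignored for now and one that needs discussion.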
All files/directories can be found here:
/hps/nobackup/production/metagenomics/CWL/data/EMGv3_0/ERP009703/results/ERR770958_MERGED_FASTQ
[x] ERR770958_MERGED_FASTQ.fasta.submitted.count - I do not think that we have counts of the seq-prep-ed file
[x] Missing QC stats; there should be 8 files. You are generating these.
[x] Missing count summary files: ERR770958_MERGED_FASTQ_summary.ipr, ERR770958_MERGED_FASTQ_summary
[x] taxonomy-summary files. I think we are generating the Krona files? Missing 3: kingdom-counts.txt, krona.html, krona-input.txt
[x] Missing the following files from the taxonomy steps
[x] sequence categorisation step outputs in sequence-categorisation are missing. We are generating many of the fasta files.
[x] tRNA sequences
[ ] stepChunkingAndCompression step is missing?
[x] standalone ExpressionTool to re-write all the CWL outputs to match the Python workflow outputs