chanzuckerberg / idseq-workflows

Portable WDL workflows for IDseq production pipelines
https://idseq.net/
MIT License

short-read-mngs RunAssembly: contig length filter #127

Closed by mlin 3 years ago

mlin commented 3 years ago

Apply a 100-nucleotide minimum length filter to the contigs as they emerge from SPAdes, to reduce the load on downstream steps such as BlastContigs. Depending on the sample, these shorter contigs can currently make up the majority of SPAdes output, yet they seem to be of little value, since they are shorter than the input reads.
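The issue does not include the WDL change itself. As an illustration only, here is a minimal Python sketch of such a length filter, not the pipeline's actual implementation. It assumes that contigs shorter than 100 nt are dropped and that contigs of exactly 100 nt are kept (the inclusive threshold is an assumption). The names `parse_fasta`, `filter_contigs`, and `MIN_CONTIG_LEN` are hypothetical.

```python
from typing import Iterator, List, Tuple

# Hypothetical threshold: keep contigs of at least 100 nt (inclusive is an assumption).
MIN_CONTIG_LEN = 100


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may be wrapped over several lines."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(text: str, min_len: int = MIN_CONTIG_LEN) -> List[Tuple[str, str]]:
    """Return only the contigs whose sequence length is at least min_len."""
    return [(h, s) for h, s in parse_fasta(text) if len(s) >= min_len]


if __name__ == "__main__":
    example = ">c1\n" + "A" * 60 + "\n" + "A" * 60 + "\n>c2\n" + "C" * 40 + "\n"
    print([h for h, _ in filter_contigs(example)])  # ['c1']
```

Lengths are measured after joining wrapped lines, so a 120-nt contig split over two 60-character lines is kept, while the 40-nt contig is dropped.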

The filtered contigs are used for all subsequent steps, including the Bowtie2 realignment of reads to contigs that immediately follows SPAdes. The original, unfiltered set is copied into a new workflow output, assembly_out_assembly_contigs_all_fasta, which is not consumed by any other step.
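The split into two outputs could be sketched in Python as follows. This is a hedged illustration, not the workflow's code: the unfiltered file is copied unchanged to one output, and only contigs meeting the length threshold are written to the other. The function `write_contig_outputs` and the file names `contigs_all.fasta` and `contigs_filtered.fasta` are hypothetical and do not reflect the workflow's actual output paths.

```python
import shutil
from pathlib import Path
from typing import List, Optional, TextIO, Tuple


def write_contig_outputs(contigs_path, out_dir, min_len: int = 100) -> Tuple[Path, Path]:
    """Copy the unfiltered contigs to contigs_all.fasta (hypothetical name) and write
    contigs of at least min_len nt to contigs_filtered.fasta (hypothetical name)."""
    src = Path(contigs_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    all_path = out / "contigs_all.fasta"
    filtered_path = out / "contigs_filtered.fasta"
    # Refuse to overwrite the input while it is still being read.
    if src.resolve() in (all_path.resolve(), filtered_path.resolve()):
        raise ValueError("out_dir must not contain the input file under an output name")

    # The original set is preserved byte-for-byte as its own output.
    shutil.copyfile(src, all_path)

    def flush(fh: TextIO, header: Optional[str], chunks: List[str]) -> None:
        # Write the finished record only if it passes the length filter.
        seq = "".join(chunks)
        if header is not None and len(seq) >= min_len:
            fh.write(f">{header}\n{seq}\n")

    with src.open() as fin, filtered_path.open("w") as fout:
        header: Optional[str] = None
        chunks: List[str] = []
        for line in fin:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                flush(fout, header, chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
        flush(fout, header, chunks)

    return all_path, filtered_path
```

Downstream steps would read only the filtered file, while the "all" file is kept for reference. Note that this sketch writes each filtered sequence on a single line rather than preserving the input's line wrapping.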