SegataLab / viromeqc

ViromeQC is a computational tool to benchmark and quantify non-viral contamination in VLP-enriched viromes. ViromeQC provides an enrichment score for each virome, calculated relative to the expected abundances of prokaryotic markers in reference metagenomes.
MIT License

Support for already-qc'd fna files of reads #7

Status: Open. Opened by shiraz-shah 1 year ago.

shiraz-shah commented 1 year ago

Dear Moreno, it would be very convenient to support read files in FASTA format (.fna) that have already been QC'd, so that they can be mapped directly with bowtie2. The current QC step is slow, and skipping it for such files would speed up ViromeQC a lot!
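As a rough illustration of how a tool could recognise such input, here is a minimal Python sketch that tells FASTA reads from FASTQ reads by the first character of the first record (`>` for FASTA, `@` for FASTQ). It assumes files are either plain text or gzipped with a `.gz` suffix; `detect_read_format` is a hypothetical helper, not part of ViromeQC.

```python
import gzip


def detect_read_format(path: str) -> str:
    """Guess whether a (plain or .gz) read file is FASTA or FASTQ.

    Hypothetical helper: only the first non-blank line is inspected.
    FASTA records start with '>', FASTQ records start with '@'.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip leading blank lines
            if line.startswith(">"):
                return "fasta"
            if line.startswith("@"):
                return "fastq"
            break  # first record has an unexpected leading character
    raise ValueError(f"cannot determine read format of {path}")
```

Checking only the first record keeps the test cheap even for very large files; a stricter implementation might validate more records.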

Many of us work with MDA-amplified viromes, where read deduplication is necessary after read filtering and trimming, to remove redundant reads before assembly, mapping, etc. Such deduplication is already handled by other tools (such as vsearch or rmdup) and dramatically reduces the size of the read files. After QC and deduplication the quality scores are no longer needed, so the output is normally reads in FASTA (.fna) rather than FASTQ format.
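For readers unfamiliar with this step, here is a minimal Python sketch of exact-duplicate removal that also discards quality scores and writes FASTA. It assumes uncompressed, 4-line FASTQ records and treats two reads as duplicates only when their sequences are identical. It is not how vsearch or rmdup work internally, it ignores reverse-complement and near-identical duplicates, and `dedup_fastq_to_fasta` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable, TextIO


def dedup_fastq_to_fasta(fastq: Iterable[str], out: TextIO) -> int:
    """Write the first occurrence of each distinct sequence as FASTA.

    Hypothetical sketch: exact, case-sensitive sequence matching only.
    Returns the number of reads kept.
    """
    seen = set()  # every unique sequence is held in memory
    kept = 0
    it = iter(fastq)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        header = record[0].rstrip("\n")
        seq = record[1].rstrip("\n")
        if seq in seen:
            continue  # duplicate read: drop it
        seen.add(seq)
        # Quality line (record[3]) is discarded; FASTA keeps id and sequence.
        out.write(f">{header[1:]}\n{seq}\n")
        kept += 1
    return kept
```

A typical call would pass open file handles, e.g. `dedup_fastq_to_fasta(open("sample.fastq"), open("sample.fna", "w"))`, where the file names are placeholders.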

Supporting QC'd .fna files (bowtie2 accepts FASTA input too) would dramatically increase the speed of ViromeQC, because the length filtering step is skipped and bowtie2 has fewer reads to map. For example, many of my MDA-amplified virome files are about 10x smaller after passing through my own QC and deduplication pipeline.
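To make the proposed shortcut concrete, here is a hedged Python sketch of how a pipeline could branch. FASTA reads flagged as already QC'd go straight to bowtie2 with `-f` (FASTA input). FASTQ reads first pass through filtering and are then mapped with `-q`. `plan_run`, the step names, and the `.filtered.fastq` intermediate file name are hypothetical, and ViromeQC's actual bowtie2 options and index names are not described in this issue. Only the bowtie2 flags (`-f`, `-q`, `-x`, `-U`, `-p`, `--no-unal`, `-S`) are standard bowtie2 options.

```python
from typing import List, Tuple


def plan_run(reads: str, read_format: str, already_qcd: bool,
             index: str, threads: int = 4) -> Tuple[List[str], List[str]]:
    """Return (steps, bowtie2_argv) for one sample.

    Hypothetical sketch: `steps` names the stages that would run, and
    `bowtie2_argv` is the mapping command for the file finally mapped.
    """
    if read_format not in ("fasta", "fastq"):
        raise ValueError(f"unsupported read format: {read_format}")
    if read_format == "fasta" and not already_qcd:
        # Assumption: FASTA has no per-base qualities to filter on,
        # so it is only accepted when declared as already QC'd.
        raise ValueError("FASTA input has no quality scores; it must already be QC'd")

    steps: List[str] = []
    to_map = reads
    if not already_qcd:
        steps.append("quality_and_length_filter")
        to_map = reads + ".filtered.fastq"  # hypothetical intermediate file
    steps.append("bowtie2_mapping")  # the QC step is skipped when already_qcd

    fmt_flag = "-f" if read_format == "fasta" else "-q"  # -f: FASTA, -q: FASTQ
    argv = ["bowtie2", fmt_flag, "-x", index, "-U", to_map,
            "-p", str(threads), "--no-unal", "-S", to_map + ".sam"]
    return steps, argv
```

Rejecting un-QC'd FASTA is a design choice of this sketch, not a constraint of bowtie2; a real implementation could instead apply length filtering alone to FASTA input.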

Please consider this, as it would be extremely easy for you to implement!