Open methylnick opened 6 years ago
Adding another tool that relates to this thread, http://www.bcgsc.ca/platform/bioinfo/software/biobloomtools
I think this tool can sample a fastq file and screen the reads to see contamination from different species.
I've been experimenting with mash screen: https://mash.readthedocs.io/en/latest/tutorials.html#screening-a-read-set-for-containment-of-refseq-genomes
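For anyone wanting to try it, here is a rough sketch of how the screen output could be summarised. It assumes the screen was run along the lines of the tutorial (`mash screen <refseq sketch>.msh reads.fastq > screen.tab`) and that the output is tab-separated with the columns identity, shared-hashes, median-multiplicity, p-value, query-ID and query-comment. The function names, the 0.9 identity cutoff and the example rows are all made up for illustration.

```python
from typing import NamedTuple


class ScreenHit(NamedTuple):
    identity: float
    shared_hashes: str        # e.g. "991/1000"
    median_multiplicity: int
    p_value: float
    query_id: str
    query_comment: str


def parse_mash_screen(text):
    """Parse tab-separated `mash screen` output into ScreenHit records."""
    hits = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        comment = fields[5] if len(fields) > 5 else ""
        hits.append(
            ScreenHit(
                float(fields[0]),
                fields[1],
                int(fields[2]),
                float(fields[3]),
                fields[4],
                comment,
            )
        )
    return hits


def top_hits(hits, min_identity=0.9, limit=10):
    """Keep hits at or above min_identity, best first (cutoff is an arbitrary assumption)."""
    kept = [h for h in hits if h.identity >= min_identity]
    return sorted(kept, key=lambda h: h.identity, reverse=True)[:limit]


# Made-up rows in the assumed column layout, not real results.
EXAMPLE = (
    "0.99957\t991/1000\t24\t0\tgenome_A.fna\tmade-up organism A\n"
    "0.85007\t33/1000\t1\t1.2e-50\tgenome_B.fna\tmade-up organism B\n"
    "0.97343\t568/1000\t2\t0\tgenome_C.fna\tmade-up organism C\n"
)

if __name__ == "__main__":
    for hit in top_hits(parse_mash_screen(EXAMPLE)):
        print(f"{hit.identity:.5f}\t{hit.shared_hashes}\t{hit.query_id}")
```

This does roughly what `sort -gr screen.tab | head` does in the tutorial, plus an identity filter, which might be a convenient shape for a pipeline report.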
For all of these tools you need some kind of reference database(s) to screen against, and how much mucking around it takes to set up properly depends on the tool (e.g. Bowtie indices vs pre-computed Bloom filter databases vs pre-computed 'sketch' indices).
Pros (mash screen): a single, relatively small reference database (RefSeq genomes) is provided, which makes it simple to set up a pretty comprehensive screening db. Pretty fast.
Cons: no MultiQC plugin (yet)
Pros (fastq_screen): MultiQC support. Database download is now simple (but huge and slow) since they've added the --get_genomes option (it used to be more mucking around, which is why I previously felt RNAsik should explore another option). The reference databases provided by fastq_screen are probably better than the mash RefSeq database for routine screening, since the fastq_screen databases are built for the task and include common contaminants, adapters etc. as well as model organisms.
Cons: The precomputed reference database might not be as comprehensive as the mash RefSeq database for detecting more obscure organisms. Bowtie / BWA dependency, though that's not a big issue now that we are recommending conda as the supported deployment method.
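As a minimal sketch of how a pipeline step might call fastq_screen once the genomes have been fetched with --get_genomes: the flags used below (--conf, --aligner, --subset, --threads, --outdir) are the ones I understand fastq_screen to accept, but the helper names, default values and paths are hypothetical, so check them against the fastq_screen documentation before relying on this.

```python
import subprocess
from pathlib import Path


def fastq_screen_command(fastq, conf, aligner="bowtie2", subset=100000,
                         threads=4, outdir="fastq_screen_out"):
    """Build the argument list for one fastq_screen run (defaults are assumptions)."""
    return [
        "fastq_screen",
        "--conf", str(conf),        # config file listing the reference databases
        "--aligner", aligner,       # which aligner the indices were built for
        "--subset", str(subset),    # screen only a subset of reads to keep it fast
        "--threads", str(threads),
        "--outdir", str(outdir),
        str(fastq),
    ]


def run_fastq_screen(fastq, conf, **kwargs):
    """Create the output directory and run fastq_screen, raising if it fails."""
    cmd = fastq_screen_command(fastq, conf, **kwargs)
    outdir = Path(cmd[cmd.index("--outdir") + 1])
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever --get_genomes put the config file.
    print(" ".join(fastq_screen_command(
        "sample_R1.fastq.gz",
        "FastQ_Screen_Genomes/fastq_screen.conf",
    )))
```

Building the argument list separately from running it keeps the command easy to log and to check without needing fastq_screen or the aligner installed.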
Pros (BioBloomTools): MultiQC support.
Cons: As far as I can tell, no precomputed reference databases are provided.
Another option to consider (designed more for human data with potential microbial contamination):
http://software.broadinstitute.org/pathseq/ https://software.broadinstitute.org/gatk/documentation/article?id=10913
Doesn't appear to be in bioconda, so probably a non-starter :/
Thinking of adding a sample contamination check into the pipeline to get an assessment on sample purity.
This will become an increasing issue for those working in microbiome/host genomics, and also for xenograft experiments (human/mouse), for example.
One tool I have used is fastq screen (https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/), as a suggestion; I am sure there are other equivalent tools.