Open ignadb opened 4 years ago
@ignadb: How did you obtain the FASTQ files? If you use bcl2fastq to convert your raw Illumina data, PhiX reads should end up in the 'undetermined' fraction as they don't have a sample barcode used for demultiplexing. Thus, none of the actual samples should contain any PhiX. Or do you run the undetermined reads through fastp?
@mschilli87 Thanks for your comment. Perhaps I am overcautious but I always check for PhiX contamination.
That's right. Don't worry about PhiX reads, which are removed by bcl2fastq.
This is an important comment that should not be ignored. PhiX can end up in the pre-processed reads for a variety of reasons. It would be great if a PhiX decontamination feature were added.
Removing PhiX is especially important for reads that are used in de novo assemblies, which is also when one will likely be using a trimming/QC tool like fastp.
I gather the design philosophy of fastp currently is "Set good defaults so users don't have to." Removing PhiX without users having to think about it is a good idea.
In a perfect world, bcl2fastq would remove all PhiX, but in practice a small fraction of PhiX reads gets assigned to samples. In my testing, it's usually between 0 and 100 reads per multiplexed sample, but I have had a few examples of several thousand reads mapping to PhiX. It's more of a 'better safe than sorry' situation.
I'll also stress that NCBI will not take assemblies that have detected PhiX contamination - those contigs/scaffolds that have PhiX must be removed prior to acceptance into the NCBI assembly database.
Edit - Also, in preps that use a single index (i7) rather than dual indexes (i5 + i7), PhiX contamination is much more of a problem, especially with low-cost sequencing like the SeqWell platform, which can use only i7 to reduce cost. In that case I get thousands of PhiX reads per sample.
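For anyone who wants a quick sanity check on their own samples, here is a rough sketch of how a per-sample PhiX count like the ones above could be estimated. This is not part of fastp and not how a real aligner works: it simply flags reads that share exact k-mers with a reference sequence you supply. The function names, `k`, and the `min_hits` threshold are all hypothetical choices for illustration, and the short reference in the example is a toy stand-in, not the PhiX genome.

```python
from typing import Iterable, Iterator, Set, Tuple

# Translation table for complementing DNA bases (other characters pass through).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_kmer_index(reference: str, k: int) -> Set[str]:
    """Collect every k-mer from both strands of the reference."""
    reference = reference.upper()
    kmers: Set[str] = set()
    for strand in (reference, reverse_complement(reference)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line (unused in this sketch)
        yield header.strip(), seq


def count_matching_reads(
    lines: Iterable[str], kmers: Set[str], k: int, min_hits: int = 2
) -> Tuple[int, int]:
    """Return (flagged_reads, total_reads).

    A read is flagged when at least `min_hits` of its k-mers occur in the
    reference index. The threshold is an arbitrary assumption.
    """
    flagged = total = 0
    for _, seq in parse_fastq(lines):
        total += 1
        seq = seq.upper()
        hits = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in kmers)
        if hits >= min_hits:
            flagged += 1
    return flagged, total


# Toy example: the "reference" below is a made-up stand-in, not PhiX.
toy_reference = "ACGTTGCATGCCGATAGCTAGGCTTAACG"
toy_index = build_kmer_index(toy_reference, k=8)
toy_fastq = [
    "@read1", "TGCATGCCGATAGC", "+", "IIIIIIIIIIIIII",  # taken from the reference
    "@read2", "GGGGGGGGGGGGGG", "+", "IIIIIIIIIIIIII",  # unrelated sequence
]
flagged, total = count_matching_reads(toy_fastq, toy_index, k=8)
print(f"{flagged} of {total} reads share k-mers with the reference")
```

On real data you would load an actual PhiX reference, use a larger `k`, and pass the lines of each demultiplexed file (for example from `gzip.open(path, "rt")`). A proper aligner or a dedicated decontamination tool will be more sensitive than exact k-mer matching, so treat the counts as a rough screen.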
Hi,
Thanks for developing fastp! I was wondering if it detects and removes PhiX spike-in by default?
Thanks in advance!