Open seoldh opened 5 months ago
The pooled chimera detection method should only be used if using pooled denoising. It should not be used with the default denoising (independent) or with pseudo-pooling. So I would recommend as a first step adjusting the pooling modality.
(It would be good to have clearer documentation on that, or maybe a warning message when the chimera method and the pooling method used for denoising are misaligned.)
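As a rough illustration of the kind of warning suggested above, here is a minimal shell sketch. The function name check_dada2_methods is hypothetical and is not part of QIIME 2 or DADA2. It encodes only the rule stated in this thread: the pooled chimera method belongs with pooled denoising, not with independent or pseudo pooling.

```shell
# Hypothetical guard, NOT a QIIME 2 or DADA2 command.
# Usage: check_dada2_methods <pooling-method> <chimera-method>
# Returns 0 when the pairing is consistent, 1 (with a warning on stderr) otherwise.
check_dada2_methods() {
  pooling="$1"
  chimera="$2"
  # Rule from this thread: "pooled" chimera detection only fits pooled denoising.
  if [ "$chimera" = "pooled" ] && [ "$pooling" != "pooled" ]; then
    echo "WARNING: chimera method 'pooled' used with pooling method '$pooling';" \
         "pooled chimera detection is only meant for pooled denoising." >&2
    return 1
  fi
  return 0
}
```

Calling check_dada2_methods pseudo pooled before launching a run would print the warning and return a non-zero status, while check_dada2_methods pseudo consensus would pass silently.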
In all three cases, I used the parameters --p-pooling-method pseudo --p-chimera-method pooled.
Since qiime only provides two pooling methods, independent and pseudo, I'm going to try using R to apply pool=TRUE rather than pseudo.
I have samples sequenced targeting the same 16S partial region from two different institutions, with about 120 and 300 samples respectively. I'm unsure how to correct for batch effects, so the first thing I did was try the following three commands to see what difference pooling makes:
(1) A_institution.qza (120 samples):
qiime dada2 denoise-paired --i-demultiplexed-seqs A_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
(2) B_institution.qza (300 samples):
qiime dada2 denoise-paired --i-demultiplexed-seqs B_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
(3) A+B_institution.qza (420 samples):
qiime dada2 denoise-paired --i-demultiplexed-seqs A+B_institution.qza --p-trunc-len-f N --p-trunc-len-r M --p-trim-left-f L --p-trim-left-r O --p-pooling-method pseudo --p-chimera-method pooled
In (1), chimeras were detected and filtered out, but in (2) and (3) no chimeras were detected in any of the samples. That is, for all samples in A_institution (120 samples): the count after merging in (1) ≈ the count after merging in (3) = the count after chimera removal in (3) > the count after chimera removal in (1).
Are there any parameters I need to adjust for chimera detection when using a large dataset? Or could there be other causes? The sequencing quality plots for the raw data from the two institutions are similar, but institution A has an average read count of 40,000 while institution B has an average read count of 170,000, a difference of about 4x.