Closed: QGaelle closed this issue 4 months ago
> Q1: I know you’ve been saying that anything above ~25% loss is "acceptable" for filtering and trimming (https://github.com/benjjneb/dada2/issues/1133), but what about when I don’t have thousands of reads left, like here with fewer than 10,000 reads per sample? Is it still OK?
This always depends on the questions you are trying to answer, but yes, it's probably fine. Lower library sizes just mean less sensitivity to relatively rare members of the community.
> Q2: I have made sure that my reads still overlap after truncation, and yet there seems to be quite a high percentage of loss for the second sample. What could explain the high % of loss for this sample, and what could explain the difference between the samples at this step?
In my experience, the most common reason for large variation between samples at this step is some kind of off-target amplification, or unexpected amplicon sizes, in some of the samples. This is something you could investigate in your test samples: `mergePairs(..., returnRejects=TRUE)` will return a data.frame that also includes the pairs of ASVs that failed to merge. In the samples with low merge performance, is there something different between those ASVs (considering perhaps just the forward ASVs) and the ASVs that do merge? I usually poke at that by BLAST-ing some of the high-abundance merging and non-merging pairs.
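A minimal sketch of that inspection, assuming `dadaFs`/`dadaRs` are the denoised outputs and `derepFs`/`derepRs` the matching dereplicated reads (all object names here are placeholders):

```r
library(dada2)

# Keep rejected pairs in the output instead of discarding them
mergers <- mergePairs(dadaFs, derepFs, dadaRs, derepRs,
                      returnRejects = TRUE, verbose = TRUE)

# Subset to the pairs that failed to merge, most abundant first
rejects <- mergers[!mergers$accept, ]
rejects <- rejects[order(rejects$abundance, decreasing = TRUE), ]

# Write the forward sequences of the top rejects to a FASTA file for BLAST-ing
top_fwd <- dadaFs$sequence[rejects$forward[seq_len(min(10, nrow(rejects)))]]
writeLines(paste0(">reject_", seq_along(top_fwd), "\n", top_fwd),
           "rejected_forward_ASVs.fasta")
```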
Hi @benjjneb, hi everyone,
I am currently looking for the best filtering parameters for my data and have been running some tests on a few samples. We have sent samples for sequencing in several batches, 6 runs total. The idea is to process each run separately and then merge them before the chimera removal step.
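That per-run workflow can be sketched as follows; the `.rds` file names are placeholders for wherever each run's sequence table is saved after `makeSequenceTable()`:

```r
library(dada2)

# Load the six per-run sequence tables (each run denoised separately)
seqtabs <- lapply(sprintf("seqtab_run%d.rds", 1:6), readRDS)

# Combine them, then remove chimeras once on the merged table
seqtab_all    <- mergeSequenceTables(tables = seqtabs)
seqtab_nochim <- removeBimeraDenovo(seqtab_all, method = "consensus",
                                    multithread = TRUE, verbose = TRUE)
```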
I work with coral microbiome data and have run a test on four samples from the same run.
Primers used: 16S bacteria, universal V4-V5 (Parada et al. 2016), ~411 bp amplicon:
16Sbac-515F GTGYCAGCMGCCGCGGTAA
16Sbac-926R CCGYCAATTYMTTTRAGTTT
Parameters I chose:
I used `trimLeft` to remove the remaining bases of the primers. I first had to remove the heterogeneity spacers using the Trimming_HS_Metagenomics script, which worked great, as discussed in #1651.
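A hedged sketch of that filtering call: the `trimLeft` values come from the primer lengths above (515F is 19 nt, 926R is 20 nt), while the `truncLen` values and file-name vectors are placeholders to be tuned per run:

```r
library(dada2)

out <- filterAndTrim(fnFs, filtFs, fnRs, filtRs,
                     trimLeft = c(19, 20),    # 515F is 19 nt, 926R is 20 nt
                     truncLen = c(230, 200),  # placeholder truncation lengths
                     maxN = 0, maxEE = c(2, 2), truncQ = 2,
                     multithread = TRUE)
```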
I have two questions about the output of my test. I would like to make sure I have optimized the parameters before I start processing the whole dataset.
Output of the test:
Q1: I know you’ve been saying that anything above ~25% loss is "acceptable" for filtering and trimming (#1133), but what about when I don’t have thousands of reads left, like here with fewer than 10,000 reads per sample? Is it still OK?
Q2: I have made sure that my reads still overlap after truncation, and yet there seems to be quite a high percentage of loss for the second sample. What could explain the high % of loss for this sample, and what could explain the difference between the samples at this step?
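The overlap check can be made explicit with a little arithmetic, assuming the ~411 bp amplicon length above includes the primers and taking 230/200 as example truncation lengths (not the actual parameters used):

```r
insert_len <- 411 - 19 - 20        # amplicon length after primer removal: 372 bp
truncF <- 230; truncR <- 200       # example truncation lengths
overlap <- truncF + truncR - insert_len
overlap                            # 58 nt, above dada2's default minOverlap of 12
```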
I have attached the full output document in case it could be useful: output.txt
Thanks for your help, Gaëlle