benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Merging two datasets trimmed with different criteria #1992

Closed andrebolerbarros closed 1 month ago

andrebolerbarros commented 1 month ago

Hey everyone,

I do not have a concrete example, but I was wondering whether merging two datasets, each from its own sequencing run (in this scenario, I am running dada2 separately for each run), could affect the ASVs, specifically during the step where the ASVs from the two runs are merged.

Thanks,

benjjneb commented 1 month ago

Simple merging (e.g. by mergeSequenceTables) requires that the ASVs from each dataset start and end at the same position. So, if differences in trimming/truncating cause the ASVs in one dataset to start or end at a different position than those in the second dataset, that is an issue.

In the specific case of paired-end data in which the forward and reverse reads are merged, any differences in truncation length disappear once the reads have been merged (assuming merging was successful). So, differences in truncLen between paired-end datasets that otherwise use the same primer set usually will not affect the merging of datasets.
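To illustrate the start/end requirement for simple merging, here is a minimal Python sketch of a diagnostic check. It is not dada2 code: the function name `compare_asv_sets` and the toy sequences are hypothetical, and it assumes both runs start at the same primer position, so a truncation mismatch shows up as ASVs in one run being prefixes of ASVs in the other rather than exact matches.

```python
def compare_asv_sets(seqs_a, seqs_b):
    """Report exact matches and prefix-only matches between two sets of ASV sequences.

    Many prefix-only matches and few exact matches suggest the two runs
    were truncated at different positions.
    """
    set_a, set_b = set(seqs_a), set(seqs_b)
    exact = set_a & set_b
    prefix_pairs = set()
    for a in set_a:
        for b in set_b:
            if len(a) == len(b):
                continue  # equal length: either an exact match or a true difference
            shorter, longer = (a, b) if len(a) < len(b) else (b, a)
            if longer.startswith(shorter):
                prefix_pairs.add((shorter, longer))
    return exact, sorted(prefix_pairs)


# Hypothetical toy data: run 1 truncated at 6 nt, run 2 at 4 nt.
run1_asvs = ["ACGTAC", "TTGCAA"]
run2_asvs = ["ACGT", "TTGC", "GGGG"]

exact, prefix_pairs = compare_asv_sets(run1_asvs, run2_asvs)
print("exact matches:", len(exact))
print("prefix-only matches:", prefix_pairs)
```

In this toy case no ASV is shared exactly, yet two ASVs from run 2 are prefixes of ASVs from run 1, which is the pattern one would expect when the same biological sequences were truncated to different lengths.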

andrebolerbarros commented 1 month ago

Thanks @benjjneb

So, for single-end data, merging datasets from different runs, for example, may be problematic. In that case, is it preferable to process both datasets together and just include the batch information downstream? In the worst case, though, this could produce very different community structures solely because of differences in trimming criteria. What should one do to solve this issue?

benjjneb commented 1 month ago

Truncating both sets of reads at the same position is the preferred solution. This is straightforward for single-end data deriving from the same starting primer: just truncate the sequences in the longer ASV table to the length of those in the shorter ASV table. Then you can use the collapseNoMismatch function on the merged table to clean up any variants that used to differ only in the positions that have been truncated away.
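The truncate-then-collapse logic described above can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the dada2 implementation: the tables are modelled as hypothetical nested dicts (sample name to sequence to count), the helper names `truncate_table` and `merge_tables` are invented for this example, and summing counts of sequences that become identical after truncation stands in only roughly for what collapseNoMismatch does on a real merged table.

```python
from collections import defaultdict


def truncate_table(table, length):
    """Truncate every ASV to `length` bases, summing counts of ASVs that become identical."""
    out = {}
    for sample, counts in table.items():
        collapsed = defaultdict(int)
        for seq, n in counts.items():
            collapsed[seq[:length]] += n
        out[sample] = dict(collapsed)
    return out


def merge_tables(*tables):
    """Combine per-run tables into one; sample names must be unique across runs."""
    merged = {}
    for table in tables:
        for sample, counts in table.items():
            if sample in merged:
                raise ValueError(f"duplicate sample name: {sample}")
            merged[sample] = dict(counts)
    return merged


def shortest_asv(table):
    """Length of the shortest ASV in a table."""
    return min(len(seq) for counts in table.values() for seq in counts)


# Hypothetical toy data: run 1 ASVs are 6 nt long, run 2 ASVs are 4 nt long.
run1 = {"A1": {"ACGTAC": 10, "ACGTAG": 5, "TTGCAA": 3}}
run2 = {"B1": {"ACGT": 7, "TTGC": 2}}

# Truncate both tables to the shorter length, then merge.
common_len = min(shortest_asv(run1), shortest_asv(run2))
merged = merge_tables(
    truncate_table(run1, common_len),
    truncate_table(run2, common_len),
)
print(merged)
```

Here the two run 1 variants ACGTAC and ACGTAG differ only at positions beyond the common length, so after truncation they fold into a single ACGT entry that lines up with the ACGT entry from run 2.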

andrebolerbarros commented 1 month ago

Perfect, amazing! Thank you @benjjneb