benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Pooling, big data and classic tutorial #1977

Closed luigallucci closed 3 months ago

luigallucci commented 3 months ago

Hi @benjjneb,

I was wondering whether the two approaches, the big data tutorial and the classic tutorial, should lead to different outcomes when pooling data. Is it possible to use pool=TRUE in the big data pipeline reliably? Could this lead to a less rich ASV output than the normal approach?

Moreover, I read the latest issues about the pooling method and the chimaera removal method. If I'm using dada2 in R (not Qiime) with pooling for my dataset, should I select a chimaera removal method other than consensus?

luigallucci commented 3 months ago

[Image: Screenshot 2024-06-27 at 13 15 40]

This was my trial, with seqtab obtained through the normal pipeline and seqtabA through the big data pipeline.

benjjneb commented 3 months ago

> Moreover, I read the latest issues about the pooling method and the chimaera removal method. If I'm using dada2 in R (not Qiime) with pooling for my dataset, should I select a chimaera removal method other than consensus?

IF using dada(..., pool=TRUE) THEN use removeBimeraDenovo(..., method="pooled").
IF using dada(..., pool="pseudo") THEN use removeBimeraDenovo(..., method="consensus") (the default).
IF using dada(..., pool=FALSE) (the default) THEN use removeBimeraDenovo(..., method="consensus") (the default).
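
The pairing above can be captured in a small helper so that the chimera method always follows the pooling choice. This is a minimal sketch: `bimera_method_for_pool` is a hypothetical name and not part of dada2, and the commented usage assumes that `derepFs` and `errF` already exist from earlier pipeline steps.

```r
# Hypothetical helper (not part of dada2): map the `pool` argument passed to
# dada() onto the removeBimeraDenovo() method recommended in this thread.
bimera_method_for_pool <- function(pool = FALSE) {
  if (isTRUE(pool)) {
    "pooled"
  } else if (identical(pool, "pseudo") || isFALSE(pool)) {
    "consensus"
  } else {
    stop("pool must be TRUE, FALSE, or \"pseudo\"")
  }
}

# Usage sketch (assumes `derepFs` and `errF` are already defined):
# pool_setting  <- TRUE
# dds           <- dada(derepFs, err = errF, pool = pool_setting)
# seqtab        <- makeSequenceTable(dds)
# seqtab.nochim <- removeBimeraDenovo(seqtab,
#                                     method = bimera_method_for_pool(pool_setting))
```

Keeping the mapping in one place avoids changing `pool` in one call while forgetting to change `method` in the other.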

> Is it possible to use pool=TRUE in the big data pipeline reliably?

It isn't, because the big data workflow processes each sample independently. To use pool=TRUE, all samples have to be loaded into memory at once.
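
The difference can be sketched roughly as follows. This is an illustration under assumptions, not the tutorials' exact code: it is single-end only, `filts` is assumed to be a named character vector of filtered fastq paths for more than one sample, `err` is assumed to be an error model from learnErrors(), and the two function names are hypothetical.

```r
# Sketch only. `filts` (named vector of filtered fastq paths) and `err`
# (error model from learnErrors()) are assumed inputs, not defined here.

# Big data style: one sample is held in memory at a time, so dada() never
# sees the other samples and pool = TRUE cannot apply.
denoise_per_sample <- function(filts, err) {
  dds <- vector("list", length(filts))
  names(dds) <- names(filts)
  for (sam in names(filts)) {
    drp <- dada2::derepFastq(filts[[sam]])
    dds[[sam]] <- dada2::dada(drp, err = err, multithread = TRUE)
  }
  dada2::makeSequenceTable(dds)
}

# Regular style: all samples go into a single dada() call, which is what
# pool = TRUE requires, at the cost of loading everything into memory.
denoise_pooled <- function(filts, err) {
  dds <- dada2::dada(filts, err = err, pool = TRUE, multithread = TRUE)
  dada2::makeSequenceTable(dds)
}
```

The per-sample loop keeps memory use roughly constant as the number of samples grows, which is the reason the big data workflow is structured that way.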

> This was my trial, with seqtab obtained through the normal pipeline and seqtabA through the big data pipeline.

These differences are small, and are likely related to low-abundance/low-confidence ASVs. That said, I'm not sure exactly why you would get a different result between the two. I would err on the side of the regular tutorial workflow, as that has been updated more recently.
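
One way to check whether the non-shared ASVs are indeed low abundance is to compare the column names of the two tables and sum the reads behind the ASVs that appear in only one of them. The sketch below uses base R only; the two small matrices are made-up stand-ins for `seqtab` and `seqtabA`, not the data from this issue.

```r
# Toy stand-ins for the two sequence tables (rows = samples, columns = ASVs
# named by sequence). The values are invented for illustration only.
seqtab <- matrix(c(10, 5, 0, 2,
                    8, 0, 3, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("S1", "S2"),
                                 c("ACGT", "ACGA", "TTGA", "GGCA")))
seqtabA <- matrix(c(10, 5, 0,
                     8, 0, 3),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("S1", "S2"),
                                  c("ACGT", "ACGA", "TTGA")))

# ASVs found by only one of the two workflows
only_regular <- setdiff(colnames(seqtab), colnames(seqtabA))
only_bigdata <- setdiff(colnames(seqtabA), colnames(seqtab))

# Total reads assigned to each non-shared ASV
reads_only_regular <- colSums(seqtab[, only_regular, drop = FALSE])
reads_only_bigdata <- colSums(seqtabA[, only_bigdata, drop = FALSE])

# Fraction of all reads in the regular table carried by its unique ASVs
frac_regular <- sum(reads_only_regular) / sum(seqtab)
```

If the fraction is tiny, the two workflows agree on nearly all reads and differ only in rare ASVs.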

luigallucci commented 3 months ago

Hi @benjjneb, thank you for the reply!

I was using the BigData one because it is faster than the regular one. As you can see, the differences are not so big. Anyway, I will stick to the regular one, thank you :)