benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0
471 stars 143 forks

Different sequencing runs show significantly different alpha diversity #1565

Closed Leran10 closed 5 months ago

Leran10 commented 2 years ago

Hi,

I have 40 samples split across four 16S sequencing runs. After merging the four runs and running an alpha diversity analysis, I found that the runs differ significantly from each other.

All these conditions should be very similar. The only difference is that each batch may have a different sequencing depth: the total number of reads we got varies from batch to batch.

So I wonder whether this is normal, or evidence that something went wrong during wet-lab processing?

Thanks! Leran

benjjneb commented 2 years ago

What alpha diversity metric are you analyzing here?

Leran10 commented 2 years ago

Hi, we used observed richness and Shannon diversity:

image

benjjneb commented 2 years ago

Observed taxa is highly dependent on sequencing depth. The Shannon index is much less so, but not entirely immune to it. Could you recreate these plots from a set of samples subsampled to a constant sequencing depth? (This goes by the term "rarefy" in microbiome analysis.) That would help in understanding how systematic the differences between the "W"s are.

That said, there are not many samples in each "W". Is it possible there is a real difference between the samples in each run?
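To make the depth dependence concrete, here is a minimal Python sketch (standard library only) of rarefying and of the two metrics discussed. The community, the taxon names, and the depths of 7000 and 50000 are made up for illustration, and `rarefy`, `observed_richness`, `shannon`, and `sequence` are hypothetical helpers, not functions from dada2 or any R package.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Subsample reads without replacement to exactly `depth` reads.

    `counts` maps a taxon name to its read count. Returns None when the
    sample has fewer than `depth` reads, since such samples are dropped.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand to one entry per read, then draw `depth` reads without replacement.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, depth)))


def observed_richness(counts):
    """Number of taxa with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts):
    """Shannon index (natural log) from raw read counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


rng = random.Random(0)

# Hypothetical community: 5 abundant taxa plus 200 rare ones.
taxa = [f"asv{i}" for i in range(205)]
weights = [1000] * 5 + [1] * 200


def sequence(depth):
    """Simulate sequencing the same community at a given read depth."""
    return dict(Counter(rng.choices(taxa, weights=weights, k=depth)))


shallow = sequence(7000)
deep = sequence(50000)
deep_rarefied = rarefy(deep, 7000, rng)

for name, sample in [("shallow", shallow), ("deep", deep),
                     ("deep, rarefied", deep_rarefied)]:
    print(name, observed_richness(sample), round(shannon(sample), 3))
```

In this toy setup both samples come from the same community, so any gap in observed richness between `shallow` and `deep` is purely a depth effect. Rarefying the deep sample removes most of that gap, while the Shannon index barely moves because it is dominated by the abundant taxa.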

Leran10 commented 2 years ago

Thanks! We used Rarefy() to subsample them to a depth of 7000. Eight samples and 97 OTUs were removed. The updated plot is below:

image

The p-values of the pairwise Wilcoxon tests are no longer significant, but the median numbers of observed ASVs in W2 and W4 are higher than in W1 and W3.

So it seems that this method still cannot completely remove the differences between sequencing batches.

And all the runs are from a normal cohort without any treatment.
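For readers unfamiliar with the pairwise Wilcoxon tests mentioned above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test applied to every pair of runs. It uses a normal approximation with a tie correction and no continuity correction, which is rough for the small group sizes in this thread, and it applies no multiple-testing adjustment. The Shannon values are invented for illustration, and `rank_sum_test` is a hypothetical helper, not the function used by the poster.

```python
import math
from itertools import combinations


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (U statistic for x, approximate p-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    # Assign midranks so tied values share the average of their ranks.
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j

    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Invented Shannon values per run, for illustration only.
shannon_by_run = {
    "W1": [2.9, 3.1, 3.0, 3.2, 2.8],
    "W2": [3.4, 3.6, 3.3, 3.5, 3.7],
    "W3": [3.0, 2.9, 3.1, 3.3, 3.0],
    "W4": [3.5, 3.2, 3.6, 3.4, 3.8],
}

for a, b in combinations(shannon_by_run, 2):
    u, p = rank_sum_test(shannon_by_run[a], shannon_by_run[b])
    print(f"{a} vs {b}: U={u:.1f}, p={p:.4f}")
```

With four runs there are six pairwise comparisons, so in practice the p-values would usually be adjusted for multiple testing before being interpreted.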

benjjneb commented 2 years ago

First, I would not use observed ASVs/OTUs as a metric. Looking at the Shannon results, I think it is reasonable to include run as a confounding factor in your subsequent statistical analyses of these data.
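One simple way to account for run in a downstream comparison is a stratified permutation test, where group labels are shuffled only within each run so that between-run shifts cannot masquerade as a group effect. This is a swapped-in illustration, not necessarily the approach intended above; adding run as a covariate in a regression model is a common alternative. The data, the "case"/"control" labels, and the helper `stratified_permutation_test` are all hypothetical.

```python
import random
from statistics import mean


def stratified_permutation_test(values, groups, runs, n_perm=2000, seed=0):
    """Two-group mean-difference test, permuting labels within each run.

    Returns (observed difference, permutation p-value).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    rng = random.Random(seed)

    def stat(assignment):
        a = [v for v, lab in zip(values, assignment) if lab == labels[0]]
        b = [v for v, lab in zip(values, assignment) if lab == labels[1]]
        return mean(a) - mean(b)

    observed = stat(groups)

    # Collect sample indices belonging to each run (the strata).
    by_run = {}
    for i, run in enumerate(runs):
        by_run.setdefault(run, []).append(i)

    extreme = 0
    for _ in range(n_perm):
        permuted = list(groups)
        for idx in by_run.values():
            shuffled = [groups[i] for i in idx]
            rng.shuffle(shuffled)  # labels never leave their run
            for i, lab in zip(idx, shuffled):
                permuted[i] = lab
        if abs(stat(permuted)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Invented Shannon values: run W2 is shifted up by 0.5 for every sample,
# and "case" samples are 0.4 higher than "control" within each run.
base = [3.0, 3.1, 2.9, 3.2]
values, groups, runs = [], [], []
for run, shift in [("W1", 0.0), ("W2", 0.5)]:
    for v in base:
        values += [v + shift, v + shift + 0.4]
        groups += ["control", "case"]
        runs += [run, run]

diff, p = stratified_permutation_test(values, groups, runs)
print(f"case - control = {diff:.3f}, p = {p:.4f}")
```

Because labels are only exchanged between samples from the same run, the 0.5 offset between W1 and W2 contributes nothing to the null distribution of the group difference.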