benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Best method of comparing samples composition #1173

Closed. dikiprawisuda closed this issue 3 years ago

dikiprawisuda commented 3 years ago

Hi, I am sorry if this is not a technical question.

I am looking for a way to compare two of my many samples. I want to know whether they are different or not.

The two samples come from two different mice that are close in age (in weeks).

My hypothesis is that they have a roughly similar microbiome, i.e. a similar ASV composition.

I have tried a Wilcoxon test and a t-test on a data frame subset from the otutable/seqtab.nochim. The two tests gave different results, probably because of the high number of zeroes. The t-test gave the result I was hoping for, but the Shapiro test indicated that the data are not normally distributed.

Is there a better way of telling whether they are different or not?

Thank you.

Best,

benjjneb commented 3 years ago

Are you trying to do a "whole-community" comparison, i.e. is this community different from that community? If so, I would look at the PERMANOVA method, implemented in the adonis/adonis2 functions in the vegan R package.
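
As a rough illustration of that suggestion, the sketch below runs adonis2 on vegan's bundled dune data, which stands in here for an ASV table (it is not data from this thread). It assumes vegan is installed and that samples are in rows, which is also how a dada2 seqtab.nochim is laid out. Note that PERMANOVA compares groups of samples, so it needs several samples per group and cannot say anything about a single pair.

```r
# Sketch only: vegan's example data stand in for an ASV table.
library(vegan)

data(dune)      # abundance table, samples (sites) in rows, species in columns
data(dune.env)  # sample metadata, including the Management factor

set.seed(42)    # permutation tests are random; fix the seed for repeatability

# Does community composition differ between the Management groups?
res <- adonis2(dune ~ Management, data = dune.env,
               method = "bray", permutations = 999)
print(res)
```

The R2 column gives the share of variation in the Bray-Curtis distances associated with the grouping, and the Pr(>F) column is the permutation p-value.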

If you are doing taxa-by-taxa differential abundance testing, then a t-test or Wilcoxon test are not unacceptable choices. There is a much larger literature on differential abundance testing on microbiome data though, for example https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531 but many others as well.

dikiprawisuda commented 3 years ago

Dear Sir @benjjneb, Thank you for your reply. May I know why those two tests are not acceptable? What keywords should I be looking for?

I think it is not taxa-by-taxa, please correct me if I am wrong, but sample-by-sample. Not between every pair of samples, only between two samples. For example, for the GlobalPatterns dataset of phyloseq, the table is data.frame(otu_table(GlobalPatterns)) and I want to know whether CL3 and CC1 are different. So I tried the following: p.otutable <- data.frame(otu_table(GlobalPatterns)) and then the modest attempts t.test(p.otutable$CL3, p.otutable$CC1) and wilcox.test(p.otutable$CL3, p.otutable$CC1)

Thank you for the paper, I will try to read it closely. Meanwhile, is it acceptable if I use PCoA bray-curtis for telling their difference?

Thank you, Best

benjjneb commented 3 years ago

The test you describe is not comparing the samples. It is comparing the distribution of relative abundance in each sample, with the underlying (and incorrect) assumption that each of the relative abundances is a separate draw from a common distribution.
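
A small simulation may make this concrete. In the sketch below (made-up counts, not data from this thread), the second sample holds exactly the same count values as the first, but reassigned to different taxa. The unpaired Wilcoxon test only sees two sets of numbers and ignores which taxon each number belongs to, so it cannot notice the reshuffling, while a Bray-Curtis dissimilarity that matches taxa up does.

```r
# Hypothetical counts for 200 taxa in two samples (simulated, not real data).
set.seed(1)
n_taxa <- 200
sample_a <- rnbinom(n_taxa, mu = 50, size = 0.3)  # zero-heavy, skewed counts
sample_b <- sample(sample_a)                      # same values, shuffled across taxa

# Bray-Curtis dissimilarity between two count vectors (taxa matched by position).
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# The unpaired test compares the two distributions of values, which are identical.
p_value <- suppressWarnings(wilcox.test(sample_a, sample_b)$p.value)

# The communities themselves differ once each taxon is compared with itself.
bc <- bray_curtis(sample_a, sample_b)

cat("Wilcoxon p-value:", p_value, "\n")
cat("Bray-Curtis dissimilarity:", bc, "\n")
```

Because both vectors contain the same values, the test has nothing to detect and the p-value should come out at 1, even though the two simulated communities share little taxon by taxon.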

You can use those tests to compare the distribution of relative abundances of a single taxon across the samples in group 1 and the samples in group 2, but you can't use them in the way that you are.
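
A minimal sketch of that per-taxon usage, on simulated data: one Wilcoxon test per taxon, comparing its relative abundance in the group 1 samples against the group 2 samples. The group labels, sample names, and counts are all invented for illustration, and the Benjamini-Hochberg adjustment is an added assumption (a common choice when many taxa are tested), not something prescribed in this thread.

```r
# Simulated counts: 12 samples (6 per group) x 5 taxa. All values are made up.
set.seed(7)
group <- factor(rep(c("group1", "group2"), each = 6))
counts <- matrix(rpois(12 * 5, lambda = 200), nrow = 12,
                 dimnames = list(paste0("S", 1:12), paste0("ASV", 1:5)))

# Plant a real difference: ASV1 goes up and ASV2 goes down in group 2.
counts[group == "group2", "ASV1"] <- rpois(6, lambda = 300)
counts[group == "group2", "ASV2"] <- rpois(6, lambda = 100)

rel_abund <- counts / rowSums(counts)  # samples in rows, as in a dada2 seqtab

# One Wilcoxon rank-sum test per taxon: group 1 samples vs group 2 samples.
p_raw <- apply(rel_abund, 2, function(taxon) {
  suppressWarnings(wilcox.test(taxon ~ group)$p.value)
})

# Many taxa means many tests, so adjust the p-values for multiple comparisons.
p_adj <- p.adjust(p_raw, method = "BH")
print(round(cbind(p_raw, p_adj), 4))
```

Each test here has several independent samples per group, which is what makes the comparison meaningful. With only one sample per mouse there is no within-group replication to test against.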

"is it acceptable if I use PCoA bray-curtis for telling their difference?"

Sure, this is a very common way to do an exploration of differences/similarities between samples.
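
For reference, a base-R sketch of that kind of exploration on simulated counts (the sample and ASV names are hypothetical). Bray-Curtis is computed by hand here to keep the example dependency-free; in practice vegan's vegdist function is the usual route. PCoA is classical multidimensional scaling of the dissimilarity matrix, which base R provides as cmdscale.

```r
# Simulated counts: 6 samples x 8 taxa (made-up data, samples in rows).
set.seed(3)
counts <- matrix(rpois(6 * 8, lambda = 30), nrow = 6,
                 dimnames = list(paste0("S", 1:6), paste0("ASV", 1:8)))
rel_abund <- counts / rowSums(counts)

# Pairwise Bray-Curtis dissimilarities between samples.
n <- nrow(rel_abund)
bc <- matrix(0, n, n,
             dimnames = list(rownames(rel_abund), rownames(rel_abund)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- sum(abs(rel_abund[i, ] - rel_abund[j, ])) /
      sum(rel_abund[i, ] + rel_abund[j, ])
  }
}

# PCoA: classical multidimensional scaling of the dissimilarity matrix.
pcoa <- cmdscale(as.dist(bc), k = 2, eig = TRUE)
coords <- pcoa$points
print(round(coords, 3))

# To draw the ordination:
# plot(coords, xlab = "PCoA 1", ylab = "PCoA 2")
# text(coords, labels = rownames(coords), pos = 3)
```

Samples that plot close together have similar composition under Bray-Curtis. The ordination is descriptive only and does not by itself give a p-value.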