adw96 / breakaway

Species richness with high diversity

Richness estimates confidence intervals very high. #64

Closed by mestaki 5 years ago

mestaki commented 5 years ago

Hey folks, I've got a simple nested 2x3 design of mouse gut tissue, which I denoised with dada2 (with pooled=True) and then ran through phyloseq + breakaway to estimate richness:

```
plot(ps.richness.breakaway, ps, color="group")
# `group` here is the merged version 2x3=6 factors. Couldn't figure out plotting the 2x3 design.
```

image

First question: I assumed the error bars are 95% CIs, but I couldn't find this stated in the packaged DivNet vignette, and breakaway didn't come packaged with a vignette. Is this correct? It would be good to know so I can describe it in a publication. Second, as you can see, the CIs are highly variable, in some instances reaching almost 50k, which doesn't make biological sense to me. Is this typical? I thought those instances might be the result of low sequencing depth, but I couldn't confirm this pattern from the sequencing depths of the samples:

```
 DM01  DM02  DM03  DM04  DM05  DM06  DM07  DM08  DM09  DM10  DM11  DM12  DM13  DM14  DM15  DM16 
 6715   145 11006  3560 11668  3981  6603 10553  2738 11676 20038 13292   537  4628 46729 23484 
 DM18  DM19  DM20  DM21  DM22  DM23  DM24  DM26  DM27  DM28  DM29  DM30  DM32  DM33  DM37  DM38 
18304  5885 21195 19050 14348 13088 19430 19489  9656 18852  9507  6645 12403  7445 19437 24984 
 DM39  DM40  DM42  DM43  DM44  DM45  DM46  DM47  DM48  DM49  DM53  DM55 
 7130 13516 15407 11437 42219 26092  6607  7494 12115  7186  4701  2766
```

Any thoughts on how to explain this and more importantly if these values are to be trusted when this high?

Thanks!

adw96 commented 5 years ago

Hey Bod @mestaki ! So sorry again for the delay -- but it's updates time for breakaway 😺 so expect some answers to your questions and issues soon.

Indeed, these are 95% confidence intervals. Because breakaway is a wrapper for multiple species richness estimators, the confidence level is actually listed in the documentation for those individual functions. For example, the documentation for kemp says "ci: An asymmetric 95% confidence interval for diversity." Sorry that's not easier to find.

To answer your second question, the upper endpoints of species richness estimates are indeed typically high. Here are a couple of reasons for this:

There are a couple of reasons why this isn't cause for concern:

To answer your final questions -- yes this is typical, it is reliable, and it is correct 😸 That's why you collected lots of samples!

Let me know if you have any questions about how to run betta for inference on diversity.
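
For reference, here is a minimal sketch of what running betta on breakaway estimates might look like. It uses phyloseq's built-in GlobalPatterns data and its SampleType variable purely as a stand-in (an assumption, not the data from this thread), so treat it as an illustration rather than a recipe. With your own data you would substitute your phyloseq object (`ps`) and grouping variable (`"group"`).

```r
library(breakaway)
library(phyloseq)

# Stand-in data so the sketch runs on its own (an assumption, not the data
# from this thread). Swap in your own phyloseq object and grouping variable,
# e.g. `ps` and "group".
data(GlobalPatterns)
ps_demo <- GlobalPatterns
group_var <- "SampleType"

# Per-sample richness estimates together with their standard errors
ba <- breakaway(ps_demo)
est <- summary(ba)

# betta models the richness estimates as a function of the covariate,
# taking each estimate's standard error into account
bt <- betta(chats = est$estimate,
            ses = est$error,
            X = make_design_matrix(ps_demo, group_var))

# Coefficient table: one row for the intercept and one per group contrast
bt$table
```

The point of this step is that the wide per-sample intervals feed into the group-level comparison through `ses`, rather than being ignored.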

Amy