Closed jorondo1 closed 2 years ago
Thanks so much @jorondo1 -- I have some suspicions about what's happening. Two questions:
I think I can diagnose given the above information.
The counts matrix is sourmash output. Here is a subset of the phyloseq object I am using as input. I have not done any filtering (other than subsetting for display here); the counts are as-is. It looks like this:
GQ1 GQ2 GQ32 GQ33 GQ3 GQ34 GQ35 GQ4 GQ36 GQ37 GQ5 GQ6 GQ11 GQ12 GQ40
1 [Eubac… 130 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 [Eubac… 1115 10007 0 0 7426 0 109 251 0 0 12743 9898 0 80 0
3 [Eubac… 169 111 0 0 0 0 0 0 0 0 0 0 0 0 0
4 Actino… 287 214 0 0 198 0 0 0 0 0 0 211 137 857 0
5 Actino… 323 182 0 0 0 0 0 0 0 0 0 0 0 304 0
6 Actino… 75 0 0 0 0 0 0 0 0 0 0 0 0 359 0
7 Actino… 2333 28694 0 0 8682 0 0 2403 0 0 7657 9792 141 64 0
8 Actino… 562 171 0 0 196 0 0 0 0 0 213 0 70 215 0
9 Actino… 3796 1403 0 0 3390 0 0 559 0 0 157 124 1215 1414 0
10 Actino… 860 2216 0 0 3446 0 0 73 0 0 1970 967 1106 1233 0
Thanks a lot for your time!
Interesting -- it looks like you have very few biological units that are observed infrequently. What breakaway is (reasonably) inferring from this is that there are few unobserved biological units, which is why breakaway's estimates are the same as your plugin estimates. Basically if your data structure doesn't suggest that you have anything that's rare, it will predict that you have nothing missing! This is definitely the intended behavior of breakaway, so I'm going to close it as an issue. I hope that helps answer your question!
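One way to see this in the data is to tabulate, per sample, how many taxa were observed exactly once or twice, since those low-frequency counts are what a richness estimator of this kind draws on. The following is a minimal base-R sketch, not breakaway's own code: it uses only three columns of the subset posted above, and the helper names `frequency_counts` and `summarise_rarity` are hypothetical. The full dataset may look different from this subset.

```r
# Three columns copied from the subset posted above
# (taxa in rows, samples in columns).
counts <- cbind(
  GQ1 = c(130, 1115, 169, 287, 323, 75, 2333, 562, 3796, 860),
  GQ2 = c(0, 10007, 111, 214, 182, 0, 28694, 171, 1403, 2216),
  GQ3 = c(0, 7426, 0, 198, 0, 0, 8682, 196, 3390, 3446)
)

# Hypothetical helper: frequency counts for one sample, i.e. how many
# taxa were observed exactly `index` times.
frequency_counts <- function(x) {
  x <- x[x > 0]
  tab <- table(x)
  data.frame(index = as.numeric(names(tab)), frequency = as.vector(tab))
}

# Hypothetical helper: a few summaries of how "rare" the rarest taxa are.
summarise_rarity <- function(x) {
  c(
    observed    = sum(x > 0),
    singletons  = sum(x == 1),
    doubletons  = sum(x == 2),
    min_nonzero = min(x[x > 0])
  )
}

# One row per sample.
rarity <- t(apply(counts, 2, summarise_rarity))
print(rarity)

# Frequency count table for a single sample.
print(frequency_counts(counts[, "GQ1"]))
```

In this subset there are no singletons or doubletons at all, and the smallest nonzero count is 75, which is the pattern described above: nothing looks rare, so there is little evidence of anything missing.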
That said, given the extremely strong correlation between depth and observed richness, if you wanted to fit a model for richness adjusting for depth, you might consider fitting a model like lm(sample_richness ~ your_covariates + depth). I don't recommend this often but given that you don't have the ability to estimate the # of rare units (min hash sketches?), this might be the best you can do in this setting.
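A rough sketch of that kind of depth-adjusted model, on simulated data only: the data frame `meta`, its column names, and the simulated values are all made up for illustration and are not taken from the issue. Using `log10(depth)` rather than raw `depth` is an assumption on my part, chosen because the depths here span orders of magnitude; plain `depth` works the same way in the formula.

```r
set.seed(42)

# Simulated sample metadata (hypothetical; not the poster's data).
n <- 40
meta <- data.frame(
  sample_type = factor(rep(c("Feces", "Saliva"), each = n / 2)),
  depth       = round(10^runif(n, min = 3, max = 6))  # reads per sample
)

# Simulated observed richness that increases with log depth and
# differs between sample types.
meta$sample_richness <- round(
  20 +
    15 * log10(meta$depth) +
    10 * (meta$sample_type == "Saliva") +
    rnorm(n, sd = 5)
)

# Richness modelled on the covariate of interest, adjusting for depth.
fit <- lm(sample_richness ~ sample_type + log10(depth), data = meta)

# Coefficient table: the sample_type row is the depth-adjusted difference.
print(summary(fit)$coefficients)
```

The `sample_typeSaliva` coefficient is then read as the difference in observed richness between sample types at a fixed depth, which is a different quantity from a difference in estimated total richness.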
Next time I chat with Taylor and Titus I might ask them about whether sourmash can detect rare/low abundance min hash sketches, since I'm ignorant on this.
Hi @adw96!
I thought I posted this already but I can't find my issue anymore; sorry in advance if I double-posted. The required command outputs are here.
I am running the STAMPS2022 breakaway tutorial on my data, and the comparison between the breakaway adjustment and plug-in richness yields identical plots, even though I have significant (orders-of-magnitude) differences in sequencing depth between my two sample types (Saliva and Feces).
I do get a warning after running the breakaway function, but from reading previously posted issues it doesn't seem to be cause for alarm:
And the contents of the two objects look like this:
Can you help me figure this out?
Thanks in advance