bio-datascience / tascCODA

tree-aggregated compositional analysis for high-throughput sequencing data
BSD 3-Clause "New" or "Revised" License

Implementing a zero-inflated generalized Dirichlet multinomial #4

Open leighton opened 1 year ago

leighton commented 1 year ago

Hello! Thanks for this amazing tool!

I was wondering how hard it would be to implement ZIGDM in tascCODA, and whether you have any plans to add it soon?

We work on the vaginal microbiome, which tends to have very low alpha diversity. Do you think there would be any issues using tascCODA on this type of data?

Thanks again!

johannesostner commented 1 year ago

Hi @leighton, thank you for your interest in tascCODA! We aren't actively working on implementing a zero-inflated version of the DM in tascCODA, but this is an extension that we might want to add in the future. Inference for this should not be a problem, but it is not straightforward to determine how to test for credible differences in presence and abundance of a feature at the same time.

johannesostner commented 1 year ago

As for your dataset, how many taxa do you have? If the number is not too high, tascCODA should still be able to find a good result. Otherwise, you can discard the rarest taxa before running tascCODA, or add a small pseudocount (e.g. 0.5) to the zeros in your data.
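
The two preprocessing options above could look roughly like the sketch below. It is plain Python on a made-up count table and does not use the tascCODA API. The function names, the taxon counts, and the 50% prevalence cutoff are all assumptions for illustration. Only the 0.5 pseudocount comes from the suggestion above.

```python
def drop_rare_taxa(counts, min_prevalence=0.5):
    """Keep taxa with a nonzero count in at least `min_prevalence` of samples.

    `counts` maps taxon name -> list of per-sample counts (hypothetical layout).
    The prevalence cutoff is an illustrative choice, not a tascCODA default.
    """
    n_samples = len(next(iter(counts.values())))
    return {
        taxon: values
        for taxon, values in counts.items()
        if sum(v > 0 for v in values) / n_samples >= min_prevalence
    }


def add_pseudocount_to_zeros(counts, pseudocount=0.5):
    """Replace zero counts with a small pseudocount; leave other counts unchanged."""
    return {
        taxon: [v if v > 0 else pseudocount for v in values]
        for taxon, values in counts.items()
    }


# Made-up counts for four samples, only to show the two steps.
counts = {
    "Lactobacillus_crispatus": [950, 0, 870, 990],
    "Lactobacillus_iners": [40, 980, 0, 5],
    "Gardnerella_vaginalis": [10, 20, 130, 0],
    "rare_taxon": [0, 0, 0, 5],
}

filtered = drop_rare_taxa(counts, min_prevalence=0.5)
prepared = add_pseudocount_to_zeros(filtered, pseudocount=0.5)

for taxon, values in prepared.items():
    print(taxon, values)
```

Here `rare_taxon` is dropped because it appears in only one of four samples, and the remaining zeros become 0.5. With real data the same two steps would be applied to the count matrix before it is passed to tascCODA. The steps are shown combined here, but either one can be used on its own.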