Diversity for subsets of the total profile

aimirza commented 3 years ago

How would you go about calculating diversity and assessing differences using DivNet for a subset of the members? For example, you might want to know the diversity of species in a specific genus. In my case, I want to compare the diversity of CAZymes (Carbohydrate-Active Enzymes), which are a subset of the enzymes in the complete enzyme profiles. I'm worried that the depth and the number of zeros would be estimated wrongly if based only on the subset, since the diversity of the subset may not always correlate with the total sum of the complete profile. I can imagine a scenario in which sample A has a larger sample size than sample B but fewer CAZymes (or fewer species of the genus of interest). If we were rarefying to account for sampling effort (excuse me for saying the R word!), I would avoid this problem by first rarefying the total enzyme profile (or species profile) and then subsetting for the CAZymes (or species in the genus of interest). I would then calculate the diversity from the subset. Is there a similar logic for breakaway and DivNet?
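For readers following along, here is a minimal sketch of the rarefy-then-subset order described in the comment above. It is plain Python with made-up counts; the helper names (`rarefy`, `subset`, `richness`) and the feature labels are hypothetical and are not part of DivNet or breakaway. It only illustrates that a sample with a larger total can still hold fewer CAZyme counts.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a dict of feature counts."""
    # Expand the counts into one entry per read, then subsample.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the sample's total count")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def subset(counts, features):
    """Keep only the features of interest (e.g. CAZymes)."""
    return {f: n for f, n in counts.items() if f in features}


def richness(counts):
    """Number of features observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


# Hypothetical profiles: sample A is deeper overall but poorer in CAZymes.
sample_a = {"cazyme_1": 5, "cazyme_2": 5, "other_1": 500, "other_2": 490}
sample_b = {"cazyme_1": 40, "cazyme_2": 30, "cazyme_3": 30, "other_1": 100}
cazymes = {"cazyme_1", "cazyme_2", "cazyme_3"}

# Step 1: rarefy the TOTAL profiles to a common depth.
depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)

# Step 2: subset to the CAZymes, then summarise the subset.
cazymes_a = subset(rarefied_a, cazymes)
cazymes_b = subset(rarefied_b, cazymes)
```

Printing `richness(cazymes_a)` and `richness(cazymes_b)` then compares the subsets at equal total depth. Subsetting first and rarefying afterwards would instead equalise the CAZyme totals, which is the mismatch the comment is worried about.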

adw96 commented 3 years ago

Hi @aimirza - there's nothing wrong with being interested in a diversity parameter among a subset of your categories. Whether or not this is a sensible thing to do depends on the context of your scientific question, and unfortunately I can't provide more specific guidance for your specific case.

There is no need to rarefy before passing your data to DivNet or breakaway, regardless of whether you are interested in estimating diversity within a subset of your taxa.

I hope this helps!

aimirza commented 3 years ago

I don't think you understood my question, probably because I didn't explain it well. Let me rephrase. I'm not asking whether subsetting is appropriate, nor whether I should rarefy when using DivNet. Rarefying was just an example to show that adjusting for "sampling effort" by subsampling to equal sample sizes is problematic for subsets, because the subset is not the full profile and so is not always proportional to the total sample size. My question is this: if total sample size or "sampling effort" should be considered when comparing diversity across samples, I am afraid that the sampling effort will be miscalculated from the subset, because the subset may not reflect the sampling effort. Will this create a problem for DivNet? I don't think it will, based on what I understood of the paper, but I wanted to confirm with you. It is indeed a problem for folks who prefer to rarefy instead. I hope this clarifies :)

adw96 commented 3 years ago

I don’t believe that this will be a problem.

adw96 / DivNet

Diversity for subsets of the total profile #74