KasperSkytte / ampvis2

Tools for visualising microbial community amplicon data
https://kasperskytte.github.io/ampvis2/
GNU General Public License v3.0

stats on ordination #145

Closed emihoe closed 2 years ago

emihoe commented 2 years ago

Hello, I would like to know whether or not my RDA, CCA and PCoA clusters are significant. Does AmpVis2 have the option to calculate this? Thank you, Emily

KasperSkytte commented 2 years ago

Hi there

Those methods are not statistical tests. But you can overlay a fit of environmental variables using vegan's envfit function, which is based on permutations. Try setting envfit_factor or envfit_numeric. If you want to do hypothesis tests you can use PERMANOVA or ANOSIM. More info and guides here: https://sites.google.com/site/mb3gustame/
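For readers who want to see what an envfit overlay amounts to, here is a minimal sketch that calls vegan directly. It assumes the dune and dune.env example data shipped with vegan as a stand-in for real amplicon data, and a Hellinger-transformed PCA as the ordination; none of these choices come from the thread.

```r
library(vegan)

# Stand-in data shipped with vegan (an assumption, not ampvis2 data):
# dune = samples x species counts, dune.env = sample metadata.
data(dune, dune.env)

# Unconstrained ordination (PCA) on Hellinger-transformed abundances.
ord <- rda(decostand(dune, method = "hellinger"))

# Fit one factor (Management) and one numeric variable (A1) onto the
# ordination; the p-values are estimated from 999 permutations.
set.seed(1)
fit <- envfit(ord, dune.env[, c("Management", "A1")], permutations = 999)
print(fit)

# Permutation p-values for the factor and for the numeric vector.
fit$factors$pvals
fit$vectors$pvals
```

The p-values describe how well each variable correlates with the ordination axes. They are not a test of whether the groups differ in the full community data.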

emihoe commented 2 years ago

Hello, sorry I wasn't very clear. Yes, I know these methods are not statistical tests. I was wondering how I can take the AmpVis2 ordination results into something like vegan to run an ANOSIM or PERMANOVA. I'm not sure how to make a data frame from the ampvis2 object that I can then analyse with vegan. Thank you for the help :)

KasperSkytte commented 2 years ago

Hi again

Vegan is used under the hood; amp_ordinate is essentially just a wrapper function. So if you set detailed_output = TRUE it will return a list with extra info, including the vegan model, the envfit model, scree plots, etc.

emihoe commented 2 years ago

Thank you! I managed to generate the detailed_output, and I apologise for the annoying questions (I'm trying to teach myself and have no one else to ask). Before Calypso collapsed I used it to work out the level of significance for my RDAs, and to calculate Adonis and ANOSIM.

I've tried using the data frame from the detailed output with vegan but still run into errors. I haven't managed to find a helpful tutorial for this integration step. Do you have any suggestions?

anosim((pcoa_42[["dsites"]]), grouping, permutations = 999, distance = "bray", strata = NULL, parallel = getOption("mc.cores"))
Error in vegdist(x, method = distance) : input data must be numeric

adon.results<-adonis((pcoa_42[["dsites"]]) ~ Treat_sex, method="bray",perm=999)
Error in eval(predvars, data, env) : object 'Treat_sex' not found

KasperSkytte commented 2 years ago

You just need to give adonis and anosim the data in the ampvis2_obj$abund data.frame. You don't need to use the output from amp_ordinate. You may need to transpose the data.frame first with t(), since vegan expects samples as rows.
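A minimal sketch of that workflow follows. It is written against a mock object rather than a real ampvis2 object, so it runs with vegan alone. The mock list `obj`, the `site` prefix on the sample names, and the `Management` grouping column are all assumptions for illustration. In practice, replace `obj` with your own ampvis2 object and `Management` with a column from its metadata, such as `Treat_sex`. The sketch uses adonis2 because recent vegan releases have retired the older adonis function.

```r
library(vegan)

# Mock stand-in for an ampvis2 object (an assumption for illustration):
# $abund has OTUs as rows and samples as columns, $metadata has one row
# per sample. Built from vegan's dune example data.
data(dune, dune.env)
rownames(dune) <- paste0("site", rownames(dune))
rownames(dune.env) <- paste0("site", rownames(dune.env))
obj <- list(abund = as.data.frame(t(dune)), metadata = dune.env)

# vegan expects samples as rows, so transpose the abundance table.
abund_t <- t(obj$abund)

# Put the metadata in the same sample order as the abundance table.
meta <- obj$metadata[rownames(abund_t), , drop = FALSE]
stopifnot(identical(rownames(abund_t), rownames(meta)))

# PERMANOVA: the grouping variable is looked up in `data`, which avoids
# the "object not found" error seen with a bare formula.
set.seed(42)
perm_res <- adonis2(abund_t ~ Management, data = meta,
                    method = "bray", permutations = 999)
print(perm_res)

# ANOSIM: needs a purely numeric table plus a grouping vector.
anos_res <- anosim(abund_t, grouping = meta$Management,
                   distance = "bray", permutations = 999)
print(anos_res)
```

The two errors quoted earlier are consistent with this. The "input data must be numeric" error suggests the dsites table holds non-numeric columns alongside the scores, and the "object 'Treat_sex' not found" error arises because no data argument told the formula where to find the variable.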

KasperSkytte commented 2 years ago

You got it spinning?

KasperSkytte commented 2 years ago

Closing due to inactivity. Feel free to reopen.

emihoe commented 2 years ago

Sorry for the slow reply.

I managed to calculate Adonis and ANOSIM on a Bray-Curtis distance matrix from my phyloseq object converted to vegan format, but what I really want to do is analyse the significance of the RDA plot. I want to tell the reviewer that, based on RDA, the clustering is significant.

If I take the data from ampvis2_obj$abund into a data frame, won't I just get the same thing?

How do I tell if the RDA clustering is significant?

Is there a way to make a data.frame from this ordination? How do I do this specifically?

Sorry for being a bit dense, I'm still trying to teach myself R.

Thank you!

KasperSkytte commented 2 years ago

Hi again

RDA itself is not a statistical hypothesis test. It's purely explorative and only highlights a portion of the variance in the data. The plot only shows the variance along two dimensions (eigenvectors), not all of the available ones (at most the smaller of the number of samples minus one and the number of OTUs/ASVs). So if you want to test for differences between groups of samples, it's best to use a test like ANOSIM or PERMANOVA on the abundance table itself.
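To make the "only a portion of the variance" point concrete, here is a small sketch that checks how much of the total variance the first two axes carry. It again assumes vegan's dune example data and a Hellinger-transformed PCA as stand-ins. With an amp_ordinate result you would apply the same calls to the model returned when detailed_output = TRUE.

```r
library(vegan)

# Stand-in data shipped with vegan (an assumption, not ampvis2 data).
data(dune)
ord <- rda(decostand(dune, method = "hellinger"))

# Eigenvalues of all ordination axes, and the share each one explains.
eig <- as.numeric(eigenvals(ord))
prop <- eig / sum(eig)

length(eig)          # how many axes the ordination has in total
round(prop[1:2], 3)  # share of variance on the two plotted axes
sum(prop[1:2])       # combined share; the remainder is not shown
```

Whatever the combined share is for your data, a two-axis plot leaves the rest out. That is why the test is better run on the full abundance table.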

To extract the same site (sample) and species (OTU, if enabled) scores (coordinates) on the two axes shown in an RDA plot produced by amp_ordinate, you can use the plot data plot$plot_env$sitescores and plot$plot_env$speciesscores. It's the same as setting detailed_output = TRUE and running vegan::scores(ord_obj$model, display = "sites", choices = c("RDA1", "RDA2")) etc. It might be handy to filter the data a bit first, which amp_ordinate does by default through the filter_otus() function; see the details in the amp_ordinate documentation.
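As a rough illustration of turning ordination scores into a data frame, the sketch below fits an RDA with vegan directly and pulls out the site scores. The dune and dune.env data and the Management constraint are assumptions standing in for real data. The axes are selected by position (1:2) here rather than by name. With an amp_ordinate result, the model passed to scores() would be the one returned with detailed_output = TRUE.

```r
library(vegan)

# Stand-in data shipped with vegan (an assumption, not ampvis2 data).
data(dune, dune.env)

# Constrained ordination (RDA) on Hellinger-transformed abundances.
ord <- rda(decostand(dune, method = "hellinger") ~ Management,
           data = dune.env)

# Site (sample) scores on the first two axes, as a numeric matrix.
site_scores <- scores(ord, display = "sites", choices = 1:2)

# Combine with the metadata into one data frame, one row per sample.
score_df <- cbind(dune.env, as.data.frame(site_scores))
head(score_df)
```

The resulting data frame is handy for custom plots. For ANOSIM or PERMANOVA, the abundance table remains the right input, because the scores keep only two of the available dimensions.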