gavinmdouglas / FuncDiv

R package for computing alpha and beta contributional diversity metrics
10 stars 0 forks source link

rarefaction for alpha diversity? #7

Closed marwa38 closed 1 year ago

marwa38 commented 1 year ago

Hi Many thanks again for the package. I have already ran my alpha diversity analysis (after rarefaction to the first quartile) and would like to add more info from the functional diversity side of view, I think I need to rarefy again for the contributional alpha diversity as a solution for variation in total diversity, if you don't mind adding rarefaction as second solution with the code that possibly could applied to alpha_out. For phyloseq object, I used rarefy_even_depth(phyloseq, sample.size = Q1)

something like this for contributional alpha diversity output?

alpha_out_rarefied <- lapply(alpha_out,
                             function(X) {
                               first_quartile <- quantile(X, probs = 0.25, **na.rm = T**)
                               data.frame(X / first_quartile)
                             })

For the richness index, I got inf values: image

Thanks :)

gavinmdouglas commented 1 year ago

Hi again,

I would prefer to leave it to the user to rarefy their input table (if they so desire) prior to computing contributional diversity.

If I understand what you're trying to do, it looks like you're computing the quartile based on an entire table of alpha diversity values (across samples). This definitely wouldn't make sense to normalize by, as there will of course be variation in depth across samples. One post-computation normalization you could do is to divide the alpha diversity values by the total richness per sample (i.e., convert the values to the proportion of taxa in a sample that encode a function).

Cheers,

Gavin

marwa38 commented 1 year ago

Do you think if I already ran rarefaction on my alpha diversity metrics, is it better to run the same normalisation method when running functional (contributional) alpha diversity?

gavinmdouglas commented 1 year ago

Do you mean that you already subsampled your table of abundances to the same number of counts for each sample? If so, then yes you could also use this subsampled table for computing contributional alpha diversity -- you wouldn't need to re-do the sub-sampling. It's an open question what the best way of normalizing this type of data is (whether to rarefy or to normalize based on some measure of the alpha diversity based on all taxa for instance).

marwa38 commented 1 year ago

Thanks @gavinmdouglas I know about rarefy or not to rarefy My point is that is that I used rarefy_even_depth(phyloseq, sample.size = Q1) to rarefy my input data before alpha diversity analysis using estimate_richness() Should I use the same method of sub-sampling also for functional alpha diversity alph_div_contrib() knowing that my inputs are directly from picrust2? I will analyze my analysis based on both alpha and functional alpha diversity that is why I am asking. So I have the richness is higher X group that Y group at alpha diversity level, need to see how I will make sense of it along with functional alpha diversity but I need to plot first. It would be helpful if you could provide a sample example of ComplexHeatmap or ggplot for plotting the outputs. Thanks

gavinmdouglas commented 1 year ago

Sorry again for my late reply.

Yes, it would likely be better to base all diversity analyses on the same rarefied table (if you want to rarefy at all). This would help maintain a consistency across your analyses in the same project, and make the contributional alpha and standard alpha diversity metrics directly comparable.

However, I think it would be reasonable to transform contributional alpha diversity data (for a given sample) with a standard approach like mean-centring and scaling by the standard deviation, even without rarefying. I haven't explored how robust this type of transformation would be to whether the data is rarefied or not though, which would be useful to know.

I'll add in an example of how to make a basic heatmap with ComplexHeatmap to the tutorial, thanks for the suggestion!

Cheers,

Gavin

marwa38 commented 1 year ago

Thanks for your input @gavinmdouglas That's great! What would be a possible code to run to rarefy the tables (abun_tab )? I don't think that the func_tab needs to be rarefied? which columns in contrib_tab?

using rarefy_even_depth() for input phyloseqobject in the standard alpha diversity. link here ; this function uses the standard R sample function to resample from the abundance values in the otu_table** component of the first argument, physeq.

summary(sample_sums(phyloseq))
rarefy_even_depth(phyloseq, sample.size = first quartile)

I am trying to make both input rarefied so that I compare both standard and functional alpha diversity.

gavinmdouglas commented 1 year ago

Hi @marwa38,

You could use rrarefy from the vegan R package: https://rdrr.io/rforge/vegan/man/rarefy.html.

No, the function copy number table is essentially genome annotation information, so you would not want to rarefy that.

The contrib_tab format is less convenient to rarefy, so I would convert it to separate abun and function copy tables (see the FuncDiv::contrib_to_multitab function).

marwa38 commented 1 year ago

MANY THANKS

marwa38 commented 1 year ago

I leave the rarefaction codes here that might be helpful to anyone:.

library(vegan)
set.seed(50)
# the other option is to use rarefied input as in case of standard alpha diversity first to make both comparable (standard and functional alpha div)
summary(rowSums(t(abun_tab.intes))) # Q1 3073
# Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 641    3073    9394   49384   67956  321509 
abun_tab.intes.t <- t(abun_tab.intes)

# exclude rows with a sum of counts smaller than the rarefaction level 
abun_tab.intes.t <- subset(abun_tab.intes.t, rowSums(abun_tab.intes.t) >= 3073)
abun_tab.intes.t.rare <- rrarefy(abun_tab.intes.t, 3073) # the NAs are not removed. 

abun_tab.intes.rare <- t(abun_tab.intes.t.rare)

# matrix to dataframe
abun_tab.intes.rare <- as.data.frame(abun_tab.intes.rare)

please let me know when provide the ggplot or complexHeatmap code for the outputs from functional alpha diversity

Many thanks again

gavinmdouglas commented 1 year ago

Hi @marwa38 ,

I added in a basic example of complexHeatmap here: https://github.com/gavinmdouglas/FuncDiv/wiki/Tutorial (although note that complexHeatmap is extremely customizable, detailed, and well-documented).

I'm not going to provide a detailed data analysis workflow, as essentially this would be a general R data analysis and ggplot2 tutorial (e.g., how to produce summary values from a data frame, how to plot data as boxplots, etc.).

marwa38 commented 1 year ago

thanks just checked my question is about code but I am really targeting to understand what really I need to check for the analysis of functional alpha diversity? as it is different from standard alpha diversity.

any ideas? I think I need to add_description for the function in picrust2 first.

gavinmdouglas commented 1 year ago

I think there are many possible ways to analyze these data. Essentially it is just like standard alpha diversity, but there are just many more dimensions (i.e., one for each function). So one could use ordination-based approaches (or other appropriate approaches for high-dimensional data) to compare the contributional alpha diversity across samples. You could also test whether there are specific functions that exhibit significant different contributional alpha diversity between sample groups of interest.

Adding a 'Description' column isn't necessary for using FuncDiv. Also, note that if you're analyzing PICRUSt2 data, that the easiest way to analyze it would be to use the predicted genome annotations output at the hsp.py stage with the input ASV abundance table (rather than using the table that is in contributional format output at the metagenome_pipeline.py step).

marwa38 commented 1 year ago

Many thanks for your comments.

Adding a 'Description' column isn't necessary for using FuncDiv. Also, note that if you're analyzing PICRUSt2 data, that the easiest way to analyze it would be to use the predicted genome annotations output at the hsp.py stage with the input ASV abundance table (rather than using the table that is in contributional format output at the metagenome_pipeline.py step).

The tables I used func_tab (transpose of KO_predicted.tsv) and abund_tab is the ASVs count table from phyloseq. As I found that the format provided at the FuncDiv tutorial is similar to the picrust2 format. In picrust2 I used one line command to run it. picrust2_pipeline.py -s ASVs.fa -i ASVs_counts.tsv -o picrust2_out_pipeline --stratified --remove_intermediate -p 19 Please advice if I am missing something? how I will define those functions?

gavinmdouglas commented 1 year ago

Those are the correct input files to use.

You can definitely add the description column, but you will need to remove it before running FuncDiv. You can also just use the mapping of function ids to descriptions in these files: https://github.com/picrust/picrust2/tree/master/picrust2/default_files/description_mapfiles

Gavin

marwa38 commented 1 year ago

You could also test whether there are specific functions that exhibit significant different contributional alpha diversity between sample groups of interest.

Thanks, Gavin @gavinmdouglas any suggested way to do that? I read your paper before I started using FuncDiv. I think it would be helpful if there are some examples for the downstream analysis of we are expecting to check that would be helpful for beginners. As you did kindly for the complexHeatmap example. Running functional diversity on boxplots to see which group where higher. Does this imply that the diversity was more linked to functions? but then any possibility to link functions to certain taxa?

gavinmdouglas commented 1 year ago

Hey @marwa38,

I wouldn't want to give the impression that there are standard workflows for analyzing this kind of data, especially as it depends on what hypothesis/hypotheses you're intending to test. If you're interested in whether there is a significant difference in contributional diversity between sample groups, then you could test for that using many statistical tests (e.g., Wilcoxon tests).

Cheers,

Gavin