ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0

Calculate the mean relative abundance with two variables #294

Closed. bfalco closed this issue 3 months ago

bfalco commented 7 months ago

Hi Chi,

I'd like to make a plot_bar showing the mean relative abundance per group and time. That is, I'd like to pass both group and time to the groupmean parameter of trans_abund, but I don't think this is currently possible.

If I split the sample files outside of R and use microeco to calculate the group means separately for each intervention time (pre/post), the order of the taxa changes. As a result, the taxa colors differ between the two times, which makes visual comparison difficult. For example, the genus Phocaeicola is in second place in pre and in third place in post.

Could you please tell me whether there is a way to get the same layout as with individual samples using facet = c("Group", "Time"), but with the mean calculated via groupmean?

Thank you very much, Bruno

ChiLiubio commented 7 months ago

Hi Bruno, how about merging group and time into one new column for the calculation? Actually, I don't fully get your point, especially the second part (pre/post). Could you please explain it again?
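
A minimal sketch of the "one new column" idea in base R. The sample table and its values are made up for illustration; only the column names Group and Time come from this thread, and GroupTime is a hypothetical name.

```r
# Hypothetical sample table; values are invented for illustration.
sample_table <- data.frame(
  Group = c("A", "A", "B", "B"),
  Time  = c("pre", "post", "pre", "post"),
  row.names = c("S1", "S2", "S3", "S4")
)

# Combine the two factors into a single grouping column.
# The separator keeps the combined labels unambiguous and easy to read.
sample_table$GroupTime <- paste(sample_table$Group, sample_table$Time, sep = "_")

print(sample_table)
```

A column like this can then serve as the single grouping variable, for example as the use_group argument of merge_samples, as done later in this thread.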

Best, Chi

bfalco commented 7 months ago

I apologize if my explanation was not entirely clear.

My intention is to calculate the mean relative abundance per group and time from the same microtable object. Visually, I would like to see the bars with the mean relative abundance of pre and post side by side for each group in a single plot, just as can be done with the individual samples from each subject by applying facet = c("Group", "Time").

It would be great to create this plot.

Best regards, Bruno

ChiLiubio commented 7 months ago

I think it may be feasible to first create a microtable object in which each group and each time is treated as a 'sample', which satisfies the data transformation. To do so, you can use the merge_samples function of the microtable object, then merge the generated objects into one. Please attach your data if the issue is still there (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function). Then I can run it and show all the steps.

Best, Chi

bfalco commented 7 months ago

I've been trying to arrange the data to perform the operation you suggest, but I cannot find the solution.

I would greatly appreciate it if you could show me how to do it. I'm attaching my data (Bruno.zip).

Best regards, Bruno

ChiLiubio commented 7 months ago

Hi. Here are my steps.

load("Bruno.RData")
library(microeco)
library(magrittr)

# Merge samples by Group and, separately, by Time
group1 <- dataset$merge_samples(use_group = "Group")
time1 <- dataset$merge_samples(use_group = "Time")

# Combine both merged objects into one microtable
tmp1 <- clone(group1)
tmp1$sample_table %<>% rbind(., time1$sample_table)
tmp1$otu_table %<>% cbind(., time1$otu_table)
# Recalculate the abundance tables for the combined object
tmp1$cal_abund()

# Bar plot of the 10 most abundant phyla
t1 <- trans_abund$new(dataset = tmp1, taxrank = "Phylum", ntaxa = 10)
t1$plot_bar()

bfalco commented 7 months ago

Chi, it's not exactly what I wanted. When I say I would like to see the mean abundance per group and time, just like it can be done with individual samples from each subject by applying facet = c("Group", "Time"), I mean I want the same thing that appears in the image I'm attaching, but with the mean abundances calculated with groupmean. I don't know if it's possible to do it with microeco, but I would greatly appreciate the effort.

[Attached image: Plot]

ChiLiubio commented 7 months ago

Hi. Is it this?

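load("Bruno.RData")
library(microeco)
# Build one grouping column from Group and Time
dataset$sample_table$per <- paste0(dataset$sample_table$Group, dataset$sample_table$Time)
# Merge samples so that each Group/Time combination becomes one 'sample'
d1 <- dataset$merge_samples(use_group = "per")
# Bring the original Group and Time columns back into the merged sample table
d1$sample_table <- dplyr::left_join(d1$sample_table, unique(dataset$sample_table), by = c("SampleID" = "per"))
# Use the sample IDs as row names again after the join
rownames(d1$sample_table) <- d1$sample_table$SampleID
d1$cal_abund()

# Bar plot of the 10 most abundant phyla, faceted by Group and Time
t1 <- trans_abund$new(dataset = d1, taxrank = "Phylum", ntaxa = 10)
t1$plot_bar(facet = c("Group", "Time"))

bfalco commented 7 months ago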

Thank you very much, Chi, it was exactly what I wanted ;-)

Best regards, Bruno

bfalco commented 7 months ago

Chi, one more question about relative abundance.

With my data and the code you sent me, the percentages obtained when calculating the mean abundance per group and time are not the same as those obtained when calculating abundance per sample/subject by passing dataset = dataset to trans_abund$new, and this can be seen in the graphs. For example, I'm attaching the code and the genus-level abundance graphs so you can check how the order and color of the taxa change:

load("Bruno.RData")
library(microeco)
dataset$sample_table$per <- paste0(dataset$sample_table$Group, dataset$sample_table$Time)
d1 <- dataset$merge_samples(use_group = "per")
d1$sample_table <- dplyr::left_join(d1$sample_table, unique(dataset$sample_table), by = c("SampleID" = "per"))
rownames(d1$sample_table) <- d1$sample_table$SampleID
d1$cal_abund()
# t1: merged samples (one bar per Group/Time combination)
t1 <- trans_abund$new(dataset = d1, taxrank = "Genus", ntaxa = 10)
t1$plot_bar(facet = c("Group", "Time"), xtext_keep = FALSE)
# t2: original dataset (one bar per individual sample)
t2 <- trans_abund$new(dataset = dataset, taxrank = "Genus", ntaxa = 10)
t2$plot_bar(facet = c("Group", "Time"), xtext_keep = FALSE)

[Attached image: Plot2]

Why don't the two graphs show exactly the same mean relative abundance?

ChiLiubio commented 7 months ago

The two ways of calculating mean relative abundance differ slightly, so the results are not exactly the same. You can subset a toy example (the Oscil..., Chiris... genera) and calculate it manually.
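
To illustrate the kind of manual check suggested here, below is a small base R sketch with made-up counts (the taxon and sample names are hypothetical, and no microeco calls are used). It assumes that merging samples pools, i.e. sums, the counts within a group before relative abundance is computed, whereas the per-sample route computes relative abundance in each sample first and then averages. That assumption about the merging step should be checked against the microeco documentation.

```r
# Toy count table: 2 taxa (rows) x 2 samples (columns) from the same group.
# All numbers are invented; the two samples have very different total counts.
counts <- matrix(
  c(10, 90,     # sample S1: 100 reads in total
    500, 500),  # sample S2: 1000 reads in total
  nrow = 2,
  dimnames = list(c("TaxonA", "TaxonB"), c("S1", "S2"))
)

# Way 1: relative abundance within each sample, then the mean across samples.
rel_per_sample <- sweep(counts, 2, colSums(counts), "/")
mean_of_rel <- rowMeans(rel_per_sample)

# Way 2: pool (sum) the counts across samples, then relative abundance.
# Assumption: this mirrors what happens when samples are merged by summing.
pooled <- rowSums(counts)
rel_of_pooled <- pooled / sum(pooled)

print(mean_of_rel)
print(rel_of_pooled)
```

In way 2 the deeper sample S2 contributes more reads and therefore more weight, while way 1 weights every sample equally. The two results coincide only when all samples in a group have the same total count, for example after rarefaction to a common depth.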