ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0
181 stars 55 forks source link

Identification of ASVs biomarkers and longitudinal microbiome analysis #323

Open guianrey opened 4 months ago

guianrey commented 4 months ago

I would like to get your opinion on a topic. I am new to differential microbiome analysis and recently conducted a study where I examined the microbiome of animals in three different health states over four time periods. To explore the differences between the health groups, I used LEfSe, a widely used tool in this field. However, I found that the graphs generated by LEfSe showed the relevant microbial genera for each group, but did not provide information on which amplicon sequence variants (ASVs) corresponded to each genus. Could you help me get information.

Additionally. LeEfSe, performs a study between groups or conditions. This led me to wonder if the approach I am using is appropriate for my data, since LEfSe does not take into account the temporal structure of the data and treats all groups equally. Do you think it would be more appropriate to use MetaLonDA for a more complete and specific analysis of longitudinal microbiome data? I would very much appreciate your opinion on this.

Best

Guillermo

ChiLiubio commented 4 months ago

Hi. For the first issue, I think it is feasible to add ASV as a taxon unit. Thus the input data for lefse will also have each ASV, and the final result can also have the ASV information. For the second, I am not sure whether four time points are enough for the input for MetaLonDA. So you can have a try. Of course, you can also use some other methods, e.g. linda and maaslin2. Those methods support multiple factors input, in which time can be a factor.

Best, Chi

guianrey commented 4 months ago

thanks Chi for your prompt reply

I was analyzing my data with LinDA, but it gives me some strange results. Could you share me a script from maaslin2 and linda to analyze my data in microeco, I have 3 levels of disease condition (healthy, disease 1 and disease 2), and 4 levels of weight (2, 8, 11 and 18 g).

t1 <- trans_diff$new(dataset = meco_dataset, method = "linda", group = "Condition+Weigh", taxa_level = "Genus", filter_thres = 0.001) View(t1$res_diff)

image

the result was the comparison between pairs of conditions, and the comparison between weights, the logical thing would not be to compare the weights of each condition 1 with condition 2 and 3 successively? which could be plotted on a heat map with the following t1$plot_diff_bar(heatmap_x = "Comparison", heatmap_y = "Taxa")

but the map looks like this

image

I would appreciate your support

Best

Guille

ChiLiubio commented 4 months ago

Hi. Guille. You are right! The results of Linda and maaslin2 are both paired comparisons. We can not change this format as those are the design of methods. The main reason is for each factor of experiment, there is generally a reference, i.e. the control. But we can set different reference. For linda, it is controlled by the R factor like the following example shows.

library(microeco)
library(magrittr)

data(dataset)
t1 <- trans_diff$new(dataset = dataset, method = "linda", group = "Group+Type", taxa_level = "Genus")
unique(t1$res_diff$Comparison)

dataset$sample_table$Group %<>% factor(., levels = c("TW", "CW", "IW"))
t2 <- trans_diff$new(dataset = dataset, method = "linda", group = "Group+Type", taxa_level = "Genus")
unique(t2$res_diff$Comparison)

The following is a maaslin2 example. The reference parameter (parameter passing) from Maaslin2 function is used to control the reference group.

library(Maaslin2)
input_data <- as.data.frame(t(read.delim(system.file('extdata','HMP2_taxonomy.tsv', package="Maaslin2"), row.names = 1)))
input_metadata <- read.delim(system.file('extdata','HMP2_metadata.tsv', package="Maaslin2"), row.names = 1)
library(microeco)
d1 <- microtable$new(input_data, sample_table = input_metadata)
d1$taxa_abund$Species <- d1$otu_table
t1 <- trans_diff$new(dataset = d1, method = "maaslin2", 
        fixed_effects = c('diagnosis', 'dysbiosisnonIBD','dysbiosisUC','dysbiosisCD', 'antibiotics', 'age'),
        random_effects = c('site', 'subject'),
        normalization = 'NONE',
        reference = 'diagnosis,nonIBD',
        standardize = FALSE,
        taxa_level = "Species", filter_thres = 0.001)
View(t1$res_diff)
t1$plot_diff_bar(heatmap_cell = "coef", heatmap_lab_fill = "Coef", heatmap_x = "Env", heatmap_y = "Taxa", keep_prefix = TRUE)