FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
103 stars 26 forks source link

Analize an interaction #176

Closed Anaherasm closed 1 year ago

Anaherasm commented 1 year ago

Hello, I am writing to you because I have a doubt. I would like to know wether is it possible to analyze an interaction using ANCOM. This is my situation: I have piglets with different dietary treatment, I would like to know if it affected differently in small and large piglets. If an interaction is not possible to determine, is there any other possibility?

Yours faithfully, Ana

FrederickHuangLin commented 1 year ago

Hi Ana,

Unfortunately, ancom does not support the inclusion of interaction terms. The inclusion of interaction terms in the ancombc or ancombc2 is also tricky.

See Question 7 in the Q&A section for more details.

Best, Huang

tybonic commented 4 months ago

Hi Huang,

I am also specifically interested in testing an interaction term. I have read the Q&A section. However, I am not sure if I understand how to "manually create the interaction term of interest outside of the formula" correctly.

My design has two fixed factors:

I want to test for which microbial taxa the response to "treatment" differs between "old" and "new". Thus, I want to know for which taxa the interaction term including "treatment" and "age" is significant.

For this, I created a new factor "treatment_age" with levels "control_old", "control_new", "treated_old" and "treated_new" and included it as a fixed factor in the ancombc() call. Is that what you meant by "manually create the interaction term"? This is my call:

ancombc2(data = phyloseq_object,
         tax_level = NULL,
         fix_formula = "treatment+age+treatment_age",
         verbose = TRUE)

Unfortunately, it does not execute but returns an error:

Obtaining initial estimates ...

Error: Estimation failed for the following covariates:
treated_old, treated_new
Please ensure that these covariates do not have missing values and check for multicollinearity before re-estimating the model

Can you please check if my approach is valid or help me figure out how to perform the desired analysis with ancombc2? Many thanks in advance! Let me know if you need additional information!

Session info: R version 4.3.2 (2023-10-31) Platform: x86_64-conda-linux-gnu (64-bit) Running under: Ubuntu 20.04.6 LTS

Version of ANCOMBC: 2.4.0

FrederickHuangLin commented 4 months ago

Hi @tybonic,

I saw your email and checked your post on StackOverflow. The issue arises because once you create the interaction term, you should not include the main effects in the model. Including them can introduce multicollinearity, leading to unstable point estimates, as mentioned in the error message. In your example with age and treatment, after manually creating the interaction term, you effectively have four groups: new_control, old_control, new_treated, and old_treated. Your hypothesis testing should then focus on the differential abundance between these specific groups.

Best regards,

tybonic commented 4 months ago

Hi Huang, many thanks for your quick response! Please apologize if my multiple attempts to contact you appeared pushy. I now understand that it is not possible to include the main effects and the manually created interaction term, and I will pursue my analysis as suggested by you. If I want to compare the log fold change from control to treated between old and new, can I run ancombc2() on two subsets of my data, each containing the samples of the "old" and "new" levels? Best regards!

FrederickHuangLin commented 3 months ago

When you said you want to obtain “log fold change from control to treated between old and new”, what exactly is the quantity you are looking for? Since treated/control and old/new are 4 different combinations here.

tybonic commented 3 months ago

Both age levels ("old" and "new") are subjected to both treatments ("control" and "treated"). I want to know for which microbial taxa the log fold change in response to treatment differs between age levels, e.g., if the response to the treatment is stronger for "new" than for "old".

When I run the analysis with the manually created interaction term, the combined levels for this term are ("control_old", "control_new", "treated_old" and "treated_new"). The log fold change is then expressed in relation to one reference level (e.g., "control_old").

Both treatment and age have an effect on microbial relative abundance. Thus, the log fold change is smaller for the comparison of "control_old" and "treated_old" than for "control_old" and "treated_new".

In my understanding, I would have to compare the log fold change between "control_old" and "treated_old" with that of "control_new" and "treated_new" to find out if the response to treatment differs between "old" and "new".

My question above refers to this. Can I run ancombc2() on subsets containing only samples of the "old" and "new" levels, testing for differences between "control" and "treated" and compare the log fold change between these analysis?

In this case, I would run ancombc2() as follows and compare the log fold change between ancombc2_old and ancombc2_new:

library(phyloseq)
library(dplyr)
library(ANCOMBC)

phyloseq_object_old <- phyloseq_object %>%
  subset_samples(age == "old")

ancombc2_old <- ancombc2(data = phyloseq_object_old,
                         fix_formula = "treatment")

phyloseq_object_new <- phyloseq_object %>%
  subset_samples(age == "new")

ancombc2_new <- ancombc2(data = phyloseq_object_new,
                         fix_formula = "treatment")
FrederickHuangLin commented 3 months ago

Hi @tybonic,

The stratified analysis approach you've proposed sounds reasonable.

Additionally, to make fuller use of the samples, you might consider running ancombc2 on the entire dataset with a manually created interaction term. This method would allow comparisons such as:

To visually represent these effects, you could create a Venn diagram displaying the sets of differentially abundant taxa, which would provide a qualitative understanding of the treatment*age interaction. Furthermore, comparing the log fold changes could offer additional insights.

However, this approach isn't ideal for quantitatively deriving the effects of the treatment*age interaction, such as test statistics and p-values. Currently, I'm limited in resources to enhance the functionality to include interactions effectively. Later this year, I expect to have some computing support, and I hope we can formally address the interaction issue then.

tybonic commented 3 months ago

Hi Huang, Thank you very much for your detailed feedback! I think I can get the information I am interested in with this approach. Looking forward to the implementation of interactions in a future version!