FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
107 stars 29 forks source link

Error with ANCOMBC2 code for dependant samples #230

Closed Lobna-H closed 1 month ago

Lobna-H commented 11 months ago

I am trying to perform differential abundance for microbial community composition by ANCOMBC2 using R. I have 3 samples from 4 dependent groups (group 1, group 2, group 3, group 4) each group contains 3 samples. group1 is the beginning point (timepoint0), group 2 is group 1 after treatment 1 for a period of time (timepoint1), group 3 is group 2 after treatment 2 for a period of time (timepoint3), group 4 is group 3 after treatment 3 for a period of time (timepoint4). group 1 is the reference group.

Here my script

output <- ancombc2(data = pseq5, assay_name = "counts", tax_level = "Phylum", fix_formula = " groups + Timepoint", rand_formula = "(Timepoint | subject)", p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05, group = "groups", struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 3, verbose = TRUE, global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = TRUE, iter_control = list(tol = 1e-2, max_iter = 20, verbose = TRUE), em_control = list(tol = 1e-5, max_iter = 100), lme_control = lme4::lmerControl(), mdfdr_control = list(fwer_ctrl_method = "holm", B = 100), trend_control = list(contrast = list(matrix(c(1, 0, 0, -1, 1, 0, 0, -1, 1), nrow = 3, byrow = TRUE)), node = list(3), solver = "ECOS", B = 100))

I got the error

Obtaining initial estimates ... Error: Estimation failed for the following covariates: Timepointt1, Timepointt2, Timepointt3 Please ensure that these covariates do not have missing values and check for multicollinearity before re-estimating the model In addition: Warning message: Small sample size detected for the following group(s): group1, group2, group3, group4 Variance estimation would be unstable when the sample size is < 5 per group

Could you please help solve this error

agrier-wcm commented 11 months ago

group1 is the beginning point (timepoint0)

If groups and Timepoint are equivalent variables, you'll only want to include one of them in the fix_formula argument. Whichever one you do include, you should also pass that one as the group argument.

You may still have an issue with sample size, but that should fix your current error message.

Lobna-H commented 10 months ago

Thanks for your help! The new script worked. However, the output contained a lot of NA results. here is the new script and the resulting output

output <- ancombc2(data = pseq5, assay_name = "counts", tax_level = "Phylum", fix_formula = "groups", rand_formula = "(Timepoint | groups)", p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05, group = "groups", struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 3, verbose = TRUE, global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = TRUE, iter_control = list(tol = 1e-2, max_iter = 20, verbose = TRUE), em_control = list(tol = 1e-5, max_iter = 100), lme_control = lme4::lmerControl(), mdfdr_control = list(fwer_ctrl_method = "holm", B = 100), trend_control = list(contrast = list(matrix(c(1, 0, 0, -1, 1, 0, 0, -1, 1), nrow = 3, byrow = TRUE)), node = list(3), solver = "ECOS", B = 100))

and res output is the attached file

res_output.csv

agrier-wcm commented 10 months ago

rand_formula = "(Timepoint | groups)"

Again, if Timepoint and groups are equivalent variables, using both of them, anywhere, will cause an issue. Perhaps you mean this:

rand_formula = "(groups | subject)"

You may still have an issue with sample sizes. If you change the rand_formula to the above, do you still get all NAs? Do you get warnings?

Lobna-H commented 10 months ago

Thanks for your help! when I tried what you recommended, I got an error Here is the script I used

output <- ancombc2(data = pseq5, assay_name = "counts", tax_level = "Phylum", fix_formula = "groups", rand_formula = "(groups | subject)", p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05, group = "groups", struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 3, verbose = TRUE, global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = TRUE, iter_control = list(tol = 1e-2, max_iter = 20, verbose = TRUE), em_control = list(tol = 1e-5, max_iter = 100), lme_control = lme4::lmerControl(), mdfdr_control = list(fwer_ctrl_method = "holm", B = 100), trend_control = list(contrast = list(matrix(c(1, 0, 0, -1, 1, 0, 0, -1, 1), nrow = 3, byrow = TRUE)), node = list(3), solver = "ECOS", B = 100))

I got this error

REML iteration = 1, epsilon = 0 Estimating sample-specific biases ... Error in { : task 1 failed - "values must be length 1, but FUN(X[[2]]) result is length 0" In addition: Warning message: Small sample size detected for the following group(s): group1, group2, group3, group4 Variance estimation would be unstable when the sample size is < 5 per group

agrier-wcm commented 10 months ago

Now I wonder if the error is because the groups variable is a factor (which it needs to be for the fix_formula = and groups = arguments), but rand_formula needs a numeric to do random slopes per subject. Can you code Timepoint as a numeric (i.e. replace timepoint0, timepoint1, timepoint2, and timepoint3 with the numerical values 0, 1, 2, & 3, and then don't recode it as a factor) and try again using rand_formula = "(Timepoint | subject)" (everything else the same)?

Lobna-H commented 8 months ago

Thank you so much, after changing the time in the metadata file to numeric as you suggested the script worked fine and the output was as expected. :)