Closed matthiaswietz closed 2 years ago
Hi Matthias,
Thanks for your interest in ANCOMBC.
I guess the season
variable is a categorical variable with four levels: spring
, summer
, fall
, and winter
, right? In R, the categorical variable will be ordered alphabetically by default, so the levels are: fall
< spring
< summer
< winter
, which means fall
is the reference level.
Therefore, for the outputs you saw, seasonspring
means spring - fall
, seasonsummer
mean summer - fall
, etc. You won't see seasonfall
as it is essentially 0.
res$diff_abn
will give you a hint whether a taxon is differentially abundant (TRUE
) or not (FALSE
). To see whether a taxon abundance is increasing or decreasing, you can go to res$beta
. For example, seasonspring = 0.5
means the average abundance in spring
is exp(0.5)
times more than that of fall
.
Best, Frederick
Thanks Frederick for the detailed explanation! Maybe I'm confused now, but how would I convert the output to identify the taxa enriched in fall, if this is "only" the reference level? Eventually, I was hoping to pinpoint the taxa significantly enriched per season, and not only in comparison to fall. Many thanks again!
Hi @matthiaswietz,
I understand your concern, but unfortunately, all current differential abundance methods can only handle the difference, not the individual effect size. Here is an illustrative example, consider a simple linear regression with a categorical variable sex
as the only covariate
# Load the data
data("Salaries", package = "car")
# Linear regression model
model <- lm(salary ~ sex, data = Salaries)
summary(model)$coef
The output looks like
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 101002 4809 21.00 2.68e-66
## sexMale 14088 5065 2.78 5.67e-03
where 101002
is the average salary among females, 101002 + 14088
is the average salary among males, and 14088
is the average difference in salary between males and females. ANCOM-BC is able to estimate the difference (14088
in this example unbiasedly), but not able to estimate the intercept term (101002
in this example, or the effect size of fall
in your case) unbiasedly like all other methods.
Hope it makes sense, Frederick
Hi @FrederickHuangLin , I was looking for info in a related matter and I found this issue. I understand what you say, which is more or less what's in the tutorial, but what it threw me out was the Figure 6a in the ANCOM-BC paper as it shows all possible comparisons between the 3 countries, something that I'm interested, and pretty sure that is closer to what @matthiaswietz was enquiring about. How did you run that? This is also referred to in Supplementary Tables 2 and 3 for example where you refer to them as "Pairwise tests using ANCOM-BC". Is there a hidden pairwise test option or is it required something else to get there? Thank you in advance.
Hi @xvazquezc,
The figure 6a you referred to was generated by applying two-group comparison to different subsets of data, i.e., the subgroup of "MA" and "US", the subgroup of "VEN" and "US", and the subgroup of "MA" and "VEN", respectively.
We understand that this is not the perfect and statistically rigorous way to handle pairwise comparisons, and we have a prototype extension to ANCOM-BC but the updated algorithm has not been fully tested. We put all resources at this moment to develop a correlation methodology and it is about to wrap up. The formal ANCOM-BC extension will come next. Stay tuned!
Best, Frederick
Hi @FrederickHuangLin, thanks for sharing this tool. If I may just ask a question in this thread since it's somewhat related. I was wondering if there's a way to re-level the levels of a variable i.e. manually setting the reference level besides the default alphabetical order. Thanks a lot!
Update: One way I can re-level is to change the names of the covariate levels (in the metadata file before generating the phyloseq object) in such a way the first level in the alphabetical arrangement is the reference group.
Hi @papelypluma,
Yes, you can definitely relevel the groups in R. For example, you can refer to this post.
Best, Frederick
Hi @papelypluma,
Yes, you can definitely relevel the groups in R. For example, you can refer to this post.
Best, Frederick
I think the problem is that the formula argument only accepts strings instead of factor levels/designs as input (which is commonly used in similar tools), which makes it incredibly difficult to change reference using the relevel()
command.
It would be great if we can input something like ancombc(.... , formula = ~condition+batch,..)
instead of ancombc(.... , formula = "condition+batch",..)
of course we can do something like
condition = str_replace_all(condition,pattern = "reference_of_interest",replacement = "A")
as @papelypluma said
Dear Frederick,
first of all many thanks for this package! Just tested to detect taxa enriched by season using
bac <- pseq %>% tax_glom("Genus") %>% filter_taxa(.,function(x) mean(x) > 10, T)
out = ancombc(bac, formula = "season", p_adj_method = "holm", zero_cut = 0.90, lib_cut = 1000, group = "season", struc_zero = T, neg_lb = F, tol = 1e-5, max_iter = 100, conserve = T, alpha = 0.05, global = T)
I do get an output, which is however confusing: there is "seasonspring", "seasonsummer" and "seasonwinter" but no "seasonfall".
Also, I'd interpret that TRUE in
res$diff_abn
means that a given taxon is enriched under the specific condition (for example, if TRUE under seasonspring it's enriched in spring). But the results do not match other DA tools I've been testing.Many thanks for your help! Matthias