Open mohebg opened 1 year ago
Hi @mohebg, thanks for your interest in tascCODA!
The "formula" parameter determines, like in R's lm function, which covariates are considered for modeling. Currently, tascCODA performs model selection for all covariates in the formula, meaning that we check whether effects are significant for every covariate/tree node pair. It's not possible at the moment to just adjust for a covariate without running model selection on it, although this might be possible in a future update.
Regarding the other arguments, you can ignore the reg parameter. This is only needed for switching between earlier versions of the tree-aggregated penalization scheme. The one described in the paper is "reg_3", which is also the default.
With the pen_args parameter, you can set the phi (aggregation bias) and lambda_1 (regularization strength) values, as described in the paper.
I hope that this answers your questions!
Hi @johannesostner ,
Thanks a lot for your prompt reply. As I understand it, the best practice for adjusting for a covariate (i.e. statistically eliminating its effect) is simply to add the covariate to the linear model. As you said, the formula is R-style, so in order to regress out age and sex, should the formula be written like this: formula="PATH+age+sex"?
So, would making the formula "PATH+age+sex" regress out age/sex in the case vs control comparison?
Thanks a lot
Yes, just add the covariate to the model. That's what I would do as well. As I said earlier, this does not "regress out" age/sex, but tascCODA will try to find significant impacts of age/sex and adjust for them accordingly. If age/sex don't have a significant impact on the composition, they also won't be adjusted for. In that regard, it's not a standard adjustment for the covariates.
Also, please make sure that all covariates are scaled to the same range (e.g. [0, 1]), as the selection of significant associations will otherwise be biased.
@johannesostner, thanks a lot for your reply, I appreciate it. I am not sure I fully understand the sentence "covariates are scaled to the same range (i.e. [0-1])".
I have three levels of covariates:
Just make sure that age is also scaled to a range between 0 and 1 (i.e. via min-max scaling like we did in the microbiome application of our paper). Otherwise the effects for age (since its range is so much bigger than for the categorical covariates, which will be encoded as 0/1) will be very small numerically and thus never selected to be significantly different from 0.
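To make the scaling advice concrete, here is a minimal sketch in plain Python. It is not taken from tascCODA or the paper's code: the helper name min_max_scale and the example ages are made up for illustration. It rescales age to [0, 1] and shows why an unscaled age coefficient ends up numerically tiny compared with a 0/1-encoded covariate.

```python
def min_max_scale(values):
    """Rescale a sequence of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant covariate has no spread; map every entry to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical donor ages in years (illustrative values, not from this thread).
ages = [25, 40, 55, 70, 85]
ages_scaled = min_max_scale(ages)

# Assume the youngest and oldest donor differ by a total effect of 1.0
# on the model's scale. The per-unit coefficient the model has to find
# is then total_effect / covariate_range.
total_effect = 1.0
coef_raw = total_effect / (max(ages) - min(ages))                   # per year of age
coef_scaled = total_effect / (max(ages_scaled) - min(ages_scaled))  # per unit of scaled age

print(ages_scaled)
print(coef_raw, coef_scaled)
```

With the raw ages, the same overall effect corresponds to a coefficient 60 times smaller than for the scaled version, which is what makes it unlikely to be selected next to covariates encoded as 0/1. In practice you would apply the same rescaling to the age column of your covariate table before running the model.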
Hi, Good day, thank you for the nice package.
I have some questions on how to use tascCODA to regress out covariates such as age and sex when assessing compositional changes between case and control in scRNA-seq data.
In your paper you state: "More generally, however, tascCODA enables to determine how host phenotype, such as disease status, host covariates such as age, gender, or an individual’s demographics, or environmental factors jointly influence the compositional counts"
Shall the formula be written like this:
tree_mod = ana.CompositionalAnalysisTree(datax.copy(), reference_cell_type="automatic", formula="PATH+age+sex", reg="scaled_3", pen_args={"phi": 0, "lambda_1": 1.7})
I also wanted to ask what the following arguments mean: reg="scaled_3" and pen_args={"phi": 0, "lambda_1": 1.7}?
Thank you very much in advance.
Best Moheb