egr95 / R-codacore

An R package for learning log-ratio biomarkers from high-throughput sequencing data.

Covariate adjustment a la Selbal? #6

Closed johannesbjork closed 1 year ago

egr95 commented 3 years ago

Thanks Johannes!

I have just added some extra content in the guide that should help clarify a couple of the ways in which covariate adjustment can be carried out with Codacore. Let me know if any parts of it are unclear.

egr95 commented 3 years ago

Let me also note here that covariate adjustment in Codacore differs slightly from Selbal, and this is by design.

Selbal learns a single balance by fitting many candidate balances, where the candidates are proposed by stepwise search. The covariates are fitted jointly with each and every candidate balance, which means the estimated effect of the covariates on the response can vary significantly between runs.

In the current implementation of Codacore, we do not jointly optimize over the regression coefficients of the candidate balance and the space of all possible balances. Instead, we propose to "partial out" the covariates a priori: first regress the response on the covariates, then fit Codacore on the residual of that fit. This strategy feels more natural in the context of Codacore's "ensembling" procedure, where multiple predictive log-ratios are added up to produce a better prediction.

Nevertheless, implementing a joint optimization "a la Selbal" should only require some relatively straightforward changes to the gradient descent code in the Codacore optimizer, so if this is of general interest I would be happy to prioritize adding such a feature in a future version.
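To make the "partial out" idea concrete, here is a minimal two-stage sketch in plain Python. It is not Codacore's code or API: the helper names (`fit_line`, `partial_out`, `log_ratio`), the toy data, and the use of simple least squares with a single covariate are all assumptions made for illustration. In particular, Codacore learns which log-ratio to use, whereas this sketch fixes the log-ratio by hand and only shows the residual step.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y on one predictor x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def partial_out(covariate, response):
    """Stage 1: regress the response on the covariate and return the residuals."""
    intercept, slope = fit_line(covariate, response)
    return [yi - (intercept + slope * zi) for zi, yi in zip(covariate, response)]

def log_ratio(sample, numerator, denominator):
    """Log-ratio of two parts of one compositional sample (hypothetical helper)."""
    return math.log(sample[numerator] / sample[denominator])

# Toy data (made up): four samples with three parts each, plus one covariate z.
counts = [[2.0, 8.0, 5.0], [2.0, 8.0, 3.0], [8.0, 2.0, 5.0], [8.0, 2.0, 1.0]]
z = [0.0, 1.0, 0.0, 1.0]

# Fixed log-ratio of part 0 over part 1. A real fit would learn this choice.
b = [log_ratio(sample, 0, 1) for sample in counts]

# Response built from the covariate and the log-ratio, with no noise.
y = [1.0 + 2.0 * zi + 1.5 * bi for zi, bi in zip(z, b)]

# Stage 1: fit the covariate alone and keep the residual.
cov_intercept, cov_slope = fit_line(z, y)
residual = partial_out(z, y)

# Stage 2: fit the log-ratio on the residual only.
lr_intercept, lr_slope = fit_line(b, residual)

print("covariate fit:", cov_intercept, cov_slope)
print("log-ratio fit on residual:", lr_intercept, lr_slope)
```

The toy data is constructed so that the log-ratio is uncorrelated with the covariate, in which case the two stages recover the coefficients used to build `y`. When the log-ratio and the covariate are correlated, a two-stage fit of this kind generally gives different coefficients from a joint fit, which is the trade-off described above.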