MarioniLab / oor_design_reproducibility


Should the atlas contain disease data? #9

Closed emdann closed 2 years ago

emdann commented 2 years ago

There is a community-wide debate on whether disease cell states should be included in de novo integration in order to better identify disease-specific cases.

My intuition is that the more data is included, the less informative the selected (or highly weighted) features become about fine subpopulations. This option is also suboptimal because it doesn't really allow going reference-free.

Experiment: compare the PAC design (i.e. mapping to a healthy atlas) with a comparison of P and C data embedded by a single model trained together with the atlas (maybe on the COVID dataset directly). If PAC with reference mapping performs at least as well, then we can argue that a healthy/incomplete reference atlas is good enough for comparative analysis between conditions.
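A minimal sketch of how this comparison could be scored, under assumptions not stated in this issue: each design yields one score per cell (or neighbourhood) for being disease-specific, and ground-truth labels are available (e.g. from a simulation). AUROC is used here only as an illustrative metric, and the names `auroc`, `compare_designs`, `scores_mapping`, `scores_joint` and `is_disease_specific` are hypothetical.

```python
import numpy as np


def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative labels")
    # Assign 1-based average ranks so that tied scores are handled correctly.
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def compare_designs(scores_mapping, scores_joint, is_disease_specific):
    """Score both designs against the same (assumed) ground-truth labels."""
    auc_mapping = auroc(scores_mapping, is_disease_specific)
    auc_joint = auroc(scores_joint, is_disease_specific)
    return {
        "mapping": auc_mapping,  # PAC design, reference mapping to the healthy atlas
        "joint": auc_joint,      # single model trained on atlas + P + C together
        "mapping_at_least_as_good": bool(auc_mapping >= auc_joint),
    }


# Toy, made-up values purely to exercise the functions; these are not results.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_mapping = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
toy_joint = [0.1, 0.6, 0.3, 0.5, 0.8, 0.9]
print(compare_designs(toy_mapping, toy_joint, toy_labels))
```

A single threshold-free metric keeps the comparison independent of how each design calibrates its scores; a real analysis would likely also look at precision-recall and at variability across simulations.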

emdann commented 2 years ago

Results in COVID analysis: https://github.com/emdann/diff2atlas/blob/master/notebooks/COVID19/20220713_COVID_design_comparison.ipynb