What to do if no controls are there?

emdann commented 2 years ago

Use case: human dataset, I have my disease samples, but no matched controls from my own study. The atlas contains n samples. Can I choose a group of these as my control samples?

Idea: use dimensionality reduction on the samples. Build a latent space Z of dimensions s x d, where s is the number of samples and d is the number of latent dimensions, and use it to match each disease sample to the most similar samples in the atlas for comparison. Options:
- with PCA on nhood counts
- with scVI on nhood counts: can we correct for technical effects?
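
A minimal sketch of the PCA option, under stated assumptions: the neighbourhood counts are available as samples x neighbourhoods matrices, `log1p` stands in for whichever normalization is eventually chosen, and the names `joint_pca` and `nearest_atlas_samples` are hypothetical helpers, not part of any existing package. PCA is computed with a plain SVD on the stacked atlas and query samples so that both live in the same latent space Z.

```python
import numpy as np

def joint_pca(atlas_counts, query_counts, n_dims=5):
    """Embed atlas and query samples in one PCA space (samples x n_dims)."""
    # Assumption: log1p is a placeholder for the normalization still to be chosen.
    X = np.log1p(np.vstack([atlas_counts, query_counts]).astype(float))
    X -= X.mean(axis=0)                       # centre each neighbourhood
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :n_dims] * S[:n_dims]            # sample scores, shape s x d
    n_atlas = atlas_counts.shape[0]
    return Z[:n_atlas], Z[n_atlas:]

def nearest_atlas_samples(Z_query, Z_atlas, k=3):
    """Indices of the k atlas samples closest (Euclidean) to each query sample."""
    dist = np.linalg.norm(Z_query[:, None, :] - Z_atlas[None, :, :], axis=2)
    return np.argsort(dist, axis=1)[:, :k]

# Toy data: 20 atlas samples and 4 query samples over 50 neighbourhoods.
rng = np.random.default_rng(0)
atlas_counts = rng.poisson(10, size=(20, 50))
query_counts = rng.poisson(10, size=(4, 50))
query_counts[0] = atlas_counts[7]             # a query identical to atlas sample 7

Z_atlas, Z_query = joint_pca(atlas_counts, query_counts)
matches = nearest_atlas_samples(Z_query, Z_atlas, k=3)
```

Each row of `matches` lists candidate control samples for one query sample. Fitting the PCA on the atlas alone and projecting the query onto it would be an alternative design; the joint fit is used here only because it is shorter.
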
How do we know when samples are significantly different? One option is to measure how much each sample diverges from the atlas background (the distribution obtained by considering all the cells together). This should indicate when matched controls are needed: namely, when the single samples do not agree with the general distribution.
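
One way this could be sketched, purely as an assumption since the thread does not fix a measure: pool the neighbourhood counts of all atlas samples into a background distribution, score each sample by its Jensen-Shannon divergence from that background, and compare a query sample's score to the spread of scores among atlas samples. The choice of Jensen-Shannon divergence and the empirical-fraction comparison are illustrative, not the method used in the repository.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so between 0 and 1) of two count vectors."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0                            # terms with a == 0 contribute 0
        return np.sum(a[nz] * np.log2(a[nz] / b[nz]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy atlas: 30 samples over 40 neighbourhoods, all drawn from the same distribution.
rng = np.random.default_rng(0)
atlas_counts = rng.poisson(20, size=(30, 40))
background = atlas_counts.sum(axis=0)         # pooled "all cells together" counts

# Deviation of every atlas sample from the background.
atlas_dev = np.array([js_divergence(s, background) for s in atlas_counts])

# Toy query sample with 10 strongly enriched neighbourhoods.
query = rng.poisson(20, size=40)
query[:10] *= 5
query_dev = js_divergence(query, background)

# Fraction of atlas samples that deviate at least as much as the query.
frac_as_extreme = np.mean(atlas_dev >= query_dev)
```

A small `frac_as_extreme` would suggest that the query sample sits outside the range of variation seen in the atlas, which is the situation where matched controls matter most.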

emdann commented 2 years ago

How bad can a control be and still be better than no control? We need a measure of similarity between samples.

emdann commented 2 years ago

- [ ] Normalization of neighbourhood counts: log-counts or TMM
- [ ] Dimensionality reduction on neighbourhood counts: PCA (we want to preserve batch effects in this case)
- [ ] Distance metric: Euclidean in PCA space, Gaussian kernel on Euclidean distance, EMD?
- [ ] Comparison against similarity at the cell type level?
- [ ] Measure of deviation from the full atlas?
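
A rough sketch of two of the items above, assuming a samples x neighbourhoods count matrix: a depth-normalized log transform (one reading of "log-counts"; TMM and EMD are not shown) and a Gaussian kernel applied to pairwise Euclidean distances. The function names, the `scale` factor, and the median-distance bandwidth are illustrative assumptions.

```python
import numpy as np

def log_cpm(counts, scale=1e4):
    """Log-normalize counts per sample so that sequencing/cell depth cancels out."""
    counts = np.asarray(counts, dtype=float)
    size = counts.sum(axis=1, keepdims=True)  # total count per sample
    return np.log1p(counts / size * scale)

def pairwise_euclidean(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def gaussian_kernel(D, sigma=None):
    """Turn distances into similarities in (0, 1]; sigma defaults to the median distance."""
    if sigma is None:
        sigma = np.median(D[D > 0])           # assumption: a common bandwidth heuristic
    return np.exp(-(D ** 2) / (2 * sigma ** 2))

# Toy data: 12 samples over 30 neighbourhoods.
rng = np.random.default_rng(1)
counts = rng.poisson(15, size=(12, 30))

X = log_cpm(counts)
D = pairwise_euclidean(X, X)
K = gaussian_kernel(D)
```

The kernel compresses large distances toward zero similarity, so it emphasizes local structure more than raw Euclidean distance does; which behaviour is preferable for picking controls is exactly the open question in the list.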

emdann commented 2 years ago

Compare:

- No control (negative control): PA design
- Random subsampling: select the same number of donors at random
- Unsupervised good selection: select the n donors closest to the dataset in PCA space
- Unsupervised bad selection: select the n donors most distant from the dataset in PCA space
- Supervised selection: select based on demographics/covariates
- Same dataset (positive control)
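
The three unsupervised strategies above can be sketched as follows. This assumes that donors and query samples are already embedded in a shared PCA space, and that "distance to the dataset" means Euclidean distance to the centroid of the query samples; both the centroid definition and the helper names are assumptions made for illustration.

```python
import numpy as np

def donor_distances(Z_atlas, Z_query):
    """Euclidean distance of each atlas donor to the centroid of the query dataset."""
    centroid = Z_query.mean(axis=0)
    return np.linalg.norm(Z_atlas - centroid, axis=1)

def select_controls(Z_atlas, Z_query, n, strategy, rng=None):
    """Return indices of n atlas donors chosen with the given strategy."""
    order = np.argsort(donor_distances(Z_atlas, Z_query))  # closest first
    if strategy == "closest":
        return order[:n]
    if strategy == "most_distant":
        return order[-n:]
    if strategy == "random":
        rng = np.random.default_rng() if rng is None else rng
        return rng.choice(len(Z_atlas), size=n, replace=False)
    raise ValueError(f"unknown strategy: {strategy}")

# Toy embedding: 40 atlas donors and 6 query samples in 5 PCA dimensions.
rng = np.random.default_rng(2)
Z_atlas = rng.normal(size=(40, 5))
Z_query = rng.normal(loc=1.0, size=(6, 5))

n = 6
closest = select_controls(Z_atlas, Z_query, n, "closest")
distant = select_controls(Z_atlas, Z_query, n, "most_distant")
random_sel = select_controls(Z_atlas, Z_query, n, "random", rng=rng)
dist = donor_distances(Z_atlas, Z_query)
```

Running the same downstream analysis with `closest`, `random_sel`, and `distant` as controls would give the good, random, and bad arms of the comparison; the supervised arm needs donor metadata and is not sketched here.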

emdann commented 2 years ago

See https://github.com/emdann/diff2atlas/blob/master/notebooks/PBMC_sample_similarity.ipynb

MarioniLab / oor_design_reproducibility
