What to do with unmatched control samples

nickhir commented 1 year ago

Hello,

First of all, thank you very much for creating such a great tool and providing detailed tutorials!

Much like in your in-depth tutorial, I am comparing stimulated vs. unstimulated (ctrl) samples. My situation, however, is the following: we have 8 control single-cell experiments (replicates) but only 2 for the stimulated condition, and those 2 stimulated samples have matched controls. Is it possible to use all 8 of my controls, or should I "discard" the 6 controls that do not have a matched stimulated sample?

And if I can use all 8 controls, how should I then normalize the density estimates across samples?

Another question: in the in-depth tutorial you run a parameter search to identify the optimal MELD parameters. Can I still use your code to determine optimal parameters for my dataset if I intend to use UMAP instead of PHATE? I am asking because the script has this specific line (benchmarker.fit_phate(data_pca)) and there is no fit_umap equivalent.

Any insights are much appreciated!

Thank you very much in advance!

barbareyex commented 1 year ago

Hi @nickhir,

I am facing the same problem. We have several samples/subjects, but in our case none of them are paired replicates. One approximation would be to average the densities of the treatment samples into one pseudosample, and do the same for the control samples. You could then use the two pseudosamples' densities to estimate the sample likelihood.

If you want to use UMAP for visualization, you can take the UMAP coordinates from adata.obsm['X_umap'], but for the case you mention (benchmarker.fit_phate) I don't think that is possible.

If anyone has a suggestion for this problem, please let us know!

Thanks!

dburkhardt commented 1 year ago

Yeah I think this is the right way to think about it. I would always estimate sample densities independently, regardless of the comparison you want to do.

Having 8 control replicates will help you estimate the variation in density of the control condition, but to calculate the relative likelihood when you don't have paired information, you want to average the treatment densities and the control densities first.
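
For anyone reading later, here is a minimal sketch of that averaging step, under some assumptions. It uses only numpy and pandas, and the `sample_densities` table is random stand-in data, not MELD output. The sample names (`ctrl_1` ... `ctrl_8`, `stim_1`, `stim_2`) and the cell count are hypothetical. The only thing assumed about the real data is the layout: one row per cell, one column per sample, each column a density estimated independently for that sample.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for per-sample density estimates:
# one row per cell, one column per sample (8 controls, 2 stimulated).
rng = np.random.default_rng(0)
sample_names = [f"ctrl_{i}" for i in range(1, 9)] + ["stim_1", "stim_2"]
raw = rng.random((500, len(sample_names)))
sample_densities = pd.DataFrame(raw / raw.sum(axis=0), columns=sample_names)

# Map every sample (column) to its condition.
condition = pd.Series(
    ["ctrl" if name.startswith("ctrl") else "stim" for name in sample_names],
    index=sample_names,
)

# Average the replicate densities within each condition,
# giving one pseudosample density per condition.
pseudo_densities = sample_densities.T.groupby(condition).mean().T

# Normalize across the two pseudosamples so each cell's values sum to 1.
likelihoods = pseudo_densities.div(pseudo_densities.sum(axis=1), axis=0)
stim_likelihood = likelihoods["stim"]
```

Averaging before normalizing means the 8 controls together carry the same weight as the 2 stimulated samples, so the imbalance in replicate number does not pull the likelihood toward the control condition.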

Regarding the parameter estimation, I haven't tried it, but you should be able to use benchmarker.set_phate(adata.obsm['X_umap']) or something like that.

The code in that class is not very complex: https://github.com/KrishnaswamyLab/MELD/blob/main/meld/benchmark.py

KrishnaswamyLab / MELD

What to do with unmatched control samples #61