Using a different latent space and KNN for Milo analysis

revolvefire commented 1 month ago

Thanks for the wonderful tool!

I was wondering whether it would be okay to perform the Milo analysis on a different latent space and KNN graph than the ones used to build the initial UMAP and clusters.

For example, using X_scVI for the initial analysis (UMAP and clustering), and then using PCA and a new KNN graph for the Milo analysis.

When tested on a couple of samples, X_scVI-based Milo seems to be slightly too conservative when comparing the same sample type under two different conditions, especially when the dataset is only 20–30K cells, whereas a KNN graph built on batch-corrected, weighted PCs seems to work slightly better (in terms of spatial FDR, it does not compress the data as much).

On the one hand, this approach feels acceptable to me; on the other hand, I have some doubts.

emdann commented 3 weeks ago

Hi @revolvefire, this is entirely up to you and what you think makes most sense for the dataset you are analysing. I personally find that using the same KNN graph for multiple analyses helps with coherent interpretation.

The best way to safeguard your analysis against false positives arising from imperfect dimensionality reduction is to include confounders (i.e., the labels you use for batch correction) in the DA testing model, either as samples (columns of the count matrix) or as additional factors in the linear model.
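To make the two options concrete, here is a pandas-only sketch on hypothetical metadata (the column names `sample`, `batch`, and `condition` are assumptions). It first builds sample-batch labels to serve as count-matrix columns, then the one-row-per-column design matrix that a `~ batch + condition` model would use. In milopy this would roughly correspond to passing the combined column as `sample_col` to `count_nhoods` and `design="~ batch + condition"` to `DA_nhoods`; treat that mapping as an illustration, not as the maintainer's exact recipe.

```python
import pandas as pd

# Hypothetical per-cell metadata: donor s1 was split across two batches,
# so batch is not nested within sample.
obs = pd.DataFrame({
    "sample":    ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"],
    "batch":     ["b1", "b2", "b1", "b1", "b2", "b2", "b1", "b1"],
    "condition": ["ctrl", "ctrl", "ctrl", "ctrl", "treat", "treat", "treat", "treat"],
})

# Confounder as samples: each sample-batch combination becomes its own
# column of the neighbourhood-by-sample count matrix.
obs["sample_batch"] = obs["sample"] + "_" + obs["batch"]

# Confounder as a model factor: one design row per count-matrix column,
# encoding ~ batch + condition (first level of each factor as reference).
col_meta = obs.drop_duplicates("sample_batch").set_index("sample_batch")
design = pd.get_dummies(col_meta[["batch", "condition"]], drop_first=True, dtype=float)
design.insert(0, "Intercept", 1.0)
print(design)
```

The `batch_b2` column lets the model absorb batch-driven shifts in neighbourhood abundance, so the `condition_treat` coefficient is estimated after accounting for batch.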

revolvefire commented 2 weeks ago

Thank you for the detailed reply!

emdann/milopy, issue #48