atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Combined umap for 1 cell lineage, retrieving latent representation #134

Open nhgopee opened 8 months ago

nhgopee commented 8 months ago

Thank you for this very useful package. I am comparing mouse and human datasets and have a few questions about the combined UMAP, retrieving the latent representation, and optimising memory usage for GenePairFinder.

  1. When I run the analysis across all cell lineages, the main lineages from human and mouse skin overlap very well in the combined UMAP. However, when I subset to a specific lineage, such as fibroblasts, the combined UMAP becomes very fragmented: a single cell type splits into several separate clusters, and the overlap between the two species is less informative. I have tried several variations (different resolutions, different keys defined in `SAMAP`, and neighbours from keys in `sm.run`), but none of them made much difference. Is there another way to address this, so that the clusters in the UMAP are restricted to the predefined cell types?

  2. Is there a way to retrieve the latent representation underlying the combined UMAP? `Samap.adata` currently contains only `X_umap`. Would it be possible to also include the PCA/wPCA coordinates in the `adata`?

  3. I am very keen to use `GenePairFinder`, but I hit the memory limit very quickly when running it (up to 200 GB), even on the smaller data subset.
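For point 1, one way to put a number on the fragmentation, and to compare parameter settings, is to cross-tabulate the predefined cell-type labels against the clusters of the combined object. The sketch below is plain Python and makes no SAMap calls. It assumes the two label lists are read from the combined object's `obs` table (column names are up to you); `fragmentation_report` and its `min_frac` threshold for ignoring stray cells are hypothetical.

```python
from collections import Counter, defaultdict


def fragmentation_report(cell_types, clusters, min_frac=0.05):
    """For each annotated cell type, count the clusters that hold at least
    `min_frac` of its cells, and the share of its cells in its largest cluster."""
    if len(cell_types) != len(clusters):
        raise ValueError("cell_types and clusters must have the same length")
    # Tally cluster membership separately for each cell type.
    per_type = defaultdict(Counter)
    for ct, cl in zip(cell_types, clusters):
        per_type[ct][cl] += 1
    report = {}
    for ct, counts in per_type.items():
        total = sum(counts.values())
        # Clusters below min_frac are treated as stray cells and not counted.
        n_major = sum(1 for c in counts.values() if c / total >= min_frac)
        top = counts.most_common(1)[0][1] / total
        report[ct] = {"n_cells": total, "n_clusters": n_major,
                      "top_cluster_frac": round(top, 3)}
    return report


# Toy labels (hypothetical): fib_A is spread over three clusters, fib_B sits in one.
types = ["fib_A"] * 6 + ["fib_B"] * 4
clusters = ["0", "0", "1", "1", "2", "2", "3", "3", "3", "3"]
for ct, stats in fragmentation_report(types, clusters).items():
    print(ct, stats)
```

A cell type with `n_clusters` well above 1 and a low `top_cluster_frac` is the kind of split described above, so the numbers can be compared across resolutions or key choices rather than judged by eye on the UMAP.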
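For point 2, if a combined expression matrix is at hand, a latent representation can be computed and stored next to `X_umap`. This is a minimal sketch of plain, unweighted PCA via NumPy's SVD. It does not reproduce SAMap's own weighted PCA, and it assumes `X` is a cells x genes array in the same cell order as the combined object. The function name `pca_embedding` is hypothetical.

```python
import numpy as np


def pca_embedding(X, n_components=10):
    """Plain PCA of a cells x genes matrix via SVD of the column-centred data.
    Returns cell coordinates of shape (n_cells, k), k = min(n_components, rank bound)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                     # centre each gene (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, S.size)
    return U[:, :k] * S[:k]                     # equals Xc @ Vt[:k].T


# Synthetic data standing in for the combined expression matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
coords = pca_embedding(X, n_components=3)
print(coords.shape)
```

With an AnnData object, the result could be stored with `adata.obsm["X_pca"] = coords`, as long as the rows follow `adata.obs` order. Centring a sparse matrix makes it dense, so for large datasets a truncated or randomised SVD would be a better fit than a full `np.linalg.svd`.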
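For point 3, a back-of-the-envelope estimate shows how quickly dense intermediates grow. This issue does not say whether `GenePairFinder` builds dense matrices internally. The sketch below only computes storage sizes for a dense float64 array versus a CSR sparse matrix with float64 values and 32-bit indices. The helper names and the dimensions in the usage lines are hypothetical.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes for a dense n_rows x n_cols array (8 bytes per float64 entry)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, data_itemsize=8, index_itemsize=4):
    """Approximate bytes for a CSR matrix: stored values, their column indices,
    and the (n_rows + 1) row pointers."""
    return nnz * (data_itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


def to_gib(n_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Hypothetical size: 100,000 cells x 20,000 genes.
n_cells, n_genes = 100_000, 20_000
print(round(to_gib(dense_bytes(n_cells, n_genes)), 1))   # 14.9
# Same matrix stored sparsely at 5% density (100,000,000 non-zeros).
nnz = n_cells * n_genes // 20
print(round(to_gib(csr_bytes(n_cells, nnz)), 1))          # 1.1
```

The gap between the two figures comes entirely from storing zeros. Each dense copy of a matrix this size costs about 15 GiB, so a few dense intermediates would be enough to reach the usage described above.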