Problem interpreting the SAMap integration results

GGboy-Zzz commented 2 months ago

Hello, thank you for developing such a useful tool! I'm integrating scRNA-seq data across species, and SAMap gave me an integration result that looks quite good. I'm still confused about how to interpret some of the results and hope you can help. I've attached my stitched SAMap UMAP. My questions are:

1. I passed my known cell annotations to both keys and neigh_from_keys when running SAMap. Is it necessary to pass both parameters? Previously I passed the annotations only to neigh_from_keys. Also, do you think using Leiden clustering instead would improve the integration?
2. For some cell types, the correspondence is not strictly one-to-one (depending on the resolution of the cell annotations). I'd like to identify the specific cell barcodes that map, or fail to map, to a given cell type in the other species, similar to cell label transfer. How can I do this?

Thanks in advance

Best regards

atarashansky commented 2 months ago

neigh_from_keys actually expects a dictionary of booleans keyed by species ID; sorry, the documentation isn't clear. Species where neigh_from_keys is True use the values defined in keys to determine neighborhoods. By default, keys uses Leiden clustering. So if you'd like to use custom annotations, the right way is to set neigh_from_keys to True and set keys to the annotation column name for each species. (Incidentally, setting neigh_from_keys to a dictionary of strings ends up being truthy anyway, so you probably don't need to rerun SAMap.)
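A minimal sketch of that setup follows. `samap_label_args` is a hypothetical helper, not part of SAMap: it builds both dictionaries from one species-to-column mapping so that `neigh_from_keys` always holds real booleans. The `SAMAP(...)` and `sm.run(...)` lines in the comment mirror the call used later in this thread and are shown only as assumed usage.

```python
def samap_label_args(annotation_columns, neighbor_species=None):
    """Build `keys` and `neigh_from_keys` for SAMap (hypothetical helper).

    annotation_columns: species ID -> obs column holding the custom annotation.
    neighbor_species: species whose neighborhoods should be taken from `keys`;
        defaults to every species in annotation_columns.
    """
    if neighbor_species is None:
        neighbor_species = set(annotation_columns)
    # keys: which annotation column to use for each species
    keys = dict(annotation_columns)
    # neigh_from_keys: a plain boolean per species, as the reply above describes
    neigh_from_keys = {sid: sid in neighbor_species for sid in annotation_columns}
    return keys, neigh_from_keys


# Assumed usage, mirroring the call in this thread:
# keys, neigh = samap_label_args({'mo': 'celltype.predicted', 'ze': 'ClusterName_short'})
# sm = SAMAP(filenames, f_maps='./maps/', names=names, keys=keys, save_processed=False)
# sm.run(neigh_from_keys=neigh)
```

Because `bool('celltype.predicted')` is `True`, a dictionary of column-name strings behaves the same way, which is why the earlier run did not need to be repeated.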
If you're comfortable working with sparse adjacency matrices, you can always look at the graph in sm.samap.adata.obsp['connectivities'] and for each row (cell) see which other cells it is connected to (nonzero columns).
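A minimal sketch of that lookup, extended to a weighted-vote label transfer, is shown here. It reads the CSR arrays (`indptr`, `indices`, `data`) of the connectivity matrix directly, so it assumes you first call `.tocsr()` on `sm.samap.adata.obsp['connectivities']`. The function names and the use of the winning label's share of edge weight as a confidence are illustrative assumptions, not SAMap functionality. The `obs` column names in the usage comment (`species`, `celltype`) are also hypothetical; check `sm.samap.adata.obs.columns` for the real ones.

```python
from collections import defaultdict


def neighbors_of(indptr, indices, data, row):
    """Return {column: weight} for the nonzero entries of one CSR row."""
    start, end = indptr[row], indptr[row + 1]
    return {indices[k]: data[k] for k in range(start, end) if data[k] != 0}


def transfer_labels(indptr, indices, data, species, labels, source, target):
    """Weighted-vote label transfer from `source` cells to `target` cells.

    species[i] and labels[i] give the species ID and annotation of cell i.
    Returns {cell_index: (best_label, share_of_cross_species_weight)} for every
    target cell; cells with no source-species neighbors get (None, 0.0).
    """
    result = {}
    for i, sid in enumerate(species):
        if sid != target:
            continue
        votes = defaultdict(float)
        # Sum edge weights per label over this cell's source-species neighbors
        for j, w in neighbors_of(indptr, indices, data, i).items():
            if species[j] == source:
                votes[labels[j]] += w
        total = sum(votes.values())
        if total == 0:
            result[i] = (None, 0.0)
        else:
            best = max(votes, key=votes.get)
            result[i] = (best, votes[best] / total)
    return result


# Hypothetical usage (obs column names are assumptions):
# A = sm.samap.adata.obsp['connectivities'].tocsr()
# obs = sm.samap.adata.obs
# calls = transfer_labels(A.indptr, A.indices, A.data,
#                         list(obs['species']), list(obs['celltype']), 'mo', 'ze')
# mapped = [sm.samap.adata.obs_names[i] for i, (lab, share) in calls.items() if share >= 0.5]
```

Cells whose best label carries only a small share of the weight, or that have no cross-species neighbors at all, are candidates for the "unmapped" barcodes asked about above. The 0.5 cutoff in the comment is arbitrary.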

GGboy-Zzz commented 2 months ago

Thanks for the clear response. I set both keys and neigh_from_keys to my annotation columns, using this code:

    names = {'mo': ENSMUST_array, 'ze': ENSDART_array}
    sm = SAMAP(filenames, f_maps='./maps/', save_processed=False, names=names,
               keys={'mo': 'celltype.predicted', 'ze': 'ClusterName_short'})
    sm.run(neigh_from_keys={'mo': 'celltype.predicted', 'ze': 'ClusterName_short'})
    samap = sm.samap

I then identified aligned cell types by calculating cell type mapping scores. Most cell types connected as expected, with high mapping scores. However, a small number of cell types showed either low mapping scores or unexpected connections, which I suspect is caused by inconsistent granularity between the two species' annotations. I have two follow-up questions:

1. What mapping score threshold indicates a reliable alignment? Is the score robust to the number of cells in a given cell type?
2. After rerunning SAMap on a subset of cell types (which do not correspond one-to-one), I noticed that cells from the species with fewer cells were more scattered on the UMAP. Could this be a sign of over-integration?

[UMAP panels: custom cluster annotation; leiden_cluster]
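One way to look at the granularity and cell-count questions is a simple diagnostic on the same connectivity graph: for each source cell type, what fraction of its cross-species edge weight lands on each target cell type. This is a sketch under stated assumptions, not SAMap's mapping-score formula. `type_alignment` is a hypothetical name, it reads CSR arrays as from `obsp['connectivities'].tocsr()`, and the `min_cells=20` default is an arbitrary illustration rather than a recommended threshold.

```python
from collections import Counter, defaultdict


def type_alignment(indptr, indices, data, species, labels, source, target, min_cells=20):
    """Row-normalised cross-species edge weight between annotation groups.

    Returns {source_label: {target_label: fraction}} for source labels with at
    least `min_cells` cells. Labels with no cross-species edges map to {}.
    Smaller groups are skipped because a handful of cells gives noisy fractions.
    """
    counts = Counter(lab for sid, lab in zip(species, labels) if sid == source)
    weight = defaultdict(lambda: defaultdict(float))
    for i, (sid, lab) in enumerate(zip(species, labels)):
        if sid != source or counts[lab] < min_cells:
            continue
        # Accumulate this cell's edge weight to each target-species label
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if species[j] == target:
                weight[lab][labels[j]] += data[k]
    table = {}
    for lab, n in counts.items():
        if n < min_cells:
            continue
        row = weight.get(lab, {})
        total = sum(row.values())
        table[lab] = {t: w / total for t, w in row.items()} if total > 0 else {}
    return table
```

A source type whose weight splits roughly evenly across two or three target types is more consistent with a resolution mismatch between the annotations than with a failed alignment, while an empty row flags a type with no cross-species neighbors.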

atarashansky/SAMap issue #152