atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Question about neigh_from_keys parameter #126

Open Bassi-git opened 1 year ago

Bassi-git commented 1 year ago

Hi, thank you for this great package, it works really well! I have a question about the algorithm, to make sure we are getting the best results.

  1. What exactly is the difference between the keys parameter in SAMAP() and setting neigh_from_keys in sm.run()? As far as I understand, keys in SAMAP() limits neighbourhood sizes, but cells with the same key label could still end up in different neighbourhoods if you don't set the neigh_from_keys parameter. Is this correct?

  2. If you use neigh_from_keys in sm.run(), does this make setting keys in SAMAP() unnecessary?

  3. I think I understand why one would want to set keys in SAMAP(), e.g. as you say in the tutorial, to avoid merging neighbourhoods from rare cell types. But when would you recommend setting the neigh_from_keys parameter, and why? I have noticed some differences depending on whether I set the neighbourhoods or let SAMap determine them. Now I wonder if there are any guidelines for what works better for different kinds of datasets.

Looking forward to your answer! Thank you!

atarashansky commented 11 months ago

keys determines which set of labels to use to set the maximum neighborhood size for a particular cell. neigh_keys (the neigh_from_keys argument to sm.run()) determines which set of labels to use to determine the neighborhood for a particular cell.

When neigh_keys is unspecified, the default behavior is for each cell to "hop" 3 nodes out across every possible path and include all visited cells in its neighborhood. The number of cells in the neighborhood does not exceed the number of cells in that cell's cluster, as specified by keys.

When neigh_keys is specified, each cell's neighborhood will be all other cells that share the same label.

Use neigh_keys when you have a confident set of annotations and wish to use them to exactly specify each cell's neighborhood.
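
To make the difference concrete, here is a toy sketch of the two behaviors on a six-cell chain graph. It is not SAMap's code: the function names, the graph, and the labels are all made up for illustration, and two details are assumptions (the cap is applied by keeping the nearest cells first, and a cell is not counted as part of its own neighborhood).

```python
from collections import deque

def hop_neighborhood(graph, start, max_hops=3):
    """Cells reachable from `start` within `max_hops` edges, nearest first."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        cell, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[cell]:
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, dist + 1))
    return order

def default_neighborhood(graph, labels, cell, max_hops=3):
    """Hop-based neighborhood, capped at the size of the cell's cluster.

    Assumption: the cap is enforced by keeping the nearest cells first.
    """
    cap = sum(1 for lab in labels.values() if lab == labels[cell])
    return hop_neighborhood(graph, cell, max_hops)[:cap]

def label_neighborhood(labels, cell):
    """Label-based neighborhood: every other cell sharing the cell's label."""
    return [c for c, lab in labels.items() if lab == labels[cell] and c != cell]

# Toy chain a-b-c-d-e-f with a small "rare" cluster and a larger "common" one.
graph = {
    "a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
labels = {"a": "rare", "b": "rare", "c": "common",
          "d": "common", "e": "common", "f": "common"}

print(default_neighborhood(graph, labels, "a"))  # ['b', 'c']
print(label_neighborhood(labels, "a"))           # ['b']
```

In this toy, the hop-based neighborhood of cell a is capped at 2 (the size of the "rare" cluster) but still pulls in c, which carries a different label. The label-based neighborhood contains only b. This matches the distinction above: the labels in keys bound the size of a neighborhood, while label-derived neighborhoods fix its membership.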