atarashansky / SAMap

SAMap: Mapping single-cell RNA sequencing datasets from evolutionarily distant organisms.
MIT License

Extracting knn averaged, standardized expression values from samap object #68

Closed ivanferrreira closed 2 years ago

ivanferrreira commented 2 years ago

Can you show how you extracted the gene expression values and normalised them to the knn average, as shown in figure 4D of your paper?

[Image: Screenshot 2022-02-16 at 15 41 17]

Are these the knn and/or xsim values? If so, I don't know how to match them to names stored in the dictionary homology_gene_names

Adding this to your tutorial would be useful for biologists with limited computational skills like me!

atarashansky commented 2 years ago

Hi there! knn-averaged and standardized means that the expression data was k-nearest-neighbors averaged and standardized to zero mean and unit variance.

knn-averaging is basically a simple way to "smooth" the data: each cell's expression is averaged with the expression of its nearest neighbors.

You can achieve this with:

    sam.dispersion_ranking_NN(save_avgs=True)

After this call, sam.adata.layers['X_knn_avg'] contains the averaged expression data. Here, sam is one of the SAM objects stored in a SAMAP object (called sm, for example), each corresponding to a particular species, e.g. sm.sams['hu'].
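To give a feel for what knn-averaging does, here is a minimal sketch on a toy matrix. It is not SAM's implementation: SAM builds its own neighbor graph internally, whereas this example assumes plain Euclidean distances in gene space. The helper name knn_average and the toy data are made up for illustration.

```python
import numpy as np

def knn_average(X, k):
    """Average each cell with its k nearest neighbors (hypothetical helper).

    X is a dense (n_cells, n_genes) array. Neighbors are found with plain
    Euclidean distance, which is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between cells.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Force each cell to be its own closest neighbor.
    np.fill_diagonal(d2, -np.inf)
    # Indices of each cell plus its k nearest neighbors.
    nn = np.argsort(d2, axis=1)[:, : k + 1]
    # Mean expression over each neighborhood, gene by gene.
    return X[nn].mean(axis=1)

# Toy data: two pairs of similar cells, two genes.
toy = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
smoothed = knn_average(toy, k=1)
print(smoothed)
```

Cells that are close to each other end up with nearly identical smoothed profiles, which is why the averaged values look less noisy than the raw counts.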

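To standardize the expression of a gene in this data (let's say "Gene A"), you can do:

    # .toarray() densifies the sparse layer; it is equivalent to the .A shorthand, which newer SciPy releases no longer provide
    g = sam.adata[:, "Gene A"].layers['X_knn_avg'].toarray().flatten()
    gstd = (g - g.mean()) / g.std()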
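If you want every gene standardized at once rather than one at a time, the same formula can be applied column by column. The sketch below runs on a small toy matrix; the helper name standardize_genes is hypothetical, and it assumes the layer fits in memory as a dense array and that constant genes should become zeros rather than NaN.

```python
import numpy as np
from scipy import sparse

def standardize_genes(X_avg):
    """Scale each gene (column) to zero mean and unit variance (hypothetical helper)."""
    # Accept either a sparse matrix or a dense array.
    if sparse.issparse(X_avg):
        dense = X_avg.toarray()
    else:
        dense = np.asarray(X_avg, dtype=float)
    mean = dense.mean(axis=0)
    std = dense.std(axis=0)
    # Assumption: genes with no variation are left at zero instead of dividing by 0.
    std[std == 0] = 1.0
    return (dense - mean) / std

# Toy stand-in for a knn-averaged layer: 3 cells x 2 genes, second gene constant.
toy_layer = sparse.csr_matrix(np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]))
Z = standardize_genes(toy_layer)
print(Z)
```

With a real object you would pass sam.adata.layers['X_knn_avg'] instead of the toy matrix; the columns of the result then follow the order of sam.adata.var_names.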

Hopefully this helps! Please reopen the issue if you have any other questions.