carmonalab / ProjecTILs

Interpretation of cell states using reference single-cell maps
GNU General Public License v3.0

Ensembl transcript IDs #21

Closed pbschwar closed 2 years ago

pbschwar commented 2 years ago

Hello,

Great package!

I was hoping you could provide a separate reference map keyed by Ensembl transcript IDs instead of gene IDs. Seurat does not seem to support feature-level metadata that would let us do the conversion on our end (our analysis starts from Ensembl transcript IDs).

Thanks!

Patrick

mass-a commented 2 years ago

Hello Patrick,

Normally we would translate the Ensembl IDs to gene names in the query object, so that its genes can be matched to one of the reference atlases. A snippet to do this might be:

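# Download a conversion table mapping Ensembl gene IDs to gene names
dataUrl <- "https://drive.switch.ch/index.php/s/iJKbWGHwOhY1Llu/download"
fname <- "mart_conversion_Mm.txt"

download.file(dataUrl, fname)

# Tab-separated file; the object is not called `table` to avoid masking base::table()
conv <- read.delim(fname)

# Named vector: Ensembl gene ID -> gene name (first entry kept for duplicated IDs)
ID2name <- conv$Gene.name
names(ID2name) <- conv$Gene.stable.ID
ID2name <- ID2name[!duplicated(names(ID2name))]

# Convert rownames in gene matrices: keep only rows with a known ID, then rename
exp_mat <- exp_mat[rownames(exp_mat) %in% names(ID2name), , drop = FALSE]
rownames(exp_mat) <- unname(ID2name[rownames(exp_mat)])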

Here exp_mat is the expression matrix, which can come from your Seurat object.
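
Since the original question was about transcript IDs, here is a minimal sketch of the same idea at transcript level, using base R only. It assumes a conversion table with the columns Transcript.stable.ID and Gene.name (these column names, the toy IDs, and the counts are all made up for illustration and are not part of the file linked above). Because several transcripts can belong to one gene, the counts are summed per gene rather than simply renamed.

```r
# Toy transcript-level count matrix (rows: transcripts, columns: cells).
# All IDs and counts are invented for illustration.
exp_mat <- matrix(
  c(1, 0, 2,
    3, 1, 0,
    0, 4, 1,
    5, 5, 5),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("ENSMUST0000000001", "ENSMUST0000000002",
      "ENSMUST0000000003", "ENSMUST0000000099"),
    c("cell1", "cell2", "cell3")
  )
)

# Hypothetical conversion table; the column names are an assumption.
conv <- data.frame(
  Transcript.stable.ID = c("ENSMUST0000000001", "ENSMUST0000000002",
                           "ENSMUST0000000003"),
  Gene.name = c("Cd8a", "Cd8a", "Pdcd1"),
  stringsAsFactors = FALSE
)

# Named vector: transcript ID -> gene name
tx2name <- setNames(conv$Gene.name, conv$Transcript.stable.ID)

# Keep only transcripts that appear in the conversion table
keep <- rownames(exp_mat) %in% names(tx2name)
exp_mat <- exp_mat[keep, , drop = FALSE]

# Sum the counts of all transcripts that map to the same gene name
gene_mat <- rowsum(exp_mat, group = unname(tx2name[rownames(exp_mat)]))
```

The resulting gene_mat has one row per gene name. It could then be used to build a new query object, for example with Seurat's CreateSeuratObject(counts = gene_mat), before projecting onto a reference atlas.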

We could consider doing this conversion from Ensembl IDs to gene names within the ProjecTILs package. I will add it to the to-do list :)

Cheers, -massimo

pbschwar commented 2 years ago

Thanks so much!

Patrick