czbiohub-sf / tabula-muris

Code and annotations for the Tabula Muris single-cell transcriptomic dataset.
https://www.nature.com/articles/s41586-018-0590-4
BSD 3-Clause "New" or "Revised" License

Mapping gene names to ENSEMBL IDs #242

Closed: anamariaelek closed this issue 2 years ago

anamariaelek commented 2 years ago

I'm having trouble mapping the gene names in the Seurat objects from the droplet datasets to ENSEMBL IDs.

For example, in droplet_Bladder_seurat_tiss.Robj there are 2887 gene symbols that could not be mapped to ENSEMBL gene IDs with biomaRt, using the mart from biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl"). Some of those are non-coding RNAs (e.g. LOC626049, Serpinb10-ps, Snora34) or pseudogenes (e.g. Gm4371) that I am not concerned about, but a large part are synonyms (e.g. Mki67ip, Epb4.1l5, Krtap16-2, 4930455C21Rik), and I don't know how to systematically convert them all to ENSEMBL IDs.

Can you provide some assistance with this? Is there a version of the count objects with ENSEMBL IDs instead of gene names?

Thanks, Anamaria

aopisco commented 2 years ago

The reference genomes used are available here: s3://czb-tabula-muris-senis/reference-genome/
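One way to use the reference is to map symbols against the same annotation that produced them, so that older names such as Mki67ip resolve without a synonym lookup. The following is a minimal sketch in Python, not the project's own pipeline. It assumes the reference folder contains a GTF annotation whose `gene` records carry `gene_id` and `gene_name` attributes, which this thread does not confirm. The function names and the toy records (with placeholder IDs, not real ENSEMBL IDs) are hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "X"; gene_name "Y";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def symbol_to_ensembl(gtf_lines):
    """Build {gene symbol: sorted list of gene IDs} from GTF 'gene' records."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and header comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        gene_id, name = attrs.get("gene_id"), attrs.get("gene_name")
        if gene_id and name:
            # A symbol can map to more than one gene ID, so keep a set.
            mapping[name].add(gene_id)
    return {name: sorted(ids) for name, ids in mapping.items()}


def map_symbols(symbols, mapping):
    """Split symbols into those found in the annotation and those not found."""
    mapped = {s: mapping[s] for s in symbols if s in mapping}
    unmapped = [s for s in symbols if s not in mapping]
    return mapped, unmapped


# Toy records with placeholder IDs (assumption: not taken from the real reference).
toy_gtf = [
    "#!toy header\n",
    'chr1\ttoy\tgene\t1\t100\t.\t+\t.\tgene_id "TOY_ID_1"; gene_name "Mki67ip";\n',
    'chr1\ttoy\texon\t1\t50\t.\t+\t.\tgene_id "TOY_ID_1"; gene_name "Mki67ip";\n',
    'chr2\ttoy\tgene\t1\t100\t.\t-\t.\tgene_id "TOY_ID_2"; gene_name "Epb4.1l5";\n',
]
toy_map = symbol_to_ensembl(toy_gtf)
mapped, unmapped = map_symbols(["Mki67ip", "Epb4.1l5", "Snora34"], toy_map)
```

With a real annotation, the lines would come from the opened GTF file instead of `toy_gtf`. Symbols left in `unmapped` would still need a separate lookup, and symbols that map to several IDs would need a manual decision.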