Closed by gear-portal-team 2 months ago
@toby-clark4 - could you clarify? Are you interested in how we map gene symbols to Ensembl IDs in general for datasets that don't initially have them, or do you want to download this individual dataset, which already has its gene symbols and Ensembl IDs mapped?
@jorvis - thanks for the reply. I'm more interested in the first point. I currently have the dataset in RDS format with gene symbols but no Ensembl IDs, and I can't figure out the mapping used to connect the two, which I need in order to tokenize the data. Searching for a symbol in the gEAR dataset gives a link to the Ensembl page for each gene, so I was wondering what mapping system you use for this?
Got it. So the general strategy we use has the following steps:
These steps are performed with the following scripts:
And a prerequisite of #1 is that you've created a database using our schema file before loading:
https://github.com/IGS/gEAR/blob/main/create_schema.sql
(although only a subset of it is used for this purpose)
It's a lot, I know, but it's what supports the gEAR overall. It wasn't written as a stand-alone mapping utility!
Alternatively, tools like BioMart should allow you to do this.
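For the BioMart route, here is a minimal sketch of applying an exported mapping table to a list of gene symbols. It is not the gEAR pipeline. It assumes you have exported a tab-separated file from Ensembl BioMart with the columns "Gene stable ID" and "Gene name" (adjust the column names if your export differs). The function names and the inline sample table are hypothetical and only there to keep the example self-contained; in practice you would open your own export file for the species and Ensembl release that match your data.

```python
import csv
import io

# Illustrative stand-in for a BioMart TSV export (assumed column headers).
# Replace this with open("your_export.tsv") for real use.
SAMPLE_EXPORT = (
    "Gene stable ID\tGene name\n"
    "ENSG00000141510\tTP53\n"
    "ENSG00000012048\tBRCA1\n"
    "ENSG00000111640\tGAPDH\n"
)


def load_symbol_map(handle, id_col="Gene stable ID", symbol_col="Gene name"):
    """Build {symbol: [ensembl_id, ...]} from a tab-separated mapping table.

    A list is kept per symbol because one symbol can map to several IDs.
    """
    mapping = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        symbol = row[symbol_col].strip()
        if not symbol:
            continue  # skip genes that have no symbol in the export
        ids = mapping.setdefault(symbol, [])
        if row[id_col] not in ids:
            ids.append(row[id_col])
    return mapping


def map_symbols(symbols, mapping):
    """Split symbols into uniquely mapped, ambiguous, and unmapped groups."""
    mapped, ambiguous, unmapped = {}, {}, []
    for symbol in symbols:
        ids = mapping.get(symbol, [])
        if len(ids) == 1:
            mapped[symbol] = ids[0]
        elif ids:
            ambiguous[symbol] = ids
        else:
            unmapped.append(symbol)
    return mapped, ambiguous, unmapped


symbol_map = load_symbol_map(io.StringIO(SAMPLE_EXPORT))
mapped, ambiguous, unmapped = map_symbols(["TP53", "GAPDH", "NOT_A_GENE"], symbol_map)
print(mapped)
print(unmapped)
```

Keeping ambiguous and unmapped symbols in separate groups, rather than silently dropping them, makes it easier to see how many genes would be lost before tokenizing.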
Closing. Please re-open if there are more questions.
From: Toby Clark
Email: trc43@cam.ac.uk
Server IP: 10.142.0.16
Msg: Hello, sorry, as this is not directly related to your program, but I've seen that gEAR can provide the ENSGID for any of the gene names from the study, and I haven't been able to find any other way to do this. Would you be able to tell me which gene symbol to ENSGID mapping you use and how I could use it myself?
Thanks and best wishes, Toby
Tags: ['RNAseq']
Screenshot: None