cBioPortal / gdc-et-pipeline

GSoC: Spring Batch-based system to extract GDC-hosted data and transform it into suitable cBioPortal file formats.
GNU Affero General Public License v3.0

CNA processor ENSG -> Hugo #27

Open zheins opened 5 years ago

zheins commented 5 years ago

GDC CNA files identify genes by Ensembl gene IDs (ENSG accessions), which are converted to Hugo symbols via Genome Nexus (GN). However, some of these IDs are for GRCh38, which GN does not support. For these, we need to do one of the following:

  1. Do the mapping with the Homo_sapiens.gene_info file.
  2. Use the public VEP API to resolve the Entrez ID / Hugo symbol from the ENSG accession.
  3. Lift over from GRCh38 to GRCh37, then use GN to resolve the Entrez ID / Hugo symbol.
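
As a rough illustration of option 1, here is a minimal Python sketch that builds an ENSG -> (Entrez ID, symbol) lookup from gene_info rows. It assumes the NCBI gene_info layout: tab-separated, a header line starting with `#tax_id`, and a `dbXrefs` column holding pipe-separated entries such as `Ensembl:ENSG...`. The function name `build_ensg_map` and the two abbreviated sample rows are made up for the example; the real file has more columns, and the pipeline itself may do this differently.

```python
import csv
import io


def build_ensg_map(gene_info_lines):
    """Map Ensembl gene accession -> (Entrez gene ID, symbol).

    Assumes a tab-separated gene_info layout with a '#tax_id ...' header
    and Ensembl accessions stored in the dbXrefs column.
    """
    reader = csv.reader(gene_info_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # first header column is "#tax_id"
    col = {name: i for i, name in enumerate(header)}

    mapping = {}
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and stray comment lines
        for xref in row[col["dbXrefs"]].split("|"):
            if xref.startswith("Ensembl:"):
                accession = xref[len("Ensembl:"):]
                mapping[accession] = (int(row[col["GeneID"]]), row[col["Symbol"]])
    return mapping


# Abbreviated, illustrative rows (only the first six columns are shown).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tLocusTag\tSynonyms\tdbXrefs\n"
    "9606\t673\tBRAF\t-\tB-raf|BRAF1\tMIM:164757|HGNC:HGNC:1097|Ensembl:ENSG00000157764\n"
    "9606\t1\tA1BG\t-\tA1B\tMIM:138670|HGNC:HGNC:5|Ensembl:ENSG00000121410\n"
)

ensg_to_gene = build_ensg_map(io.StringIO(SAMPLE))
print(ensg_to_gene.get("ENSG00000157764"))
```

Accessions that gene_info does not list simply come back as `None` from `.get()`, so the unresolved ones could still be passed on to option 2 or 3.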

zheins commented 5 years ago

From Slack, Angelica [3:33 PM]: Zack Heins, I found the gene accession ENSG00000280113 in the Ensembl REST service for assembly GRCh38.

Not much information for this gene as it seems uncategorized.

Here is the server + endpoint where you can fetch info on genes by Ensembl accession:

https://rest.ensembl.org/xrefs/id/<ENSGACCESSION>?content-type=application/json

Response for the gene ID above: https://rest.ensembl.org/xrefs/id/ENSG00000280113?content-type=application/json

Response for another gene that better represents what info can be returned:

https://rest.ensembl.org/xrefs/id/ENSG00000157764?content-type=application/json

Doc page for this endpoint: https://rest.ensembl.org/documentation/info/xref_id
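
For reference, a minimal Python sketch of how the endpoint above could be used to pull the Entrez ID and Hugo symbol out of a response. It assumes the response is a JSON list of xref objects with `dbname`, `primary_id` and `display_id` fields, with the Entrez record under `dbname` "EntrezGene" and the symbol under "HGNC"; check the doc page before relying on that. The helper names (`xref_url`, `fetch_xrefs`, `resolve_gene`) and the sample payload are made up for the example.

```python
import json
import urllib.request

ENSEMBL_XREF_URL = "https://rest.ensembl.org/xrefs/id/{accession}?content-type=application/json"


def xref_url(accession):
    """Build the xrefs/id URL for one Ensembl accession."""
    return ENSEMBL_XREF_URL.format(accession=accession)


def fetch_xrefs(accession, timeout=30):
    """Fetch the xref list for an accession (performs a network call)."""
    with urllib.request.urlopen(xref_url(accession), timeout=timeout) as resp:
        return json.load(resp)


def resolve_gene(xrefs):
    """Return (entrez_id, hugo_symbol); either is None when no xref matches.

    Assumes each xref is a dict with 'dbname', 'primary_id', 'display_id'.
    """
    entrez_id = None
    hugo = None
    for xref in xrefs:
        if xref.get("dbname") == "EntrezGene" and entrez_id is None:
            entrez_id = xref.get("primary_id")
        elif xref.get("dbname") == "HGNC" and hugo is None:
            hugo = xref.get("display_id")
    return entrez_id, hugo


# Illustrative payload in the assumed shape (not a captured response).
sample_xrefs = [
    {"dbname": "EntrezGene", "primary_id": "673", "display_id": "BRAF"},
    {"dbname": "HGNC", "primary_id": "HGNC:1097", "display_id": "BRAF"},
]
print(resolve_gene(sample_xrefs))
```

Against the live service this would be called as `resolve_gene(fetch_xrefs("ENSG00000157764"))`. A gene with sparse annotation, like ENSG00000280113 above, may come back with one or both values as `None`.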

Anyway, like we discussed earlier, this mapping is lower priority, but it's good to know that we can get the info we need when / if we get to that point.