inodb closed this issue 3 years ago
Thanks, @inodb.
I just copied all the columns below; let's discuss which ones to keep or add.
hgnc_symbol | KMT2B | KMT2D | |
---|---|---|---|
ensembl_canonical_gene | ENSG00000272333 | ENSG00000167548 | |
ensembl_canonical_transcript | ENST00000222270 | ENST00000301067 | |
genome_nexus_canonical_transcript | ENST00000222270 | ENST00000301067 | |
uniprot_canonical_transcript | ENST00000420124 | ENST00000301067 | |
mskcc_canonical_transcript | ENST00000222270 | ENST00000301067 | |
hgnc_id | HGNC:15840 | HGNC:7133 | |
approved_name | lysine methyltransferase 2B | lysine methyltransferase 2D | |
locus_group | protein-coding gene | protein-coding gene | |
locus_type | gene with protein product | gene with protein product | |
status | Approved | Approved | |
chromosome | 19q13.12 | 12q13.12 | |
location_sortable | 19q13.12 | 12q13.12 | |
synonyms | KIAA0304, MLL2, TRX2, HRX2, WBP7, MLL1B, MLL4, CXXC10 | ALR, MLL4, CAGL114 | |
alias_name | myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila) 4, Histone-lysine N-methyltransferase 2B | histone-lysine N-methyltransferase 2D | |
previous_symbols | | TNRC21, MLL2 | |
prev_name | lysine (K)-specific methyltransferase 2B | trinucleotide repeat containing 21, myeloid/lymphoid or mixed-lineage leukemia 2, lysine (K)-specific methyltransferase 2D | |
gene_family | PHD finger proteins, Zinc fingers CXXC-type, Lysine methyltransferases, SET domain containing | PHD finger proteins, Lysine methyltransferases, Trinucleotide repeat containing, SET domain containing | |
gene_family_id | 88, 136, 487, 1399 | 88, 487, 775, 1399 | |
date_approved_reserved | 5/9/13 | 10/14/98 | |
date_symbol_changed | | 5/9/13 | |
date_name_changed | 2/12/16 | 2/12/16 | |
date_modified | 3/6/18 | 3/6/18 | |
entrez_gene_id | 9757 | 8085 | |
vega_id | OTTHUMG00000048119 | OTTHUMG00000166524 | |
ucsc_id | |||
accession_numbers | AJ007041 | AF010403 | |
refseq_ids | NM_014727 | NM_003482 | |
ccds_id | CCDS46055 | CCDS44873 | |
uniprot_id | Q9UMN6 | O14686 | |
pubmed_id | 10409430, 10637508 | 9247308 | |
mgd_id | MGI:109565 | MGI:2682319 | |
rgd_id | RGD:7678027 | RGD:2324324 | |
lsdb | |||
cosmic | | KMT2D | |
omim_id | 606834 | 602113 | |
mirbase | |||
homeodb | |||
snornabase | |||
bioparadigms_slc | |||
orphanet | 239011 | ||
pseudogene.org | |||
horde_id | |||
merops | |||
imgt | |||
iuphar | objectId:2689 | objectId:2691 | |
kznf_gene_catalog | |||
mamit-trnadb | |||
cd | |||
lncrnadb | |||
enzyme_id | |||
intermediate_filament_db | |||
rna_central_ids | |||
I looked at our cBioPortal database, and I think the data above covers everything we need. (We don't have length, but I think we can remove the LENGTH column from the GENE table now; it was previously used in the Mutated Genes tab.)
@n1zea144 could someone in your group also take a look, e.g. check that all genes in the current portal database are covered?
@zhx828 could you check if all genes in oncokb are covered?
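For anyone running the coverage check, here is a minimal sketch of one way to do it. It assumes the Genome Nexus mapping has been exported as a tab-separated file with an `hgnc_symbol` column (the column name is taken from the table above; the real file header may differ) and that the portal or OncoKB gene list is available as a plain list of symbols. The inputs below are toy data, and `HYPOTHETICAL1` is a made-up symbol.

```python
import csv
import io


def load_symbols(tsv_text, column="hgnc_symbol"):
    """Return the set of non-empty, upper-cased symbols in one column of a TSV."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    symbols = set()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            symbols.add(value.upper())
    return symbols


def missing_genes(portal_symbols, gn_symbols):
    """Return the sorted symbols that are in the portal list but not in GN."""
    gn = {s.upper() for s in gn_symbols}
    return sorted({p.upper() for p in portal_symbols} - gn)


# Toy inputs: two rows from the table above plus one made-up portal symbol.
gn_tsv = (
    "hgnc_symbol\tensembl_canonical_transcript\n"
    "KMT2B\tENST00000222270\n"
    "KMT2D\tENST00000301067\n"
)
portal = ["KMT2B", "KMT2D", "HYPOTHETICAL1"]

print(missing_genes(portal, load_symbols(gn_tsv)))  # ['HYPOTHETICAL1']
```

Matching here is by symbol only, so it will not catch genes that are present under a different symbol.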
@inodb @jjgao I think that, due to recent gene updates in the portal, some genes in OncoKB no longer match GN and the portal. I will need to update the genes in the next release. https://docs.google.com/spreadsheets/d/1mqmH1ccKWli7te7L8v0lIQh6uWbnSf2gu07l76RL-gI/edit?usp=sharing
I will ask one of the curators (probably @rmadupuri) to take a look at gene coverage.
@inodb These two genes in GN use different UniProt isoforms compared to the vcf2maf UniProt file:
gene | isoform |
---|---|
HIST1H2BO | ENST00000616182 |
ARID3B | ENST00000622429 |
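A comparison like this can be scripted once both mappings are loaded as gene-to-transcript dictionaries. The sketch below is only an illustration: the function name is hypothetical, and the toy data are the `uniprot_canonical_transcript` and `mskcc_canonical_transcript` rows for KMT2B and KMT2D from the table earlier in this thread, not the GN and vcf2maf files themselves.

```python
def diff_transcripts(mapping_a, mapping_b):
    """Return (gene, transcript_a, transcript_b) for shared genes that disagree."""
    shared = sorted(mapping_a.keys() & mapping_b.keys())
    return [
        (gene, mapping_a[gene], mapping_b[gene])
        for gene in shared
        if mapping_a[gene] != mapping_b[gene]
    ]


# Toy data copied from the KMT2B / KMT2D table earlier in the thread.
uniprot_canonical = {"KMT2B": "ENST00000420124", "KMT2D": "ENST00000301067"}
mskcc_canonical = {"KMT2B": "ENST00000222270", "KMT2D": "ENST00000301067"}

for gene, a, b in diff_transcripts(uniprot_canonical, mskcc_canonical):
    print(gene, a, b)  # KMT2B ENST00000420124 ENST00000222270
```

Genes that appear in only one of the two mappings are ignored here and would need a separate coverage check.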
@inodb @zhx828 @n1zea144 @rmadupuri please prioritize this one. This will fix a couple of existing issues, e.g. #5910 and cBioPortal/datahub#540 and will give us a clean start to re-import all studies.
Hi @jjgao, the following genes in the database did not have matches in GN. I divided them into 4 sheets. https://docs.google.com/spreadsheets/d/1JCx12E86TGbMydRuzwFSUVgJ2HwlGAvKFViDfgaEK2U/edit?usp=sharing
The above miRNAs might be covered in GN, but it's not easy to compare since the portal has negative Entrez IDs and GN has positive IDs.
I looked at a few protein-coding genes:
We should do a more systematic analysis. Before doing that, could you help add a couple more columns to the spreadsheet so that we know how much data there is for each gene in the public portal? @rmadupuri
mutation_event table
genetic_alteration table
Related issue: https://github.com/cBioPortal/cbioportal/issues/6432
Once we switch, it is also an opportunity to switch to previous symbols instead of synonyms and hopefully remove a lot of ambiguity, e.g. MLL2. We may be able to remove this file too: https://github.com/cBioPortal/cbioportal/blob/master/core/src/main/resources/gene_symbol_disambiguation.txt.
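To make the MLL2 case concrete, here is a rough sketch of the lookup order being proposed. The `resolve` function and the `GENES` structure are hypothetical, not existing cBioPortal code. The symbol lists come from the table earlier in this thread, with the assumption that TNRC21 and MLL2 are previous symbols of KMT2D (which the `prev_name` row suggests) and that MLL2 is only a synonym of KMT2B.

```python
# Toy data from the KMT2B / KMT2D table; the assignment of previous symbols
# to KMT2D is an assumption based on the prev_name row.
GENES = {
    "KMT2B": {
        "previous": set(),
        "synonyms": {"KIAA0304", "MLL2", "TRX2", "HRX2", "WBP7",
                     "MLL1B", "MLL4", "CXXC10"},
    },
    "KMT2D": {
        "previous": {"TNRC21", "MLL2"},
        "synonyms": {"ALR", "MLL4", "CAGL114"},
    },
}


def resolve(symbol, genes, use_synonyms=False):
    """Return the sorted approved symbols that `symbol` could refer to.

    An exact approved-symbol match wins outright. Otherwise previous symbols
    are searched, and synonyms only when `use_synonyms` is True.
    """
    query = symbol.upper()
    if query in genes:
        return [query]
    hits = set()
    for approved, names in genes.items():
        if query in names["previous"]:
            hits.add(approved)
        elif use_synonyms and query in names["synonyms"]:
            hits.add(approved)
    return sorted(hits)


print(resolve("mll2", GENES))                     # ['KMT2D']
print(resolve("mll2", GENES, use_synonyms=True))  # ['KMT2B', 'KMT2D']
print(resolve("mll4", GENES))                     # []
```

With previous symbols only, MLL2 resolves to a single gene, while MLL4 (a synonym of both genes in this data) is left unresolved rather than guessed.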
Hi @jjgao - @inodb and I met yesterday to discuss this effort and we have some thoughts about this that I will outline in a google doc / rfc. I'll link back once I have a draft.
@jjgao Per our discussion, we will revisit the utility of Entrez or Ensembl IDs in a later effort. With that, we can build out the roadmap of work to be done on this issue. cc: @inodb
Here is the link to the RFC: Gene Data RFC
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
still an issue
@inodb what's the status on this? Should we turn it into an epic?
@inodb @jjgao This has been part of our scrum planning for the past 2 weeks. We have been working on finding the best source for our gene table and on determining what kind of data loss each option would cause.
@inodb @yichaoS can this be closed?
Closing this. Please create new issues if needed.
Currently we have a gene-to-transcript mapping in cBioPortal and another one in Genome Nexus (https://github.com/genome-nexus/genome-nexus-importer/blob/master/data/grch37_ensembl92/export/ensembl_biomart_canonical_transcripts_per_hgnc.txt). We should try to have a single one.
Further discussion:
Check:
CC @sheridancbio @n1zea144 @zhx828 @jjgao