Open gaurav opened 1 year ago
I haven't had a chance to dig deeply into this yet, but it looks like the NCBIGeneENSEMBL concord is built from babel_downloads/NCBIGene/gene2ensembl.gz, which only maps NCBIGene:7124 to ENSG00000232810:
gene2ensembl.gz:9606 7124 ENSG00000232810 NM_000594.4 ENST00000449264.3 NP_000585.2 ENSP00000398698.2
However, babel_downloads/ENSEMBL/hsapiens_gene_ensembl/BioMart.tsv has many more mappings, including to the identifiers specified above, which appear to all link to HGNC:11892:
havana TNF ENSP00000389265 DIF CHR_HSCHR6_MHC_APD_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228978
havana TNF ENSP00000389265 TNF-alpha CHR_HSCHR6_MHC_APD_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228978
havana TNF ENSP00000389265 TNFA CHR_HSCHR6_MHC_APD_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228978
havana TNF ENSP00000389265 TNFSF2 CHR_HSCHR6_MHC_APD_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228978
ensembl_havana TNF ENSP00000365290 DIF CHR_HSCHR6_MHC_COX_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000204490
ensembl_havana TNF ENSP00000365290 TNF-alpha CHR_HSCHR6_MHC_COX_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000204490
ensembl_havana TNF ENSP00000365290 TNFA CHR_HSCHR6_MHC_COX_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000204490
ensembl_havana TNF ENSP00000365290 TNFSF2 CHR_HSCHR6_MHC_COX_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000204490
ensembl_havana TNF ENSP00000389490 DIF CHR_HSCHR6_MHC_MCF_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000223952
ensembl_havana TNF ENSP00000389490 TNF-alpha CHR_HSCHR6_MHC_MCF_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000223952
ensembl_havana TNF ENSP00000389490 TNFA CHR_HSCHR6_MHC_MCF_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000223952
ensembl_havana TNF ENSP00000389490 TNFSF2 CHR_HSCHR6_MHC_MCF_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000223952
ensembl_havana TNF ENSP00000410668 DIF CHR_HSCHR6_MHC_DBB_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228849
ensembl_havana TNF ENSP00000410668 TNF-alpha CHR_HSCHR6_MHC_DBB_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228849
ensembl_havana TNF ENSP00000410668 TNFA CHR_HSCHR6_MHC_DBB_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228849
ensembl_havana TNF ENSP00000410668 TNFSF2 CHR_HSCHR6_MHC_DBB_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228849
ensembl_havana TNF ENSP00000392858 DIF CHR_HSCHR6_MHC_MANN_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228321
ensembl_havana TNF ENSP00000392858 TNF-alpha CHR_HSCHR6_MHC_MANN_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228321
ensembl_havana TNF ENSP00000392858 TNFA CHR_HSCHR6_MHC_MANN_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228321
ensembl_havana TNF ENSP00000392858 TNFSF2 CHR_HSCHR6_MHC_MANN_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000228321
ensembl_havana TNF ENSP00000389492 DIF CHR_HSCHR6_MHC_SSTO_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000230108
ensembl_havana TNF ENSP00000389492 TNF-alpha CHR_HSCHR6_MHC_SSTO_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000230108
ensembl_havana TNF ENSP00000389492 TNFA CHR_HSCHR6_MHC_SSTO_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000230108
ensembl_havana TNF ENSP00000389492 TNFSF2 CHR_HSCHR6_MHC_SSTO_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000230108
ensembl_havana TNF ENSP00000372988 DIF CHR_HSCHR6_MHC_QBL_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000206439
ensembl_havana TNF ENSP00000372988 TNF-alpha CHR_HSCHR6_MHC_QBL_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000206439
ensembl_havana TNF ENSP00000372988 TNFA CHR_HSCHR6_MHC_QBL_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000206439
ensembl_havana TNF ENSP00000372988 TNFSF2 CHR_HSCHR6_MHC_QBL_CTG1 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000206439
ensembl_havana TNF ENSP00000398698 DIF 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000398698 TNF-alpha 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000398698 TNFA 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000398698 TNFSF2 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000514308 DIF 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000514308 TNF-alpha 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000514308 TNFA 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
ensembl_havana TNF ENSP00000514308 TNFSF2 6 protein_coding HGNC Symbol 7124.0 tumor necrosis factor [Source:HGNC Symbol;Acc:HGNC:11892] ENSG00000232810
So those might be the missing mappings we need to include in the NCBIGeneENSEMBL concord.
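A minimal sketch of how those extra mappings could be pulled out of a BioMart-style TSV, to make the idea concrete. This is not Babel's actual code: the function name `ncbigene_to_ensembl` and the column names `entrezgene_id` and `ensembl_gene_id` are assumptions, so the real header of BioMart.tsv should be checked before using it. The Entrez ID is parsed as a float first because the rows above show it as `7124.0`.

```python
import csv
import io
from collections import defaultdict


def ncbigene_to_ensembl(tsv_file, entrez_col="entrezgene_id", ensembl_col="ensembl_gene_id"):
    """Collect the distinct Ensembl gene IDs listed for each NCBIGene ID.

    Both column names are hypothetical defaults, not confirmed against
    the real BioMart.tsv header.
    """
    mappings = defaultdict(set)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        entrez = (row.get(entrez_col) or "").strip()
        ensg = (row.get(ensembl_col) or "").strip()
        if not entrez or not ensg:
            continue  # skip rows that lack either identifier
        try:
            # The Entrez ID appears as a float ("7124.0"), so go through float().
            entrez_id = int(float(entrez))
        except (ValueError, OverflowError):
            continue  # skip values that are not usable numbers
        mappings[f"NCBIGene:{entrez_id}"].add(f"ENSEMBL:{ensg}")
    return mappings


# Tiny in-memory sample using two of the TNF identifiers quoted above.
# The last row is a made-up placeholder showing that rows without an Entrez ID are skipped.
header = ["source", "external_gene_name", "entrezgene_id", "ensembl_gene_id"]
rows = [
    ["havana", "TNF", "7124.0", "ENSG00000228978"],
    ["ensembl_havana", "TNF", "7124.0", "ENSG00000232810"],
    ["ensembl_havana", "TNF", "7124.0", "ENSG00000232810"],  # repeated row, deduplicated by the set
    ["example", "PLACEHOLDER", "", "ENSG_PLACEHOLDER"],
]
sample_tsv = "\n".join("\t".join(fields) for fields in [header] + rows)

result = ncbigene_to_ensembl(io.StringIO(sample_tsv))
for curie in sorted(result):
    print(curie, sorted(result[curie]))
```

Collecting into a set per gene means the four synonym rows per Ensembl ID seen in the file collapse to a single mapping.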
It looks like there is some code in Babel for generating an ENSEMBL concord:
https://github.com/TranslatorSRI/Babel/blob/f3748b881082f7f573409e8e75822cd02b6becb5/src/snakefiles/gene.snakefile#L69-L75
https://github.com/TranslatorSRI/Babel/blob/f3748b881082f7f573409e8e75822cd02b6becb5/src/createcompendia/gene.py#L25-L74
However, this code is not currently being run, because gene_concords doesn't include "ENSEMBL":
https://github.com/TranslatorSRI/Babel/blob/f3748b881082f7f573409e8e75822cd02b6becb5/config.json#L19
@cbizon Do you know if this was deactivated deliberately? I'm currently trying to re-run Babel after adding "ENSEMBL" to the list of concords to see if the concord can be generated correctly and if it includes the mappings we're looking for.
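For anyone wanting to confirm this on their own checkout, a rough sketch of the check is below. It assumes only what is stated above, that config.json has a `gene_concords` list. The helper name `missing_concords` and the example list contents are made up for illustration and are not copied from the real config.

```python
import json


def missing_concords(config_text, wanted=("ENSEMBL",), key="gene_concords"):
    """Return the wanted concord names that are absent from the config list.

    `key` defaults to the gene_concords setting mentioned above; the
    helper itself is hypothetical and not part of Babel.
    """
    config = json.loads(config_text)
    enabled = set(config.get(key, []))
    return [name for name in wanted if name not in enabled]


# Made-up config contents, for illustration only.
example_config = json.dumps({"gene_concords": ["NCBIGene", "NCBIGeneENSEMBL"]})
missing = missing_concords(example_config)
print(missing)
```

Against a real checkout, the text of config.json would be passed in place of `example_config`.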
I seem to recall that the Ensembl mappings led to some very unpleasant merges. We'll want to be careful with them.
As noted in https://github.com/NCATSTranslator/Feedback/issues/340, Babel has multiple distinct Ensembl gene identifiers for the TNF gene:
Here are the RENCI-dev results: https://nodenormalization-sri.renci.org/1.3/get_normalized_nodes?curie=NCBIGene:7124&CHEMBL1825&curie=ENSEMBL%3AENSG00000230108&curie=ENSEMBL%3AENSG00000228849&conflate=true