NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Two conflated meanings of STK1 #487

Open IritR opened 11 months ago

IritR commented 11 months ago

While evaluating results for ASM (ci / 8/17)

Result: Isoniazid affects FLT3 contributes to ASM

The evidence supporting the 'affects' predicate is a paper about Ser/Thr protein kinase (STK1), whereas the STK1 that is a synonym of FLT3 is Stem Cell Tyrosine Kinase 1.

sandrine-m commented 11 months ago

When searching for Isoniazid, I found the paper the tester is referring to, coming from Unsecret:

image

Digging into the evidence on the edge in the ARAX UI, Unsecret reports it from TMKP with the following evidence: https://tmui.text-mining-kp.org/evidence/020b7466ed009bb3134ba1f35912726219f7a4f8b225ff41f57fd1516440d453 (tmkp:020b7466ed009bb3134ba1f35912726219f7a4f8b225ff41f57fd1516440d453)

There is a conflation of the protein symbol Stk1 (from gene symbol pknB) in Staphylococcus aureus with the human gene symbol FLT3, which has the synonym STK1. Note that UniProt reports very low evidence for this protein.

When I hit the Name Resolver (first 10 responses for "STK1"), here is what I get:

{
  "UniProtKB:O55099": [
    "Stk1",
    "Stk12",
    "Aik2",
    "Stk5",
    "Ark2",
    "Aim1",
    "Aurkb",
    "Airk2",
    "rAURKB",
    "ARK-2 (rat)",
    "STK-1 (rat)",
    "aurora 1 (rat)",
    "aurora kinase B (rat)",
    "aurora-related kinase 2 (rat)",
    "aurora/IPL1-related kinase 2 (rat)",
    "serine/threonine-protein kinase 5 (rat)",
    "serine/threonine-protein kinase 12 (rat)",
    "serine/threonine-protein kinase aurora-B (rat)",
    "aurora- and IPL1-like midbody-associated protein 1 (rat)"
  ],
  "UMLS:C0255652": [
    "STK1",
    "CAK",
    "CAK1",
    "CDK7",
    "p39 Mo15",
    "EC 2.7.11.23",
    "EC 2.7.11.22",
    "CDK7 protein, human",
    "CDK-Activating Kinase",
    "39 kDa Protein Kinase",
    "CDK-Activating Kinase 1",
    "Cyclin-Dependent Kinase 7",
    "Cell Division Protein Kinase 7",
    "Serine/Threonine-Protein Kinase 1",
    "TFIIH Basal Transcription Factor Complex Kinase Subunit"
  ],
  "PR:000004517": [
    "Stk1",
    "STK12",
    "AIK2",
    "Stk5",
    "AIM1",
    "ARK2",
    "AIM-1",
    "ARK-2",
    "AURKB",
    "STK-1",
    "AIRK2",
    "aurora 1",
    "aurora kinase B",
    "aurora-related kinase 2",
    "aurora/IPL1-related kinase 2",
    "serine/threonine-protein kinase 5",
    "serine/threonine-protein kinase 12",
    "serine/threonine-protein kinase aurora-B",
    "aurora- and Ipl1-like midbody-associated protein 1"
  ],
  "PR:000002032": [
    "STK1",
    "HYK",
    "TEK",
    "TIE2",
    "VMCM",
    "VMCM1",
    "Tie-2",
    "CD202b",
    "p140 TEK",
    "angiopoietin-1 receptor",
    "endothelial tyrosine kinase",
    "tyrosine-protein kinase receptor TEK",
    "tunica interna endothelial cell kinase",
    "tyrosine-protein kinase receptor TIE-2",
    "tyrosine kinase with Ig and EGF homology domains-2"
  ],
  "PR:000002001": [
    "STK1",
    "FLT3",
    "FLK2",
    "FLK-2",
    "CD135",
    "STK-1",
    "FLT-3",
    "FL cytokine receptor",
    "fetal liver kinase 2",
    "Fms-like tyrosine kinase 3",
    "stem cell tyrosine kinase 1",
    "tyrosine-protein kinase FLT3",
    "tyrosine-protein kinase receptor FLT3",
    "tyrosine-protein kinase receptor flk-2",
    "receptor-type tyrosine-protein kinase FLT3"
  ],
  "PR:000005265": [
    "STK1",
    "CAK",
    "CDK7",
    "CRK4",
    "MO15",
    "CAK1",
    "Mpk-7",
    "Cdkn7",
    "p39 Mo15",
    "CR4 protein kinase",
    "39 kDa protein kinase",
    "CDK-activating kinase",
    "cyclin-dependent kinase 7",
    "protein-tyrosine kinase MPK-7",
    "cell division protein kinase 7",
    "TFIIH basal transcription factor complex kinase subunit"
  ],
  "NCBIGene:2322": [
    "STK1",
    "STK1",
    "FLK2",
    "FLT3",
    "FLK2",
    "CD135",
    "CD135",
    "FLT3 Gene",
    "FLT3 gene",
    "STEM CELL TYROSINE KINASE 1",
    "fms related tyrosine kinase 3",
    "fms-related tyrosine kinase 3",
    "FMS-RELATED TYROSINE KINASE 3",
    "FMS-Related Tyrosine Kinase 3 Gene",
    "fms related receptor tyrosine kinase 3"
  ],
  "NCBIGene:1022": [
    "STK1",
    "STK1",
    "CAK",
    "CAK",
    "CDK7",
    "MO15",
    "CAK1",
    "MO15",
    "CAK1",
    "CDKN7",
    "CDKN7",
    "CDK7 gene",
    "CDK7 Gene",
    "KINASE SUBUNIT OF CAK",
    "MO15, XENOPUS, HOMOLOG OF",
    "CYCLIN-DEPENDENT KINASE 7",
    "cyclin dependent kinase 7",
    "Cyclin-Dependent Kinase 7 Gene",
    "CELL DIVISION PROTEIN KINASE 7",
    "SERINE/THREONINE PROTEIN KINASE 1",
    "cyclin-dependent kinase 7 (homolog of Xenopus MO15 cdk-activating kinase)",
    "cyclin-dependent kinase 7 (MO15 homolog, Xenopus laevis, cdk-activating kinase)"
  ],
  "UMLS:C1705127": [
    "STK1",
    "FLK2",
    "FLT3",
    "FLK-2",
    "CD135",
    "RP11-153M24.3",
    "FLT3 wt Allele",
    "FMS-Related Tyrosine Kinase 3 wt Allele"
  ],
  "UMLS:C2983129": [
    "STK1",
    "Serine/Threonine Kinase Stk1 Gene",
    "MO15",
    "CAK1",
    "CDKN7",
    "p39MO15",
    "CDK7 wt Allele",
    "Kinase Subunit of CAK Gene",
    "Cyclin-Dependent Kinase 7 wt Allele",
    "Serine/Threonine Protein Kinase 1 Gene",
    "Serine/Threonine Protein Kinase MO15 Gene",
    "Homolog of Xenopus MO15 Cdk-Activating Kinase Gene",
    "Cyclin-Dependent Kinase 7 (Homolog of Xenopus Mo15 Cdk-Activating Kinase) Gene",
    "Cyclin-Dependent Kinase 7 (Mo15 Homolog, Xenopus laevis, Cdk-Activating Kinase) Gene"
  ]
}

This looks to me like a mixture of genes from several species. As an example, using Ensembl BioMart I looked up the first two responses returned by Name Res: UniProtKB:O55099 in rat maps to the AURKB gene symbol in human (OrthoDB), and UMLS:C0255652 in human maps to the CDK7 gene symbol in human.
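To make the mixture easier to see, here is a rough Python sketch that clusters the returned identifiers by the synonyms they share apart from the query itself. It works on a hand-copied subset of the response above (only three synonyms per identifier), and `normalize` and `cluster_by_shared_synonyms` are illustrative helper names, not part of Name Res.

```python
# Subset of the Name Resolver response above, truncated to a few synonyms each.
RESPONSE = {
    "UniProtKB:O55099": ["Stk1", "Stk12", "Aurkb"],
    "UMLS:C0255652": ["STK1", "CAK", "CDK7"],
    "PR:000004517": ["Stk1", "STK12", "AURKB"],
    "PR:000002032": ["STK1", "TEK", "TIE2"],
    "PR:000002001": ["STK1", "FLT3", "FLK2"],
    "PR:000005265": ["STK1", "CAK", "CDK7"],
    "NCBIGene:2322": ["STK1", "FLK2", "FLT3"],
    "NCBIGene:1022": ["STK1", "CAK", "CDK7"],
}


def normalize(symbol):
    """Fold case and drop hyphens so 'Stk1', 'STK-1' and 'STK1' compare equal."""
    return symbol.replace("-", "").upper()


def cluster_by_shared_synonyms(response, query):
    """Group CURIEs that share at least one synonym other than the query."""
    q = normalize(query)
    clusters = []  # list of (set of CURIEs, set of normalized synonyms)
    for curie, synonyms in response.items():
        names = {normalize(s) for s in synonyms} - {q}
        merged = {curie}
        remaining = []
        for curies, known in clusters:
            if names & known:
                # Overlapping synonyms: fold this cluster into the new one.
                merged |= curies
                names |= known
            else:
                remaining.append((curies, known))
        remaining.append((merged, names))
        clusters = remaining
    return [sorted(curies) for curies, _ in clusters]


for cluster in cluster_by_shared_synonyms(RESPONSE, "STK1"):
    print(cluster)
```

On this subset the identifiers fall into four separate groups (AURKB, TEK, FLT3 and CDK7), all reachable through the single string "STK1".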

Name resolution does not seem to take species into account when resolving names, nor does it return the species associated with each element in the result, which makes it very difficult to choose the correct one.
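If a taxon were available for each candidate, picking the right one would become a simple filter. The Python sketch below assumes a hypothetical CURIE-to-taxon lookup (`TAXON_BY_CURIE`), which Name Res did not return in the response above; the table is filled in by hand purely for illustration, and `filter_by_taxon` is not an existing API.

```python
# Hypothetical lookup table: this information is NOT in the Name Res response
# shown above and is entered by hand for illustration only.
TAXON_BY_CURIE = {
    "UniProtKB:O55099": "NCBITaxon:10116",     # rat
    "NCBIGene:2322": "NCBITaxon:9606",         # human (FLT3)
    "NCBIGene:1022": "NCBITaxon:9606",         # human (CDK7)
    "UniProtKB:A0A454GYE1": "NCBITaxon:1280",  # Staphylococcus aureus
}


def filter_by_taxon(candidates, taxon, taxon_by_curie):
    """Keep only the candidates whose known taxon matches the requested one.

    Candidates with no taxon on record are dropped rather than guessed at.
    """
    return [c for c in candidates if taxon_by_curie.get(c) == taxon]


candidates = list(TAXON_BY_CURIE)

# A paper about a Staphylococcus aureus kinase should not resolve to human FLT3.
print(filter_by_taxon(candidates, "NCBITaxon:1280", TAXON_BY_CURIE))
print(filter_by_taxon(candidates, "NCBITaxon:9606", TAXON_BY_CURIE))
```

With this kind of filter, a mention of Stk1 in an S. aureus paper would keep only the bacterial protein and drop NCBIGene:2322 (FLT3).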

I went on to query Node Norm to see whether the correct UniProt ID for that protein would be normalized properly:

{
  "curies": [
    "UniprotKB:A0A454GYE1"
  ],
  "conflate": true
}

and the result is only this:

{
  "UniprotKB:A0A454GYE1": {
    "id": {
      "identifier": "UniProtKB:A0A454GYE1",
      "label": "A0A454GYE1_STAAU Serine/threonine protein kinase Stk1 (Fragment) (trembl)"
    },
    "equivalent_identifiers": [
      {
        "identifier": "UniProtKB:A0A454GYE1",
        "label": "A0A454GYE1_STAAU Serine/threonine protein kinase Stk1 (Fragment) (trembl)"
      }
    ],
    "type": [
      "biolink:Protein",
      "biolink:Polypeptide",
      "biolink:BiologicalEntity",
      "biolink:NamedThing",
      "biolink:Entity",
      "biolink:GeneProductMixin",
      "biolink:ChemicalEntityOrGeneOrGeneProduct",
      "biolink:ChemicalEntityOrProteinOrPolypeptide",
      "biolink:ThingWithTaxon",
      "biolink:GeneOrGeneProduct",
      "biolink:MacromolecularMachineMixin"
    ]
  }
}
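In other words, Node Norm knows the identifier but has nothing to link it to. A small Python sketch of that check, run against the response copied from above (the `type` list is left out for brevity; `other_equivalents` is an illustrative helper, not part of Node Norm):

```python
# Node Norm response copied from above, without the "type" list.
NODE_NORM_RESPONSE = {
    "UniprotKB:A0A454GYE1": {
        "id": {
            "identifier": "UniProtKB:A0A454GYE1",
            "label": "A0A454GYE1_STAAU Serine/threonine protein kinase Stk1 (Fragment) (trembl)",
        },
        "equivalent_identifiers": [
            {
                "identifier": "UniProtKB:A0A454GYE1",
                "label": "A0A454GYE1_STAAU Serine/threonine protein kinase Stk1 (Fragment) (trembl)",
            }
        ],
    }
}


def other_equivalents(response, queried_curie):
    """Return equivalent identifiers other than the preferred one.

    An empty list means the clique contains only the identifier itself.
    """
    entry = response.get(queried_curie)
    if entry is None:  # the CURIE was not recognised at all
        return []
    preferred = entry["id"]["identifier"]
    return [
        e["identifier"]
        for e in entry["equivalent_identifiers"]
        if e["identifier"] != preferred
    ]


print(other_equivalents(NODE_NORM_RESPONSE, "UniprotKB:A0A454GYE1"))
```

The empty result shows there is no gene identifier in the clique, so gene/protein conflation has nothing to attach this protein to.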

Note that I could not find the gene id=A0A454GYE1 on mygene.info:

{
  "q": [
    "A0A454GYE1"
  ],
  "scopes": [
    "uniprot"
  ]
}
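For anyone who wants to repeat this lookup from a script, here is a rough Python sketch that builds the same request body. It assumes the public batch query endpoint at https://mygene.info/v3/query accepts a JSON body; `build_mygene_request` is just an illustrative helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed endpoint for MyGene.info batch queries.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_mygene_request(terms, scopes):
    """Build (but do not send) a POST request with the query body shown above."""
    payload = {"q": list(terms), "scopes": list(scopes)}
    return urllib.request.Request(
        MYGENE_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_mygene_request(["A0A454GYE1"], ["uniprot"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it (requires network access):
# with urllib.request.urlopen(request) as resp:
#     hits = json.load(resp)
```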

I am not sure whether TMKP gets its name resolution from the current Name Res, but perhaps that could be the source of the issue?