Closed (hickst closed this issue 8 years ago)
I'm not sure this is fixed:
Similarly , EHT1864 , a direct inhibitor of RAC but not CDC42 activation , dose-dependently inhibited AKT phosphorylation induced by LPA and S1P ( E ) , but not EGF , PDGF , or insulin ( F ) .
mention text: AKT
List(Gene_or_gene_product, MacroMolecule, BioChemicalEntity, BioEntity, Entity, PossibleController)
------------------------------
Rule => ner-gene_or_gene_product-entities
Type => CorefTextBoundMention
------------------------------
Protein|List(Gene_or_gene_product, MacroMolecule, BioChemicalEntity, BioEntity, Entity, PossibleController) => AKT
grounding: KBResolution(akt, uniprot, P54644, dictyostelium discoideum)
------------------------------
uniprot entry for P54644: http://www.uniprot.org/uniprot/P54644
desired uniprot entry (P31749): http://www.uniprot.org/uniprot/P31749
Is this the same issue, or is something else causing it?
I believe this issue has been fixed. Uniprot does not list AKT as a human protein. Dicty is alphabetically the first of the candidate groundings from our uniprot protein KB. Candidates for the string AKT are:
KBResolution(akt, uniprot, P54644, dictyostelium discoideum)
KBResolution(akt, uniprot, Q8INB9, drosophila melanogaster)
KBResolution(akt, uniprot, Q8INB9, fruit fly)
KBResolution(akt, uniprot, P31750, mouse)
KBResolution(akt, uniprot, P31750, mus musculus)
KBResolution(akt, uniprot, P54644, slime mold)
Rule => ner-gene_or_gene_product-entities
Type => CorefTextBoundMention
------------------------------
Protein|List(Gene_or_gene_product, MacroMolecule, BioChemicalEntity, BioEntity, Entity, PossibleController) => AKT1
grounding: KBResolution(akt1, uniprot, P31749, homo sapiens)
Oh, I see. My mistake...from reading the description under "Function", it sounds like AKT is used synonymously with AKT1. I'm surprised there isn't such an entry in those we derived from the uniprot entries + listed synonyms.
I think you're encountering what makes these identifications so hard: the papers use many kinds of lexical variation, including aliases, nicknames, hyponymy, metonymy, and (probably) meronymy. Our KBs have, at best, a few synonym strings for some entries.
I think the correct answer here would be to ground AKT to a human protein family. Unfortunately, entries in the protein family databases don't usually correspond to what authors have in mind when they use non-specific protein names like "AKT", "RAF", "MEK" or "ERK". For AKT, the "correct" answer in my view would be to ground it to a protein family that resolves to the isoforms AKT1, AKT2 and AKT3. I know of one structured source that does this, http://resource.belframework.org/belframework/1.0/resource/protein-families.bel , where the entry PFH:"AKT Family" resolves to HGNC:AKT1, HGNC:AKT2, HGNC:AKT3.
Thanks Ben! A very interesting resource. To use it, it seems we would need the second half: a mapping from the individual proteins' BEL designations to information about them (and maybe even to their corresponding Uniprot IDs). We will definitely think about how we can incorporate this KB, but it may require a bit of new infrastructure to handle it.
Great! This could be relevant for many other cases, for instance, RAS. Currently the system grounds RAS to IPR020849, which is correct but InterPro doesn't really tell you who the members of the family are. This makes downstream assembly/analysis difficult. Again, in this OpenBEL resource, it is clear RAS resolves to HRAS, KRAS and NRAS isoforms: p(PFH:"RAS Family") hasMembers list(p(HGNC:HRAS), p(HGNC:KRAS), p(HGNC:NRAS))
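To make the family-to-members idea concrete, here is a minimal Python sketch that pulls the family name and its HGNC members out of a statement shaped like the one quoted above. The regular expressions and the `parse_family` helper are assumptions made for illustration; they handle only this one statement shape and have not been checked against the full layout of the OpenBEL resource file.

```python
import re

# Hypothetical patterns for a single statement of the form quoted above:
#   p(PFH:"<family>") hasMembers list(p(HGNC:<gene>), ...)
FAMILY_RE = re.compile(r'p\(PFH:"([^"]+)"\)\s+hasMembers\s+list\((.*)\)\s*$')
MEMBER_RE = re.compile(r'p\(HGNC:([^)\s]+)\)')


def parse_family(statement):
    """Return (family_name, [member gene symbols]) for one hasMembers statement."""
    match = FAMILY_RE.match(statement.strip())
    if match is None:
        raise ValueError("not a recognized hasMembers statement: %r" % statement)
    family, member_text = match.groups()
    return family, MEMBER_RE.findall(member_text)


# The statement quoted in this thread.
STATEMENT = 'p(PFH:"RAS Family") hasMembers list(p(HGNC:HRAS), p(HGNC:KRAS), p(HGNC:NRAS))'
family, members = parse_family(STATEMENT)
```

With a table like this, a mention grounded to a family could be expanded into its member genes for downstream assembly, although mapping those HGNC symbols to Uniprot IDs would still need a separate source.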
Thanks Ben! This is very useful! We will try to integrate this soon. Mihai
We need to sort the returned candidates by species: human first, then entries with no species, then all other species.
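A minimal sketch of that ordering, in Python rather than the project's own code. The `KBResolution` dataclass here is a hypothetical stand-in that only mirrors the four fields printed in the output above, the set of human species labels is an assumption, and the species-less entry in the example list is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBResolution:
    # Hypothetical stand-in mirroring the printed fields: text, namespace, id, species.
    text: str
    namespace: str
    id: str
    species: str = ""


# Assumed labels that count as "human".
HUMAN_LABELS = {"homo sapiens", "human"}


def species_rank(species):
    """0 for human, 1 for an empty species, 2 for every other species."""
    if species.lower() in HUMAN_LABELS:
        return 0
    if not species:
        return 1
    return 2


def sort_candidates(candidates):
    """Order by species rank, then alphabetically by species within a rank."""
    return sorted(candidates, key=lambda c: (species_rank(c.species), c.species.lower()))


# Illustrative list: three entries quoted in this thread plus one made-up
# species-less entry (EXAMPLE0 is not a real identifier).
candidates = [
    KBResolution("akt", "uniprot", "P54644", "dictyostelium discoideum"),
    KBResolution("akt", "uniprot", "P31750", "mus musculus"),
    KBResolution("akt", "example-kb", "EXAMPLE0", ""),
    KBResolution("akt1", "uniprot", "P31749", "homo sapiens"),
]
ordered = sort_candidates(candidates)
```

Because Python's `sorted` is stable and the key falls back to the species name, candidates outside the first two groups keep the alphabetical order they have today, so only the human and species-less entries move to the front.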