NCATS-Gamma / robokop

Master UI for ROBOKOP
MIT License

import guide to pharmacology, was: Missing prolactin edges #393

Open cbizon opened 5 years ago

cbizon commented 5 years ago

We have prolactin (chebi:81580), but it has no connections to any gene that would be its receptor (or to anything else).

cbizon commented 5 years ago

This is probably a generic problem for non-drug endogenous ligands. Most of the chemical-gene information that we have comes from drug sources (DrugBank, etc.). The natural ligands may or may not be characterized there, especially if the natural ligand is not used as a drug.

I found this source, https://www.guidetopharmacology.org, which looks very promising. It contains a PRL-PRLR link. But to do this right we will also need to sort out genes vs. chemicals, as the link here is at the gene-gene level. And there's a whole new set of IDs to map, etc.

cbizon commented 5 years ago

We now have IUPHAR loaded. However, issues around synonymization are making it less useful than we would like. Specifically, it's hard to synonymize across these gene products. This is also related to #359.

Basically, we want to treat the peptides as chemicals. But none of the databases have SMILES for them, and we link by SMILES. So we probably need to start linking by amino acid sequence, or by compiling protein annotations, or something similar. Here are some examples:

Prolactin

So the problem is that there is a CHEBI/CAS/KEGG set of identifiers and an IUPHAR/UniProt/PR set of identifiers, corresponding to thinking of prolactin as a chemical vs. thinking of it as a protein/gene, but there is no link between the two sets. The only link I see is at the amino acid (AA) sequence level.
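To make the sequence-level link concrete, here is a minimal Python sketch of how two identifier sets could be merged when they share an amino acid sequence. It is not how the ROBOKOP synonymizer works; the function names, the record layout, and the identifiers in the usage note are all hypothetical, and it assumes each source can supply a single-letter sequence for the product.

```python
from collections import defaultdict


def normalize_sequence(seq):
    """Upper-case a single-letter amino acid sequence and drop all whitespace."""
    return "".join(seq.split()).upper()


def merge_by_sequence(records):
    """Group identifier sets that share the same normalized sequence.

    `records` is an iterable of (identifiers, sequence) pairs, where
    `identifiers` is a set of CURIE-like strings and `sequence` is a
    single-letter amino acid string (or None/empty when unknown).
    This record layout is an assumption made for illustration only.
    """
    merged = defaultdict(set)
    for identifiers, sequence in records:
        if not sequence:
            # No sequence available: this record cannot be linked this way.
            continue
        merged[normalize_sequence(sequence)].update(identifiers)
    return dict(merged)
```

For example, with made-up placeholder identifiers and a toy sequence, `merge_by_sequence([({"CHEM:1"}, "acde fgh"), ({"PROT:1", "GENE:1"}, "ACDEFGH")])` puts all three identifiers into one set keyed by `"ACDEFGH"`. Exact string matching is the simplest possible choice; it would miss products whose sources record different processed forms of the same peptide.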

cbizon commented 5 years ago

Vasopressin

There is a structure, so everything works out synonym-wise, and you end up with a nice set of relationships:

match (a:chemical_substance {id:"CHEBI:34543"})--(g:gene) return *

[image: graph of the query results]

(note: vasopressin == argipressin)

cbizon commented 5 years ago

Insulin

IUPHAR doesn't have a structure or an amino acid sequence, which makes this a little challenging. IUPHAR does have a (manual?) annotation to DrugBank for insulin, and that DrugBank ID is one of the equivalent IDs for our CHEBI "insulin (human)" node.

cbizon commented 5 years ago

IUPHAR has a download for its peptides. The file lists 588 human peptides. 46 have SMILES; IUPHAR is part of UniChem, so these ought to be handled by the current framework. 501 have (single-letter) amino acid sequences; I would like to link these up to (probably) KEGG in the synonymizer. That leaves 41 that don't have any identifier except a name. The most principled thing would be to ignore them, but they're things like "insulin", so I annotated them by hand as well as I could. Specifically, I looked for the product in CHEBI, KEGG, and DrugBank. I'm looking for a chemical ID for the product, not an ID for the gene (which is much easier to find, but different). Only about 1/3 to 1/2 of them have these IDs, but they're useful, so I'm going to incorporate them.
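The three-way split described above can be sketched in Python as follows. This is only an illustration: the column names `smiles` and `sequence` are assumptions rather than the actual headers of the IUPHAR download, and giving SMILES precedence over sequence is a guess suggested by the three counts summing to 588.

```python
from collections import Counter


def triage_peptide(row):
    """Return the linking route a peptide row can use.

    `row` is a dict for one peptide. The keys "smiles" and "sequence"
    are hypothetical column names chosen for this sketch.
    """
    if row.get("smiles"):
        return "structure"  # has SMILES: existing structure-based route
    if row.get("sequence"):
        return "sequence"   # has an amino acid sequence: link by sequence
    return "manual"         # name only: needs hand annotation


def count_routes(rows):
    """Count how many peptide rows fall into each route."""
    return Counter(triage_peptide(row) for row in rows)
```

Rows could come from `csv.DictReader` over the downloaded file, in which case empty cells arrive as empty strings and fall through to the next route.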