newgene closed this issue 10 months ago
Perhaps related to the annotation service effort? https://github.com/biothings/biothings_explorer/issues/344#issuecomment-1583034829
The description text shown in the Title and Summary section of a Compound Summary page is actually the description of the mapped entities.
E.g. on the CID:16051951 page, the description "Clindamycin hydrochloride is a S-glycosyl compound." is actually from its mapped CHEBI:176915 page.
The same pattern applies to mapped NCIt and MeSH.
Therefore if we can find all the CID-ChEBI, CID-NCIt, and CID-MeSH mappings, we can fetch all the description texts.
Currently our plugin uses the XML files from the pubchem/Compound/CURRENT-Full/XML folder; however, those XML files do not contain the mappings.
In the pubchem/Compound/Extras folder, there SHOULD be a SID-Map.gz file. According to the README:
This is a listing of all (live) SIDs with their source names and registry identifiers, and the standardized CID if present. It is a gzipped text file where each line contains at least three columns: SID, tab, source name, tab, registry identifier; then a fourth column of tab, CID if there is a standardized CID for the given SID.
However, this file is missing from the folder. Maybe we can ask NIH to provide it.
Also note that the CID-MeSH file does not provide MeSH IDs.
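If SID-Map.gz does become available, the column layout quoted from the README could be parsed roughly like this. This is only a sketch: the function names `parse_sid_map` and `load_sid_map` are hypothetical, and the source name `"ChEBI"` is an assumption about how ChEBI records are labelled in the second column, so it should be checked against the real file.

```python
import gzip
from collections import defaultdict


def parse_sid_map(lines, source_name="ChEBI"):
    """Map each CID to the registry identifiers of one source.

    Each line is expected to be: SID <tab> source name <tab> registry
    identifier, optionally followed by <tab> CID (per the README).
    The default source_name is an assumption, not a confirmed value.
    """
    cid_to_ids = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Skip SIDs that have no standardized CID (fourth column absent or empty).
        if len(cols) < 4 or not cols[3]:
            continue
        _sid, source, registry_id, cid = cols[:4]
        if source == source_name:
            cid_to_ids[cid].add(registry_id)
    return dict(cid_to_ids)


def load_sid_map(path, source_name="ChEBI"):
    """Stream a gzipped SID-Map file through parse_sid_map."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_sid_map(handle, source_name)
```

Calling `load_sid_map("SID-Map.gz")` would then give a CID to ChEBI ID lookup; the same call with a different `source_name` could cover NCIt, assuming its source label is known.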
Just FYI:
This section can be pulled with the following command:
zcat Compound_016000001_016500000.xml.gz | sed -n '49819264,49820357p;49820358q'
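For anyone who prefers to do the same extraction from Python (e.g. inside the plugin), here is a rough equivalent of the zcat/sed command above. The helper name `extract_lines` is hypothetical, and it assumes the file decodes as UTF-8.

```python
import gzip
from itertools import islice


def extract_lines(path, first, last):
    """Return lines first..last (1-based, inclusive) of a gzipped text file.

    Like the sed command, reading stops once the last requested line
    has been consumed, so the rest of the file is never decompressed.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return list(islice(handle, first - 1, last))
```

For the range in the command above, that would be `extract_lines("Compound_016000001_016500000.xml.gz", 49819264, 49820357)`.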
We might not need to do extra ID mappings after all, since we already have the mappings to CHEBI, NCIT and UMLS.
For the particular case above:
https://mychem.info/v1/chem/AUODDLQVRAJAJM-XJQDNNTCSA-N?fields=unii.ncit,chebi.id,umls.mesh
We don't have a MeSH ID for this drug/chemical, but some objects do have one, e.g. "Hydromorphone":
https://mychem.info/v1/chem/WVLOADHCBXTIJK-YNHQPCIGSA-N?fields=unii.ncit,chebi.id,umls.mesh
Then we can get their descriptions from our CHEBI and NCIT APIs:
https://biothings.ncats.io/chebi/chemical/CHEBI:176915?fields=def (I checked: this term has no def field in the latest CHEBI obo file)
https://biothings.ncats.io/chebi/chemical/CHEBI:5790?fields=def (this one does have the def field)
https://biothings.ncats.io/ncit/node/NCIT:C47977?fields=def
https://biothings.ncats.io/ncit/node/NCIT:C62034?fields=def
Not sure if we can easily get the MeSH description from the MeSH ID, but we can go with CHEBI and NCIT first.
This MyChem query can be useful for listing all hits that contain all three IDs:
As the next step, we probably don't need to do anything yet on the MyChem.info side. We can implement the logic on the Translator Annotator Service side, using the existing mapped CHEBI and NCIT IDs to retrieve their descriptions. We will then evaluate how good they are, whether we need to improve the mapping at MyChem.info (e.g. using PubChem's extra mapping file), and how we can get the MeSH description.
Later, it will still be good to include these descriptions directly in MyChem.info.
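A minimal sketch of that lookup logic, using only the endpoints linked above. Everything else here is an assumption: the helper names are hypothetical, the response shapes (nested dicts or lists under `chebi.id` and `unii.ncit`, a `def` key on term documents) are guessed from the `fields` parameters in those URLs, and it assumes `unii.ncit` may come back without the `NCIT:` prefix. This is not how the Annotator Service is actually implemented.

```python
import json
from urllib.request import urlopen

MYCHEM_URL = "https://mychem.info/v1/chem/{id}?fields=unii.ncit,chebi.id,umls.mesh"
CHEBI_URL = "https://biothings.ncats.io/chebi/chemical/{id}?fields=def"
NCIT_URL = "https://biothings.ncats.io/ncit/node/{id}?fields=def"


def get_json(url):
    """Fetch a URL and decode the JSON body (raises on HTTP errors such as 404)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def values_at(doc, dotted_path):
    """Collect every value found at a dotted path, tolerating lists at any level."""
    nodes = [doc]
    for key in dotted_path.split("."):
        found = []
        for node in nodes:
            for item in node if isinstance(node, list) else [node]:
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        nodes = found
    flat = []
    for node in nodes:
        flat.extend(node if isinstance(node, list) else [node])
    return flat


def with_prefix(value, prefix):
    """Return 'PREFIX:value' unless the value already carries that prefix."""
    value = str(value)
    return value if value.upper().startswith(prefix + ":") else f"{prefix}:{value}"


def fetch_descriptions(chem_id, fetch=get_json):
    """Return {term_id: definition} for the CHEBI and NCIT terms mapped to chem_id."""
    doc = fetch(MYCHEM_URL.format(id=chem_id))
    sources = (
        ("chebi.id", "CHEBI", CHEBI_URL),
        ("unii.ncit", "NCIT", NCIT_URL),
    )
    descriptions = {}
    for path, prefix, template in sources:
        for raw_id in values_at(doc, path):
            term_id = with_prefix(raw_id, prefix)
            term = fetch(template.format(id=term_id))
            # Some terms have no def field at all, so skip those.
            if term.get("def"):
                descriptions[term_id] = term["def"]
    return descriptions
```

For example, `fetch_descriptions("WVLOADHCBXTIJK-YNHQPCIGSA-N")` would try the Hydromorphone record linked above. The `fetch` argument is injectable so the logic can be exercised offline with canned responses.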
Drug/Chemical description:
from CHEBI:
MyChem.info also has the chebi.definition field available, so we don't need to do anything.
from NCIT:
MyChem.info currently has the mapping from chemical/drug to NCIT term ID via the unii.ncit field. The description/definition value can then be retrieved from our NCIT BioThings API at https://biothings.ncats.io/ncit.
Here are a few examples from the Translator Annotator service, which uses multiple BioThings APIs, including MyChem.info, to annotate chemicals/drugs:
Closing this issue now, since we don't need to include additional chemical/drug description fields for now.
PubChem appears to have informative descriptions on their website:
https://pubchem.ncbi.nlm.nih.gov/compound/16051951
Let's see if we can include these descriptions (or one of them) from PubChem or other sources in MyChem.info.
https://mychem.info/v1/query?q=16051951