biothings / mychem.info

MyChem.info: A BioThings API for chemical/drug annotations
http://mychem.info
Apache License 2.0

New Data Source: GSRS #176

Closed. newgene closed this issue 2 months ago

newgene commented 4 months ago

URL: https://gsrs.ncats.nih.gov

It provides a downloadable .gsrs file, which is essentially a 7-zip compressed file containing a list of JSON objects.

NOTE: The GSRS resource is likely the successor of the previous GINAS resource (https://ginas.ncats.nih.gov now redirects to https://gsrs.ncats.nih.gov). We can include both gsrs and ginas for now, and remove ginas once we no longer need it.
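
For illustration, reading the records from such a file might look like the sketch below. It assumes the archive has already been decompressed by some other tool and that the resulting text holds one JSON object per line; that layout is an assumption and has not been checked against the actual GSRS dump. The function name `iter_gsrs_records` and the `uuid` key in the usage example are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def iter_gsrs_records(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per JSON object in an already-decompressed text stream.

    Assumption: one JSON object per line; blank lines are skipped.
    """
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Report the offending line so a malformed dump is easy to debug.
            raise ValueError(f"line {line_no}: invalid JSON") from exc
        # Ignore anything that is not a JSON object (e.g. a bare string).
        if isinstance(record, dict):
            yield record


# Usage with an in-memory stand-in for the decompressed file.
sample = io.StringIO('{"uuid": "a"}\n\n{"uuid": "b"}\n')
uuids = [rec["uuid"] for rec in iter_gsrs_records(sample)]
```

Streaming line by line keeps memory use flat, which should matter if the dump is large.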

newgene commented 4 months ago

The JSON objects do not seem to include an inchi or inchikey field (a smiles field is available, though). We are still confirming this with the GSRS team.

newgene commented 3 months ago

We confirmed with the GSRS team that inchikey was calculated based on the smiles field.

The KNIME workflow has an example on how to calculate inchikeys from GSRS using RDKit nodes, if that helps. https://hub.knime.com/-/spaces/-/~8MCL_tgTaY7uA37U/current-state/

In this case, we might just use our existing mapping from smiles to inchikey at MyChem.info to get the inchikey value as the primary _id key.
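
A minimal sketch of that lookup, under stated assumptions: the record exposes its SMILES under a flat `smiles` key (the real GSRS JSON may nest it elsewhere), and a plain dict stands in for the existing MyChem.info smiles-to-inchikey mapping. The function name `resolve_id` is hypothetical.

```python
from typing import Mapping, Optional


def resolve_id(record: Mapping, smiles_to_inchikey: Mapping[str, str]) -> Optional[str]:
    """Return the InChIKey to use as the primary _id, or None if unresolved.

    Assumption: the lookup is an exact string match on the SMILES value,
    with no canonicalization applied first.
    """
    smiles = record.get("smiles")
    if not smiles:
        # Records without a structure cannot be keyed by InChIKey this way.
        return None
    return smiles_to_inchikey.get(smiles)


# Usage: aspirin as a stand-in entry in the mapping.
mapping = {"CC(=O)OC1=CC=CC=C1C(=O)O": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"}
doc_id = resolve_id({"smiles": "CC(=O)OC1=CC=CC=C1C(=O)O"}, mapping)
```

Records that return `None` here would need a separate decision on what to use as `_id`.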

NeuralFlux commented 3 months ago

@newgene each record signifies either a chemical, concept, polymer, nucleic acid, protein, mixture, substance group, or diverse. Only chemicals and polymers have SMILES and InChI. What should each record in our API signify?

PS this dashboard is useful for data exploration