biothings / mychem.info

MyChem.info: A BioThings API for chemical/drug annotations
http://mychem.info
Apache License 2.0

Fix of Issue#67 #93

Closed: erikyao closed this 3 years ago

erikyao commented 3 years ago

This PR fixes ChEMBL Parser missing information #67.

Previously, we had only one data source (molecule) from the ChEMBL API. To match what the ChEMBL web interface displays (e.g. CHEMBL744), we need to enrich the molecule JSON objects with drug indication and drug mechanism fields. (Target predictions, although mentioned in the original issue, are not yet accessible via the published API; we can wait for ChEMBL to update its API before handling this field.)

This fix introduced 5 extra data sources:

Since the newly introduced data sources are much smaller than the root data source molecule (1,961,462 JSON objects), the enrichment is carried out in RAM. After being augmented with the other 3 extra data sources, all drug_indication and mechanism data are pre-read into 2 dictionaries by ChemblUploader. Then, for each molecule entry, the corresponding drug_indication and mechanism entries (matched by the same molecule_chembl_id) are attached as new sub-fields. The enriched molecule entries are then indexed as usual and served through the API. Samples of the enriched JSON objects can be found in the Gist erikyao/Sample Json Responses.md.
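
For readers who want a picture of the in-RAM join described above, here is a minimal sketch. It is not the actual ChemblUploader code: the function names `index_by_molecule` and `enrich_molecules`, the sub-field names `drug_indications` and `drug_mechanisms`, and the placeholder values in the sample data are all assumptions made for illustration. The step that augments drug_indication and mechanism with the other 3 data sources is omitted.

```python
from collections import defaultdict


def index_by_molecule(entries):
    """Group entries into a dict keyed by molecule_chembl_id.

    Hypothetical helper: stands in for pre-reading one of the small
    data sources (drug_indication or mechanism) into a dictionary.
    """
    index = defaultdict(list)
    for entry in entries:
        index[entry["molecule_chembl_id"]].append(entry)
    return index


def enrich_molecules(molecules, drug_indications, mechanisms):
    """Yield copies of molecule entries with matching entries attached.

    The sub-field names below are assumptions, not necessarily the
    names used in the indexed documents.
    """
    indication_index = index_by_molecule(drug_indications)
    mechanism_index = index_by_molecule(mechanisms)

    # Molecules are streamed one at a time; only the two small
    # dictionaries are held in memory.
    for molecule in molecules:
        chembl_id = molecule["molecule_chembl_id"]
        enriched = dict(molecule)  # shallow copy, input is left untouched
        if chembl_id in indication_index:
            enriched["drug_indications"] = indication_index[chembl_id]
        if chembl_id in mechanism_index:
            enriched["drug_mechanisms"] = mechanism_index[chembl_id]
        yield enriched


# Placeholder data; the values are made up and do not reflect ChEMBL content.
molecules = [
    {"molecule_chembl_id": "CHEMBL744"},
    {"molecule_chembl_id": "CHEMBL_PLACEHOLDER"},
]
drug_indications = [
    {"molecule_chembl_id": "CHEMBL744", "indication": "placeholder indication"},
]
mechanisms = [
    {"molecule_chembl_id": "CHEMBL744", "mechanism": "placeholder mechanism"},
]

enriched_docs = list(enrich_molecules(molecules, drug_indications, mechanisms))
```

Keeping the large molecule source as a stream and only the small sources as dictionaries is what makes the in-memory approach feasible at the scale quoted above; molecules with no matching entries pass through unchanged.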