Fix of Issue#67

This PR fixes ChEMBL Parser missing information #67.

Previously, we had only one data source (molecule) for the ChEMBL API. To match what the ChEMBL web interface displays (e.g. CHEMBL744), we need to enrich the molecule json objects with drug indication and drug mechanism fields. (Target predictions, although mentioned in the original issue, are not yet accessible via the published API; we can wait for ChEMBL to refresh their APIs before handling this field.)

This fix introduced 5 extra data sources:

drug_indication: 37,259 json objects
mechanism: 5,134 json objects
drug: 13,308 json objects
target: 13,382 json objects
binding_site: 14,342 json objects

Since the newly introduced data sources are much smaller than the root data source molecule (1,961,462 json objects), the enrichment is carried out in RAM. After being augmented with the other 3 extra data sources (drug, target and binding_site), all drug_indication and mechanism data are pre-read into 2 dictionaries by ChemblUploader. Then, for each molecule entry, the corresponding drug_indication and mechanism entries (those sharing the same molecule_chembl_id) are attached as new sub-fields. The enriched molecule entries are then indexed as usual and served by the API. Samples of the enriched json objects can be found in the Gist erikyao/Sample Json Responses.md.
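
For reviewers, the in-RAM join described above can be sketched roughly as follows. This is a minimal illustration, not the actual ChemblUploader code: the function names `index_by_molecule` and `enrich_molecules` and the sub-field names `drug_indications` and `drug_mechanisms` are assumptions made for this example; only the join key `molecule_chembl_id` comes from the description above.

```python
from collections import defaultdict


def index_by_molecule(entries):
    """Group entries into a dict keyed by molecule_chembl_id (hypothetical helper)."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["molecule_chembl_id"]].append(entry)
    return index


def enrich_molecules(molecules, drug_indications, mechanisms):
    """Attach matching drug_indication and mechanism entries to each molecule.

    The two small data sources are held in memory as dictionaries; the large
    molecule source is streamed one entry at a time.
    """
    indication_index = index_by_molecule(drug_indications)
    mechanism_index = index_by_molecule(mechanisms)

    for molecule in molecules:
        chembl_id = molecule["molecule_chembl_id"]
        # Sub-field names below are assumed for illustration only.
        if chembl_id in indication_index:
            molecule["drug_indications"] = indication_index[chembl_id]
        if chembl_id in mechanism_index:
            molecule["drug_mechanisms"] = mechanism_index[chembl_id]
        yield molecule
```

Keeping only the two smaller sources in dictionaries means memory use scales with their size rather than with the roughly 2 million molecule entries, which can be passed in as any iterable.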

biothings / mychem.info

Fix of Issue#67 #93