Previously, we had only one data source (molecule) for the ChEMBL API. To match the display on the ChEMBL web interface (e.g. CHEMBL744), we need to enrich the molecule json objects by adding drug indication and drug mechanism fields. (Target predictions, although mentioned in the original issue, are not yet accessible via the published API; we can wait for ChEMBL to refresh their APIs before handling this field.)
This fix introduced 5 extra data sources:
drug_indication: 37,259 json objects
mechanism: 5,134 json objects
drug: 13,308 json objects
target: 13,382 json objects
binding_site: 14,342 json objects
Since the newly introduced data sources are much smaller than the root data source molecule (1,961,462 json objects), the enrichment is carried out in RAM. After being augmented with the other 3 extra data sources (drug, target and binding_site), all drug_indication and mechanism data are pre-read into 2 dictionaries by ChemblUploader. Then, for each molecule entry, the corresponding drug_indication and mechanism entries (bound by the same molecule_chembl_id) are attached as new sub-fields. The enriched molecule entries are then indexed as usual and served by the API. Samples of the enriched json objects can be found in Gist: erikyao/Sample Json Responses.md
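The in-RAM enrichment described above can be sketched roughly as follows. This is only an illustration, not the actual ChemblUploader code: the function names `group_by_molecule` and `enrich_molecules`, the sub-field names `drug_indications` and `drug_mechanisms`, and the sample values are all assumptions made up for the example. Only the join key `molecule_chembl_id` comes from the description.

```python
from collections import defaultdict


def group_by_molecule(entries):
    """Build a dictionary mapping molecule_chembl_id -> list of entries.

    Hypothetical helper; stands in for the "pre-read into a dictionary" step.
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["molecule_chembl_id"]].append(entry)
    return grouped


def enrich_molecules(molecules, indications, mechanisms):
    """Yield copies of molecule entries with indication/mechanism sub-fields.

    The sub-field names below are assumptions, not the names used in the PR.
    """
    indications_by_id = group_by_molecule(indications)
    mechanisms_by_id = group_by_molecule(mechanisms)
    for molecule in molecules:
        chembl_id = molecule["molecule_chembl_id"]
        enriched = dict(molecule)  # shallow copy, leave the input untouched
        enriched["drug_indications"] = indications_by_id.get(chembl_id, [])
        enriched["drug_mechanisms"] = mechanisms_by_id.get(chembl_id, [])
        yield enriched


if __name__ == "__main__":
    # Placeholder records; the "note" values are invented for illustration.
    molecules = [{"molecule_chembl_id": "CHEMBL744"}]
    indications = [{"molecule_chembl_id": "CHEMBL744", "note": "placeholder indication"}]
    mechanisms = [{"molecule_chembl_id": "CHEMBL744", "note": "placeholder mechanism"}]
    for doc in enrich_molecules(molecules, indications, mechanisms):
        print(doc)
```

Because the two lookup tables are small relative to the molecule set, the molecule entries can be streamed one at a time through the generator while only the dictionaries stay resident in memory.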
This PR fixes ChEMBL Parser missing information #67.