biothings / mychem.info

MyChem.info: A BioThings API for chemical/drug annotations
http://mychem.info
Apache License 2.0

DataTransform not joining some documents that belong together #127

Open ravila4 opened 2 years ago

ravila4 commented 2 years ago

I have found several documents from aeolus, unii, and ginas that belong together with documents from chembl/pubchem via primary key. For example, http://mychem.info/v1/chem/22T8Z09XAK and http://mychem.info/v1/chem/XNCKCDBPEMSUFA-UHFFFAOYSA-N both refer to the same entity and should be joined.

I think that the datatransform graph is missing some important links. This is the current graph of connections provided by MyChem's keylookup module. Note that links are missing for the drugcentral and rxnorm nodes.

[Image: mychem_graph]

In the example above, the two documents could be linked via aeolus.unii, aeolus.rxnorm, or unii.unii to drugcentral.unii or drugcentral.rxnorm.
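To illustrate why the missing links matter, here is a minimal sketch in plain Python. It does not use the actual BioThings datatransform API; the edge lists are a simplified, hypothetical stand-in for the real graph, and `find_path` is a helper made up for this example. It only shows that an identifier type with no outgoing edges cannot reach inchikey until edges through drugcentral are added.

```python
from collections import deque

# Hypothetical, simplified edge lists (source id type -> target id type).
# These are NOT the real MyChem keylookup edges, just an illustration.
CURRENT_EDGES = [
    ("chembl", "inchikey"),
    ("pubchem", "inchikey"),
]

# Assumed additional links through the drugcentral node.
PROPOSED_EDGES = [
    ("unii", "drugcentral"),
    ("rxnorm", "drugcentral"),
    ("drugcentral", "inchikey"),
]


def find_path(edges, start, goal):
    """Breadth-first search for a chain of id types from start to goal."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of lookups connects start to goal


# Without the drugcentral links, a unii-keyed document cannot be resolved.
print(find_path(CURRENT_EDGES, "unii", "inchikey"))
# With them, a route to inchikey exists.
print(find_path(CURRENT_EDGES + PROPOSED_EDGES, "unii", "inchikey"))
```

In the real module each edge would be backed by a lookup against a source collection rather than a bare pair of names, but the reachability argument is the same.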

Additionally, parsers such as Drugcentral's, which perform ID resolution inside the parser itself, could benefit from offloading this step to the datatransform module. For example, this is the current code that Drugcentral uses to determine the primary ID for documents without an inchikey: https://github.com/biothings/mychem.info/blob/e7c32479e1263a036c2f8c45fbe92c878b32c500/src/hub/dataload/sources/drugcentral/drugcentral_parser.py#L161-L185

In the code above, the parser is running requests against the live MyChem database. It would be better to deal with resolution without depending on external requests.
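As a rough sketch of what resolution without external requests could look like, the snippet below resolves a primary ID from local lookup tables instead of querying the live API. The function name `resolve_primary_id`, the table layout, and the priority order are all assumptions for illustration, not the existing parser or datatransform code. The only mapping used is the UNII/inchikey pair from the example above.

```python
# Hypothetical local lookup tables, e.g. built once from already-loaded sources.
# The single entry is the pair mentioned earlier in this issue.
LOOKUP_TABLES = {
    "unii": {"22T8Z09XAK": "XNCKCDBPEMSUFA-UHFFFAOYSA-N"},
    "rxnorm": {},  # left empty here; would map rxnorm ids to inchikeys
}


def resolve_primary_id(doc, lookup_tables, priority=("unii", "rxnorm")):
    """Return an inchikey from local tables, or fall back to the doc's own _id.

    Tries each id type in priority order; no network request is made.
    """
    for id_type in priority:
        value = doc.get(id_type)
        table = lookup_tables.get(id_type, {})
        if value is not None and value in table:
            return table[value]
    return doc["_id"]  # nothing matched: keep the original key


doc = {"_id": "22T8Z09XAK", "unii": "22T8Z09XAK"}
print(resolve_primary_id(doc, LOOKUP_TABLES))
```

A document with no match in any table keeps its original `_id`, so the fallback behaviour stays explicit rather than depending on whether a remote service is reachable.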

ravila4 commented 2 years ago

Additional examples: http://mychem.info/v1/query?q=apadamtase%20alfa - all these documents belong together, and could be joined by mapping drugname (fda_orphan_drug.generic_name) to unii.