biothings / mychem.info

MyChem.info: A BioThings API for chemical/drug annotations
http://mychem.info
Apache License 2.0

Trouble linking aeolus compounds #25

Open cbizon opened 6 years ago

cbizon commented 6 years ago

If I do a simple query q=siltuximab, I get 5 results, with these identifiers and keys:

57894-421 ['_id', '_score', 'ndc']
57894-420 ['_id', '_score', 'ndc']
CHEMBL1743070 ['_id', '_score', 'chembl', 'drugcentral']
DB09036 ['_id', '_score', 'drugbank']
T4H8FMA7IM ['_id', '_score', 'aeolus', 'unii']
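A listing like the one above can be reproduced with a short script. This is a minimal sketch. It assumes that the endpoint http://mychem.info/v1/query returns JSON with a top-level `hits` list, which is the usual BioThings response shape. The helper names `fetch_hits` and `source_keys` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_hits(term: str) -> list:
    """Query MyChem.info and return the list of hit documents.

    Assumes the response body is JSON with a top-level "hits" list.
    """
    url = "http://mychem.info/v1/query?" + urllib.parse.urlencode({"q": term})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("hits", [])


def source_keys(hits: list) -> list:
    """Return (_id, sorted top-level keys) for each hit, mirroring the listing above."""
    return [(hit["_id"], sorted(hit.keys())) for hit in hits]
```

For example, `for doc_id, keys in source_keys(fetch_hits("siltuximab")): print(doc_id, keys)` prints one line per hit. The live results may differ from the listing above, because the data has changed since this issue was filed.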

The way I actually want to query this data is by asking for compounds that have a particular aeolus outcome. So if I query for a particular outcome and it matches siltuximab, I get back only the aeolus and unii information. I don't get chembl or drugcentral, which makes it hard to give this compound an identifier that I can integrate with other data.
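One way to detect this case in code is to check whether a hit carries any source that supplies a widely used cross-reference identifier. This is a sketch only. The set `CROSS_REF_SOURCES` is a hypothetical choice drawn from the sources named in this thread, and it is not an official MyChem.info list.

```python
# Hypothetical set of sources whose IDs are easy to integrate with other data.
CROSS_REF_SOURCES = {"chembl", "chebi", "drugbank", "drugcentral"}


def lacks_cross_reference(hit: dict) -> bool:
    """True if the hit has none of the cross-reference sources (e.g. only aeolus/unii)."""
    return CROSS_REF_SOURCES.isdisjoint(hit.keys())


def unlinkable_hits(hits: list) -> list:
    """Return the _id of every hit that cannot be linked via a cross-reference source."""
    return [hit["_id"] for hit in hits if lacks_cross_reference(hit)]
```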

I don't know whether this is a general pattern or an isolated case, but in testing I often got neither a chembl nor a chebi node when querying by aeolus.

kevinxin90 commented 6 years ago

@newgene @andrewsu This is indeed a case of how we merge MyChem.info docs. By default, we merge docs based on the InChIKey. However, 'siltuximab' is a peptide with no available InChIKey. The 5 results returned by queries like http://mychem.info/v1/query?q=siltuximab all refer to the same drug, but MyChem.info shows them as 5 separate docs. A potential solution is to group them by drug name when the InChIKey is not available.
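The proposed fallback can be sketched as grouping records by InChIKey and, when no InChIKey exists, by a normalized drug name. This sketch assumes hypothetical flattened records that have `id`, optional `inchikey`, and optional `name` keys. These records do not follow MyChem.info's actual document schema, and name normalization here is only whitespace stripping and lowercasing.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_records(records: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group records by InChIKey, falling back to a normalized drug name.

    Each record is a hypothetical flattened dict with an "id" key and
    optional "inchikey" and "name" keys.
    """
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for rec in records:
        inchikey = rec.get("inchikey")
        name = (rec.get("name") or "").strip().lower()
        if inchikey:
            groups[("inchikey", inchikey)].append(rec)
        elif name:
            groups[("name", name)].append(rec)
        else:
            # No usable merge key: keep the record on its own, keyed by its ID.
            groups[("id", rec["id"])].append(rec)
    return dict(groups)
```

Matching by name is looser than matching by InChIKey. Distinct products that share a name would be merged, so a real implementation would need a curated synonym mapping.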

newgene commented 6 years ago

That's true. We are working on an ID-mapping utility function to merge these docs into one. Essentially, when the InChIKey is not available, we will use a priority list to define the primary key (the "_id" field): for example, a DrugBank ID would be preferred, then ChEBI, then ChEMBL, etc. As long as we keep this priority order consistent across all data sources in mychem.info, different sources can still be merged even when no InChIKey is available.
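The priority rule described above can be sketched as a small function. It assumes that each candidate doc already has its known IDs collected into a dict keyed by source name, which is a hypothetical input format. `ID_PRIORITY` includes only the three sources named in the comment and leaves out the "etc.".

```python
from typing import Dict, Optional

# Priority order from the comment above; further sources ("etc.") are omitted.
ID_PRIORITY = ["drugbank", "chebi", "chembl"]


def choose_primary_id(ids_by_source: Dict[str, str],
                      inchikey: Optional[str] = None) -> Optional[str]:
    """Pick the "_id" for a merged doc: InChIKey if present, else by source priority."""
    if inchikey:
        return inchikey
    for source in ID_PRIORITY:
        if ids_by_source.get(source):
            return ids_by_source[source]
    return None  # no prioritized ID available; caller must decide what to do
```

Every source applies the same order. As a result, a siltuximab record that knows both `DB09036` and `CHEMBL1743070` resolves to `DB09036` no matter which source it came from, provided that the source can be mapped to the DrugBank ID.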

We are undergoing a major refactoring of mychem.info, these issues are on our list to be fixed.

greg-k-taylor commented 5 years ago

@cbizon It took me a while to understand what you are asking for. Does this query solve your problem?

http://mychem.info/v1/query?q=aeolus.outcomes.name:Hostility
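The query above is a field-scoped search of the form `field:value`. When you build such a query in code, URL-encode it. This is a minimal sketch, and the helper name `outcome_query_url` is hypothetical.

```python
import urllib.parse

BASE_URL = "http://mychem.info/v1/query"


def outcome_query_url(outcome: str) -> str:
    """Build a query URL for docs whose AEOLUS outcome name matches `outcome`."""
    query = f"aeolus.outcomes.name:{outcome}"
    return BASE_URL + "?" + urllib.parse.urlencode({"q": query})
```

`outcome_query_url("Hostility")` encodes the colon as `%3A`, which the server decodes back to the query shown above. Outcome names that contain spaces probably need quoting in the query string syntax, so check the API's query documentation for those.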