chembl / chembl_webservices_2

Source code of the ChEMBL web services.
https://www.ebi.ac.uk/chembl/ws

Solr-based search case sensitiveness #146

Closed: apbento closed this issue 6 years ago.

apbento commented 6 years ago

Hi,

Please see below an issue reported by Miguel Pignatelli (mp@ebi.ac.uk) from Open Targets:

"In Open Targets we are trying to query your API (the /api/data/chembl_id_lookup/search endpoint) by drug name, but I have found that the response differs depending on whether we use uppercase or lowercase for the query.

For example, palbociclib gives results in lowercase, but we don't get any results in uppercase.

palbociclib (lowercase): 2 chembl_id_lookups found https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=palbociclib&format=json

PALBOCICLIB (uppercase): no lookups found https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=PALBOCICLIB&format=json

But sorafenib is the other way around! No results in lowercase, and results in uppercase.

sorafenib: https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=sorafenib&format=json

SORAFENIB: https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=SORAFENIB&format=json

Is that expected? Given a drug name, how can I know whether to use uppercase or lowercase? (I could try both and see which one succeeds, but I'd rather ask first in case I'm missing something here.)

Thanks for your help!"
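The reporter's fallback of trying several capitalizations can be sketched as follows. This is a minimal Python sketch, not part of any ChEMBL client: `lookup_any_case`, `fetch_json` and `BASE` are hypothetical names, it assumes the JSON response lists hits under a `chembl_id_lookups` key with a `chembl_id` field per hit, and it only reads the first page of results (pagination is ignored).

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint quoted in the report above.
BASE = "https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def lookup_any_case(term: str, fetch: Callable[[str], dict] = fetch_json) -> List[dict]:
    """Query the lowercase, uppercase and title-case spellings of `term`
    and merge the hits, de-duplicated by chembl_id (first hit wins).

    Assumption: hits live under the "chembl_id_lookups" key; only the
    first page of each response is read.
    """
    seen: Dict[str, dict] = {}
    # dict.fromkeys drops repeated spellings while keeping their order.
    for variant in dict.fromkeys([term.lower(), term.upper(), term.title()]):
        url = f"{BASE}?{urlencode({'q': variant, 'format': 'json'})}"
        for hit in fetch(url).get("chembl_id_lookups", []):
            seen.setdefault(hit["chembl_id"], hit)
    return list(seen.values())
```

The `fetch` parameter is injectable so the merging logic can be exercised without network access; with the default it issues up to three live requests per term.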

mnowotka commented 6 years ago

Yes, I can confirm there is an issue here, for example:

https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=gleevec

gives no results, as well as:

https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=GLEEVEC, while this URL:

https://www.ebi.ac.uk/chembl/api/data/chembl_id_lookup/search?q=Gleevec

returns 15 results. This is not limited to the chembl_id_lookup endpoint: similar things happen for the molecule endpoint, but this time it is the lowercase query that gives results:

https://www.ebi.ac.uk/chembl/api/data/molecule/search.json?q=gleevec

while these two do not:

https://www.ebi.ac.uk/chembl/api/data/molecule/search.json?q=Gleevec

https://www.ebi.ac.uk/chembl/api/data/molecule/search.json?q=GLEEVEC
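To reproduce this behaviour systematically, one can generate the lowercase, uppercase and title-case URLs for a term and compare the hit counts each returns. In this sketch, `case_variant_urls` and `looks_case_sensitive` are hypothetical names, and it assumes the molecule endpoint accepts the same `search?q=...&format=json` form that the report uses for chembl_id_lookup (the comment above uses `search.json` for molecule instead).

```python
from typing import Dict
from urllib.parse import urlencode

# Root of the ChEMBL REST API, as used in the URLs quoted in this thread.
API_ROOT = "https://www.ebi.ac.uk/chembl/api/data"


def case_variant_urls(endpoint: str, term: str) -> Dict[str, str]:
    """Build search URLs for the lowercase, uppercase and title-case
    spellings of `term` on one endpoint (e.g. "molecule")."""
    variants = {"lower": term.lower(), "upper": term.upper(), "title": term.title()}
    return {
        name: f"{API_ROOT}/{endpoint}/search?{urlencode({'q': q, 'format': 'json'})}"
        for name, q in variants.items()
    }


def looks_case_sensitive(hit_counts: Dict[str, int]) -> bool:
    """True if different spellings of the same term returned different
    numbers of hits, i.e. the search is not case-insensitive."""
    return len(set(hit_counts.values())) > 1
```

With the chembl_id_lookup counts reported above for gleevec (no results for lowercase or uppercase, 15 for Gleevec), `looks_case_sensitive({"lower": 0, "upper": 0, "title": 15})` returns `True`.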

Another problem is that the behaviour seems to change over time: for example, after a few days, differently capitalized versions of a query may return the same results within a given endpoint.

There seem to be two issues:

  1. The code responsible for indexing should cast everything to lowercase, and the app should then lowercase every query.
  2. It seems that when the cache is generated, arguments are already lowercased, but there is some delay before the results propagate to all MongoDB replicas; this has to be verified.
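Both points come down to applying one normalization rule when indexing, when querying, and when building cache keys. The following is a toy in-memory sketch of that idea, not the ChEMBL implementation: `NameIndex` and `normalize` are hypothetical, the MongoDB replica delay from point 2 is not modelled, and in Solr itself index- and query-time lowercasing is usually configured with `solr.LowerCaseFilterFactory` in the field type's analyzers.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, Set, Tuple


def normalize(text: str) -> str:
    """The single normalization rule shared by indexing, querying and caching."""
    return text.lower()


class NameIndex:
    """Toy inverted index mapping normalized drug names to ChEMBL IDs."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[str]] = defaultdict(set)
        self._cache: Dict[Tuple[str, str], FrozenSet[str]] = {}

    def add(self, chembl_id: str, name: str) -> None:
        # Point 1: store every name in lowercase at index time.
        self._postings[normalize(name)].add(chembl_id)
        # The index changed, so previously cached answers may be stale.
        self._cache.clear()

    def search(self, endpoint: str, query: str) -> FrozenSet[str]:
        # Point 1: lowercase the query with the same rule used by add().
        q = normalize(query)
        # Point 2: key the cache on the normalized query, so "Gleevec",
        # "GLEEVEC" and "gleevec" all share one cache entry.
        key = (endpoint, q)
        if key not in self._cache:
            self._cache[key] = frozenset(self._postings.get(q, set()))
        return self._cache[key]
```

For example, after `idx.add("ID_A", "Gleevec")` (where "ID_A" is a placeholder ID), searching for "gleevec", "GLEEVEC" or "Gleevec" on the same endpoint returns `frozenset({"ID_A"})` from a single cache entry.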
mnowotka commented 6 years ago

Fixed.