ebi-chebi / ChEBI

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds.
https://www.ebi.ac.uk/chebi
Creative Commons Attribution 4.0 International

Lucene searches working incorrectly #156

Closed muthuvenkat closed 8 years ago

muthuvenkat commented 15 years ago

But I think I found another tricky compound search:

* searching for 'L-glutamine' is ok

In my case, I wanted both L-glutamic acid and L-glutamine. I tried from the Rhea website and with Luke too.

I use this query with ChEBI's Lucene index to search for a compound name:

+((ChEBI\ name:(O2) ASCII\ name:(O2))10.0
(Synonym:(O2) IUPAC\ name:(O2))5.0
(INN:(O2) Cross\ reference:(O2)))
+Status:(C E)

(OK, IUPAC\ name is to be removed...)
But the highest score for a checked compound goes to CHEBI:29372, dioxygen(.1+). The one used in Rhea, dioxygen (CHEBI:15379), comes next.

Querying with just 'All fields' (All\ fields:O2) puts dioxygen in 7th place, with diatomic oxygen (CHEBI:33263) first. But I need a more specialized query.
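For anyone trying to reproduce this, the grouped query above could be assembled with something like the sketch below. The field names and the 10.0 / 5.0 weights are taken from the post; the `^` boost operator is an assumption, since the post shows only the bare numbers. `build_query` and `escape_field` are hypothetical helper names, and the search term is inserted as-is, with no escaping of Lucene special characters.

```python
# Field groups and weights copied from the query in the post. Writing the
# weights with "^" is an assumption: the post shows only the bare numbers.
FIELD_GROUPS = [
    (("ChEBI name", "ASCII name"), 10.0),
    (("Synonym", "IUPAC name"), 5.0),
    (("INN", "Cross reference"), None),  # no weight given for this group
]


def escape_field(name):
    """Escape spaces in a field name so the parser reads it as one token."""
    return name.replace(" ", "\\ ")


def build_query(term, statuses=("C", "E")):
    """Hypothetical helper: build the grouped query string for one term."""
    groups = []
    for fields, boost in FIELD_GROUPS:
        clause = " ".join(f"{escape_field(f)}:({term})" for f in fields)
        suffix = f"^{boost}" if boost is not None else ""
        groups.append(f"({clause}){suffix}")
    return f"+({' '.join(groups)}) +Status:({' '.join(statuses)})"


print(build_query("O2"))
```

Pasting the printed string into Luke should make it easy to compare scores for different weights without retyping the whole query.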

Reported by: pauladematos

muthuvenkat commented 15 years ago

This seems to work for dioxygen:

((ChEBI_name_formatted:o2)10.0 or Names:O2 or IUPAC_Name:O2 Formula:O2) AND Status:(C E)

Original comment by: nobody

muthuvenkat commented 15 years ago

I tried my favourite compound, dioxygen (CHEBI:15379), in the Lucene index test, but again it came in seventh place when searching for 'O2', even though 'O2' is exactly one of its synonyms. The first compound in the results list is CHEBI:29372, whose synonyms are similar, but not identical, to 'O2'. The results page says 'You searched for O2 in Formula OR O2 in All names', even when I use the advanced search and leave the formula field empty.
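One possible explanation, offered as a guess rather than a diagnosis: if the names field is analyzed by something that lowercases and splits on punctuation, then near-miss synonyms also produce the token `o2`, and a compound with several of them gets a higher term frequency than a compound with a single exact synonym. The sketch below only illustrates that effect. The tokenizer is a rough stand-in, not Lucene's code, and both synonym lists are made up for illustration, not taken from ChEBI.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters.

    A rough stand-in for a punctuation-splitting analyzer (assumption).
    """
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def term_frequency(synonyms, term):
    """Count how often the term appears as a token across all synonyms."""
    term = term.lower()
    return sum(tokenize(s).count(term) for s in synonyms)


# Made-up synonym lists, for illustration only (not ChEBI data).
exact_only = ["O2", "dioxygen", "molecular oxygen"]
near_matches = ["O2(.+)", "O2(+)", "dioxygen cation"]

print(tokenize("O2(.+)"))
print(term_frequency(exact_only, "O2"))
print(term_frequency(near_matches, "O2"))
```

If that is what is happening, indexing the names in an additional untokenized field and boosting exact matches on it might bring CHEBI:15379 to the top, but that would need checking against the real index.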

Original comment by: nobody

muthuvenkat commented 14 years ago

Original comment by: adekker2