Tatoeba / tatoeba2

Tatoeba is a platform whose purpose is to create a collaborative and open dataset of sentences and their translations.
https://tatoeba.org
GNU Affero General Public License v3.0

With the "relevance" setting in the advanced search, the "or" operator doesn't get the expected results. #1895

Open ckjpn opened 5 years ago

ckjpn commented 5 years ago

Search string: in a|the glass

https://tatoeba.org/eng/sentences/search?query=in+a%7Cthe+glass&from=eng&to=none&orphans=&unapproved=no&native=yes&user=&tags=&list=&has_audio=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=relevance

The expected results don't appear until about the 7th sentence.

BTW, "in a|the glass" (with the quotes) doesn't get any results. I tried it just in case that was how I should do such a search.

Maybe some of the other things listed in the Wiki page about searching also won't work as people will expect. I'm just reporting this so you might be aware that some problems might exist.

The "wildcard" works, though: "in * glass" https://tatoeba.org/eng/sentences/search?query=%22in++glass%22&from=eng&to=none&orphans=no&unapproved=no&user=&tags=&list=&has_audio=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=relevance

AndiPersti commented 5 years ago

I don't think this really is a bug. Your search string means "look for sentences which contain 'in' AND ('a' OR 'the') AND 'glass' in any order and any number" and the first 7 sentences all contain these 4 keywords and thus get a higher weight.
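To make that reading of the query concrete, here is a toy sketch in Python. It is not the search engine's code: the tokenizer, the function names, and the "count of distinct keywords" score are all assumptions made for illustration, and the sample sentences are made up rather than taken from the database.

```python
import re

KEYWORDS = {"in", "a", "the", "glass"}

def tokens(sentence):
    """Lowercase word tokens; a rough stand-in for a real tokenizer (assumption)."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def matches(sentence):
    """True if the sentence satisfies: in AND (a OR the) AND glass, in any order."""
    words = tokens(sentence)
    return "in" in words and ("a" in words or "the" in words) and "glass" in words

def distinct_keyword_hits(sentence):
    """Number of the four keywords present; a crude proxy for the weight (assumption)."""
    return len(tokens(sentence) & KEYWORDS)

# Made-up sentences, for illustration only.
samples = [
    "The glass is in a box.",   # matches, although neither exact phrase occurs
    "Put it in the glass.",     # matches, contains the wanted phrase
    "He drank from a glass.",   # no match: "in" is missing
]
for s in samples:
    print(matches(s), distinct_keyword_hits(s), s)
```

Under this toy scoring, the first sample contains all four keywords and so outranks the second, even though only the second contains the phrase that was wanted.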

It looks like you wanted an exact search for "in the glass" or "in a glass". Unfortunately, the search engine ignores boolean operators (i.e. |) in an exact phrase search, so the search string "in a|the glass" actually means "in a the glass". Since there's no sentence with that exact phrase in the database, you don't get any results.

So what are the alternatives? You've already tried "in * glass", but that also gives you sentences which contain, for example, "in his glass". Another option would be to use a strict order search: in << a|the << glass. That doesn't account for the proximity of the keywords, though. Conversely, in NEAR/1 a|the NEAR/1 glass forces the keywords to be close to one another, but not in that specific order. If you want only the exact results, you need to be explicit: "in the glass" | "in a glass".
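Writing out every phrase by hand gets tedious with more alternatives, so here is a minimal Python sketch that builds the explicit query and a search URL. The helper names `expand_phrases` and `search_url` are hypothetical, and it is an assumption that the search page accepts a URL carrying only the `query`, `from`, `to` and `sort` parameters seen in the links above.

```python
from itertools import product
from urllib.parse import urlencode

def expand_phrases(slots):
    """Turn word alternatives into an explicit OR of exact phrases.

    slots is a list of lists, one list of alternatives per word position,
    e.g. [["in"], ["the", "a"], ["glass"]].
    """
    phrases = [" ".join(combo) for combo in product(*slots)]
    return " | ".join(f'"{p}"' for p in phrases)

def search_url(query, lang="eng"):
    """Build a search URL; the reduced parameter set is an assumption."""
    params = {"query": query, "from": lang, "to": "none", "sort": "relevance"}
    return "https://tatoeba.org/eng/sentences/search?" + urlencode(params)

query = expand_phrases([["in"], ["the", "a"], ["glass"]])
print(query)
print(search_url(query))
```

The number of phrases is the product of the alternatives per position, so this only stays manageable for a handful of short slots.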

jiru commented 5 years ago

I think it’s a bug. It is not the expected behaviour of the "Relevance" sort.

ckjpn commented 4 years ago

BTW, the first search listed above now gets this error message.

Search error: Invalid query. Please refer to the search documentation for more details.