ThreeSixtyGiving / grantnav

This is a web-based search tool for data in the 360Giving data format.
http://grantnav.threesixtygiving.org/

Stemming is giving overly wide results #508

Open morchickit opened 5 years ago

morchickit commented 5 years ago

I queried GrantNav for "Clinical" and got results containing the word "clinic". I DID NOT want to match the word "clinic" at all. How can we make sure the search doesn't match this widely?

robredpath commented 5 years ago

The options are, I think:

Something we can consider as part of future GN work, for sure.

mariongalley commented 3 years ago

https://trello.com/c/tFLVz8eh/213-stemming-can-cause-confusing-or-inaccurate-results-in-grantnav

mariongalley commented 2 years ago

Issue description:

When users are trying to make a specific search, Elasticsearch will do fuzzy matching to look for similar words, e.g. searching for "community organiser" will also return results containing "community organisation". This happens even when the user puts the phrase in quotes, and when they use the AND operator, as in "community AND organiser".

This is an issue as "community organiser" and "community organisation" are very different things.
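To illustrate why the two phrases collide, here is a toy sketch of suffix-stripping stemming. The suffix list and the function names `toy_stem` and `stemmed_match` are made up for this example; this is not the algorithm Elasticsearch uses, only a rough picture of how two different words can be reduced to the same root before matching.

```python
from typing import Tuple

# Toy suffix list for illustration only; real stemmers use many more rules.
TOY_SUFFIXES: Tuple[str, ...] = ("ation", "er", "al")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_match(query_term: str, document_term: str) -> bool:
    """Two terms match when they reduce to the same root."""
    return toy_stem(query_term) == toy_stem(document_term)
```

With this toy stemmer, "organiser" and "organisation" both reduce to "organis", and "clinical" reduces to "clinic", so a search on roots cannot tell them apart.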

Possible solutions:

It's clear how the world will be different when the work is completed

Users will have more control over whether GrantNav behaves more like Google i.e. "give me everything that might be related to the topic, no matter how tenuous" or more like a specialist search engine i.e. "give me all the items that exactly match my criteria"

mariongalley commented 1 year ago

@michaelwood In my latest comment I described a few possible solutions, all of which essentially boil down to having the option to turn off stemming/fuzzy matching. Is it possible to do this in GrantNav?

michaelwood commented 1 year ago

I think we could have an option to turn off the fuzzy matching. This would involve us having an internal field which isn't analysed and could be switched to via the front end.
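A minimal sketch of what that could look like, under several assumptions: the field name `description`, the subfield name `exact`, and the helper `build_query` are hypothetical (GrantNav's real mapping is not shown in this thread). The subfield here uses the `standard` analyzer, which tokenises and lowercases but does not stem, rather than being fully unanalysed, so that single-word searches still work.

```python
from typing import Any, Dict

# Hypothetical mapping: one stemmed field plus a non-stemmed subfield.
# "english" and "standard" are built-in Elasticsearch analyzers.
MAPPING_SKETCH: Dict[str, Any] = {
    "properties": {
        "description": {
            "type": "text",
            "analyzer": "english",  # stems, so "clinical" can match "clinic"
            "fields": {
                "exact": {"type": "text", "analyzer": "standard"}  # no stemming
            },
        }
    }
}


def build_query(text: str, exact: bool) -> Dict[str, Any]:
    """Build a query_string body, targeting the non-stemmed subfield on request."""
    field = "description.exact" if exact else "description"
    return {"query": {"query_string": {"query": text, "fields": [field]}}}
```

The front end would then only need to pass a flag (for example from a checkbox) that decides the value of `exact`.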

That said, I'd like to see the quotes method working; as you said, this mirrors other known search paradigms and is expected behaviour. This may have been a regression when we changed version of ElasticSearch, as I am fairly sure this used to work. We may now need to set the parameter https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness based on whether there are quotes in the search string.
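A rough sketch of switching behaviour based on quotes in the search string. The helper names are hypothetical, and the choice of parameter is an assumption: the `fuzziness` option controls edit-distance matching, so this sketch instead uses the `quote_field_suffix` option of `query_string`, which points quoted text at a differently analysed subfield. It assumes the index has a non-stemmed subfield with the hypothetical suffix `.exact`; whether this matches GrantNav's setup is not established in this thread.

```python
import re
from typing import Any, Dict

# Matches a non-empty run of text wrapped in double quotes.
QUOTED_PHRASE = re.compile(r'"[^"]+"')


def has_quoted_phrase(search: str) -> bool:
    """Return True when the search string contains a double-quoted phrase."""
    return QUOTED_PHRASE.search(search) is not None


def build_query_string(search: str) -> Dict[str, Any]:
    """Build a query_string body, adding exact-match handling for quoted text."""
    params: Dict[str, Any] = {"query": search}
    if has_quoted_phrase(search):
        # Assumed subfield suffix; quoted text is then matched without stemming.
        params["quote_field_suffix"] = ".exact"
    return {"query": {"query_string": params}}
```

Elasticsearch applies `quote_field_suffix` only to the quoted parts of a query, so it could also be set unconditionally; the explicit check is kept here to mirror the approach described above.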