Mike-Heneghan / ALISS

ALISS (A Local Information System for Scotland) is a service to help you find help and support close to you when you need it most.
https://aliss.org

Review elasticsearch for organisations. #109

Closed Mike-Heneghan closed 4 years ago

Mike-Heneghan commented 4 years ago

At the moment, some users searching for organisations are making logical searches that do not return the expected results.

The common theme appears to be extra words beyond the organisation name. The words are logical, but because they do not directly match the name of the organisation, the organisation is not returned.

For example, if there was an organisation called "Scottish Optometrist Society", a user in Dundee might search for "Scottish Optometrist Society Dundee", which would not return the result.

Changing Elasticsearch can be time-consuming, so a preliminary search to see whether there are any clear options would be useful.

Mike-Heneghan commented 4 years ago

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-multi-match-query.html

Mike-Heneghan commented 4 years ago

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/common-options.html#fuzziness

Mike-Heneghan commented 4 years ago

The current method for filtering organisations by query:

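def filter_organisations_by_query(queryset, q):
    """Restrict an organisation search queryset to matches for the text q."""
    queryset = queryset.query({
        "bool": {
            # Required clause: an organisation is returned only if this matches.
            "must": [
                {
                    "multi_match": {
                        "query": q,
                        "type": "most_fields",
                        # "and" requires every search term to match.
                        "operator": "and",
                        # "name^2" boosts matches on the name field.
                        "fields": ["name^2", "description"],
                        "fuzziness": "AUTO:4,7"
                    }
                }
            ],
            # Optional clause: only adds to the score of organisations
            # that already satisfied "must".
            "should": [
                {
                    "multi_match": {
                        "query": q,
                        "type": "most_fields",
                        "operator": "or",
                        "fields": ["name^2", "description^1.5"]
                    }
                }
            ]
        }
    })
    return queryset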

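I think the issue described could be caused by the "must" clause using "operator": "and".

I think the reason users are getting no results is that, with "and", every term in the query has to match, and because the clause sits under "must", an organisation that fails it is excluded from the results altogether.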

Removing the operator from the "must" clause (so that it falls back to the default, "or") returns results, i.e. "Scottish Optometrist Society" is returned from a search for "Scottish Optometrist Society Dundee".
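The difference between the two operators can be illustrated with a deliberately simplified model. This is only a sketch: `term_match` is a hypothetical helper, not Elasticsearch code, and it ignores analysis, fuzziness, multiple fields and scoring.

```python
def term_match(field_text, query, operator="or"):
    """Simplified stand-in for a match query against a single field.

    "and" requires every query term to appear in the field;
    "or" requires at least one of them.
    """
    field_terms = set(field_text.lower().split())
    query_terms = query.lower().split()
    hits = [term in field_terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)


name = "Scottish Optometrist Society"
query = "Scottish Optometrist Society Dundee"

# "Dundee" is not in the name, so "and" rejects the organisation.
print(term_match(name, query, operator="and"))  # False
# "or" accepts it because at least one term matches.
print(term_match(name, query, operator="or"))   # True
```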

However, changing only the "must" clause means that a less exact match can get the same score as an exact one, for example:

        exact_query = "Scottish Optometrist Society" (score: 29.667294)
        inexact_query = "Scottish Optometrist Society Glasgow Glasses" (score: 29.667294)
        undesired_query = "Scottish Ontology Society" (score: 14.432941)

Changing the "should" operator from "or" to "and" assigns a higher score when the terms appear together, i.e.

        exact_query = "Scottish Optometrist Society" (score: 29.948296)
        inexact_query = "Scottish Optometrist Society Glasgow Glasses" (score: 14.974148)
        undesired_query = "Scottish Ontology Society" (score: 7.2789307)

The above would be more desirable as a closer match receives a higher score.
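Putting the two changes together, the updated method might look roughly like the sketch below. The split into `build_organisation_query` is a hypothetical refactor (not in the original code) so that the query body can be inspected without a running Elasticsearch instance. The fields, boosts and fuzziness are copied from the current method.

```python
def build_organisation_query(q):
    """Return the bool query body for an organisation search (hypothetical helper)."""
    return {
        "bool": {
            # No "operator" here, so the default ("or") applies and extra
            # words such as a town name no longer exclude the organisation.
            "must": [
                {
                    "multi_match": {
                        "query": q,
                        "type": "most_fields",
                        "fields": ["name^2", "description"],
                        "fuzziness": "AUTO:4,7",
                    }
                }
            ],
            # "and" here means only organisations matching every term
            # receive the additional score from this clause.
            "should": [
                {
                    "multi_match": {
                        "query": q,
                        "type": "most_fields",
                        "operator": "and",
                        "fields": ["name^2", "description^1.5"],
                    }
                }
            ],
        }
    }


def filter_organisations_by_query(queryset, q):
    # Same call as the current method, with the revised query body.
    return queryset.query(build_organisation_query(q))
```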

Mike-Heneghan commented 4 years ago

Current search result:

Screenshot 2019-09-30 at 14 58 46

Updated (more permissive) search result:

Screenshot 2019-09-30 at 14 56 06

Mike-Heneghan commented 4 years ago

The more permissive search may generate unexpected results. Need to consider test cases for checking the results of the new search.
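One possible starting point for such test cases is sketched below. The cases reuse the example organisation from this thread. The `search` callable and its return shape (a list of organisation names, best match first) are assumptions made for illustration, not part of the ALISS codebase.

```python
# Hypothetical cases: (search text, organisation name expected in the results).
EXPECTED_HITS = [
    ("Scottish Optometrist Society", "Scottish Optometrist Society"),
    ("Scottish Optometrist Society Dundee", "Scottish Optometrist Society"),
]


def check_expected_hits(search, cases=EXPECTED_HITS):
    """Return the cases whose expected organisation is missing from the results.

    `search` is any callable that takes the query text and returns a list of
    organisation names (an assumed interface).
    """
    failures = []
    for query, expected_name in cases:
        names = search(query)
        if expected_name not in names:
            failures.append((query, expected_name, names))
    return failures
```

An empty list from `check_expected_hits` means every case returned the expected organisation. Cases for results that should not appear could be added in the same way.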

Mike-Heneghan commented 4 years ago

Merged into master