denshoproject / ddr-public

Web UI for publishing DDR collections.
Other
1 stars 1 forks source link

Search filters (aggregations) broken #161

Closed gjost closed 3 years ago

gjost commented 4 years ago

On Friday I ran the usual apt-get update / upgrade in the colo. I upgraded the stage cluster and it looks like it broke search and possibly other parts of the DDR site. I'm taking stock of what's broken.

Looks like the Encyclopedia (encycstage) is OK.

ddrpublic search is throwing a KeyError looking for 'topics_ids' in ddrpublic/ui/search.py line 305:

Traceback (most recent call last):
  File "/opt/ddr-public/venv/ddrpublic/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/opt/ddr-public/venv/ddrpublic/lib/python3.7/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/opt/ddr-public/venv/ddrpublic/lib/python3.7/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/opt/ddr-public/ddrpublic/ui/decorators.py", line 24, in wrapper
    return f(*args, **kwargs)
  File "/opt/ddr-public/ddrpublic/ui/views/searching.py", line 115, in search_ui
    results = searcher.execute(limit, offset)
  File "/opt/ddr-public/ddrpublic/ui/search.py", line 724, in execute
    offset=offset,
  File "/opt/ddr-public/ddrpublic/ui/search.py", line 305, in __init__
    self.aggregations[field] = aggs[field_ids].buckets
  File "/opt/ddr-public/venv/ddrpublic/lib/python3.7/site-packages/elasticsearch_dsl/response/aggs.py", line 29, in __getitem__
    return super(BucketData, self).__getitem__(key)
  File "/opt/ddr-public/venv/ddrpublic/lib/python3.7/site-packages/elasticsearch_dsl/response/__init__.py", line 80, in __getitem__
    return agg.result(self._meta['search'], self._d_[attr_name])

ddrpublic-161-KeyError_at search.zip

GeoffFroh commented 4 years ago

ddrstage (stage) is running v5.0.4 pub: 6df729f49444453c72d16ed14f4177dcf3b5716d (HEAD -> master, origin/master, origin/HEAD) 2020-08-05 12:54:39 -0700

ddr (production) is running v5.0.3 pub: a3f2a796d6651cfe9e0966639df2eb040ecfd5b3 (HEAD -> master, origin/master, origin/HEAD) 2020-07-23 10:21:18 -0700

ES stage cluster is running v7.9.0 ES production cluster is running v7.8.0

GeoffFroh commented 4 years ago

rgstage (stage) and resourceguide (production) appear to be functioning correctly as well. Both are running v3.3.1 of encyc-rg

gjost commented 4 years ago

Note that the Elasticsearch production cluster has not yet been updated to 7.9.0.

gjost commented 4 years ago

For the time being apt upgrades for the elasticsearch package have been disabled in the production cluster (rosie,faker,butch) using sudo apt-mark hold elasticsearch. Holds may be listed using apt-mark showhold and removed using sudo apt-mark unhold PACKAGE.

It looks like there's a way to do this using Ansible but it uses dpkg which seems to have a different hold mechanism from apt's. For the time being I've added a note that breaks the elastic7.yml playbook just to make things obvious and record this state in the ansible-colo repository.

gjost commented 4 years ago

I found a backup deb deb10es (local development Elasticsearch server VM) that still had 7.8.1 and loaded it. It has the same issue with Search but NamesDB is fine. I guess Elasticsearch may not be the actual problem. However, I rolled my local ddr-public back to v5.0.3 (a3f2a79) and it has the same error.

I really want this to be Elasticsearch's fault and not my own code's, but even if it is we're not going back to 7.8.1. Sigh.

gjost commented 4 years ago

Tracebacks for ddrpublic/us/search.py:298.

In both production and stage the Elasticsearch query looks like this:

{
    "_source": [
        "id", "model", "links_html", "..."
    ],
    "aggs": {
        "facility": {
            "aggs": {
                "facility_ids": {"terms": {"field": "facility.id", "size": 1000}}
            },
            "nested": {"path": "facility"}
        },
        "format": {"terms": {"field": "format"}},
        "genre": {"terms": {"field": "genre"}},
        "rights": {"terms": {"field": "rights"}},
        "topics": {
            "aggs": {
                "topics_ids": {"terms": {"field": "topics.id", "size": 1000}}
            },
            "nested": {"path": "topics"}
        }
    },
    "query": {
        "query_string": {
            "allow_leading_wildcard": False,
            "analyze_wildcard": False,
            "default_operator": "AND",
            "fields": [
                "id", "model", "links_html", "..."
            ],
            "query": "minidoka"}
    }
}

The only difference is that v5.0.4 (stage) adds the record_lastmod field to query['_source'] and query['query_string']['fields'].

On the (working) production ddr.densho.org, pointing at the production Elasticsearch cluster, results.aggregations.to_dict() looks like this:

{
    "topics": {
        "doc_count": 1542,
        "topics_ids": {
            "buckets": [
                {"doc_count": 221, "key": "416"},
                {"doc_count": 183, "key": "173"},
                "..."
            ],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0}
    },
    "facility": {
        "doc_count": 1352,
        "facility_ids": {
            "buckets": [
                {"doc_count": 1217, "key": "8"},
                {"doc_count": 31, "key": "55"},
                "..."
            ],
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0}
    },
    "format": {
        "buckets": [
            {"doc_count": 612, "key": "doc"},
            {"doc_count": 532, "key": "img"},
            "..."
        ],
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0
    },
    "genre": {
        "..."
    },
    "rights": {
        "..."
    }
}           

On the (broken) ddrstage.densho.org, also pointing at the production Elasticsearch cluster, results.aggregations.to_dict() looks like this:

{
    'topics': {
        'doc_count': 0
    },
    'facility': {
        'doc_count': 0
    },
    'format': {
        'buckets': [],
        'doc_count_error_upper_bound': 0,
        'sum_other_doc_count': 0
    },
    'genre': {
        'buckets': [],
        'doc_count_error_upper_bound': 0,
        'sum_other_doc_count': 0
    },
    'rights': {
        'buckets': [],
        'doc_count_error_upper_bound': 0,
        'sum_other_doc_count': 0
    }
}
gjost commented 4 years ago

Using git diff a3f2a79..6df729f to compare the two versions (a3f2a79 is v5.0.3 and 6df729f is v5.0.4), the only differences in the code (aside from edits to Makefile and other non-code files) are:

gjost commented 4 years ago

Looks like the elasticsearch-py is different between the two installs:

(ddrpublic) ansible@PRODUCTION:/opt/ddr-public$ pip freeze | grep elastic
elasticsearch==7.7.1
elasticsearch-dsl==7.2.1
(ddrpublic) ansible@STAGE:/opt/ddr-public$ pip freeze|grep elastic
elasticsearch==7.8.1
elasticsearch-dsl==7.2.1
gjost commented 4 years ago

I rolled elasticsearch-py on stage back to version 7.7.1, restarted stage ddrpublic, cleared the Redis cache, and rand the search again, and got the same error as before.

Just for laughs I will now roll ddrpublic back to v.5.0.3 on stage and see what happens...

gjost commented 4 years ago

Rolled stage ddrpublic back to v.5.0.3 and it works with the production Elasticsearch 7.8.0. Pointed stage at the stage Elasticsearch 7.9.0 cluster and... it works.

gjost commented 4 years ago

The more I look at this the more surprised I am that production ddrpublic search is working at all.

gjost commented 4 years ago

This is a Python shell distillation of what's happening in the search view:

from django.conf import settings
from ui import search
params = {'fulltext': 'minidoka'}
searcher = search.Searcher()
searcher.prepare(
    params=params,
    params_whitelist=search.SEARCH_PARAM_WHITELIST,
    search_models=search.SEARCH_MODELS,
    fields=search.SEARCH_INCLUDE_FIELDS,
    fields_nested=search.SEARCH_NESTED_FIELDS,
    fields_agg=search.SEARCH_AGG_FIELDS,
)
limit,offset = settings.RESULTS_PER_PAGE, 0
start = int(offset)
stop = (start + int(limit))
response = searcher.s[start:stop].execute()

ui.search prepares an elasticsearch_dsl.search.Search object from params in request. None of this code has changed since v5.0.3 with the exception of adding a few fields to those returned from the function. In the last line, searcher.s is the elasticsearch_dsl.search.Search object. The .execute() function is elasticsearch-dsl code not ddr-public code. Basically all this is leading up to running elasticsearch_dsl.search.Search.execute. This returns response which is an elasticsearch_dsl.response.Response object. The results.aggregations mentioned above is this elasticsearch_dsl.response.Response object (please forgive the naming - I guess a response from the server contains search results for the user?).

This works when running in the production ddrpublic virtualenv but does not work in the stage virtualenv, even though they both use the same version of elasticsearch-py and elasticsearch-dsl-py and they are both pointed at the same Elasticsearch cluster.

gjost commented 4 years ago

COLO

ddrpublic 5.0.3 5.0.3 5.0.3 5.0.4 5.0.4 5.0.4
elastic-py 7.7.1 7.7.1 7.7.1 7.7.1 7.9.1 7.9.1
elastic-dsl-py 7.2.1 7.2.1 7.2.1 7.2.1 7.2.1 7.2.1
elasticsearch 7.8.0 7.9.0 7.8.0 7.8.0 7.9.0 7.9.0
record_lastmod yes yes no yes
OK OK FAIL FAIL OK FAIL
gjost commented 4 years ago

Looks like it was the presence of record_lastmod in SEARCH_INCLUDED_FIELDS that broke things. WTF Elasticsearch?

gjost commented 4 years ago

Answer: SEARCH_INCLUDED_FIELDS is the list of fields on which fulltext search is performed, NOT (as I thought) the list of fields that are returned by the search. record_lastmod is a datetime so you can't perform a fulltext search on it [thanks @GeoffFroh!].

gjost commented 4 years ago

That was a hard one. I think it was a combination of factors:

All that prejudiced me to focus on Elasticsearch and libs as the problem instead of looking at my own code.

I need to go through my search code and look for try/except clauses. I might have hidden an error that would have surfaced this more easily

gjost commented 4 years ago

Fixed as of ddr-public commit fc0c958 for upcoming package ddrpublic-master_5.0.5~deb10.