cern-sis / issues-inspire

Facets don't work #529

Closed karolina-siemieniuk-morawska closed 2 months ago

karolina-siemieniuk-morawska commented 3 months ago

Facets are empty; see e.g. https://backoffice.dev.inspirebeta.net/api/workflows/search/

    "facets": {
        "_filter_workflow_type": {
            "doc_count": 27,
            "workflow_type": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": []
            }
        },
        "_filter_status": {
            "doc_count": 27,
            "status": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": []
            }
        }
    }

Connected to: https://github.com/cern-sis/issues-inspire/issues/484

DonHaul commented 3 months ago

By running some queries in the OpenSearch dev console, I discovered that removing the .keyword suffix makes the query return values.

GET inspire-backoffice-backend-dev-workflows/_search
{
  "size": 0, 
  "aggs": {
    "workflow_type_facets": {
      "terms": {
        "field": "workflow_type", #.keyword
        "size": 10
      }
    }
  }
}
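To compare the two variants side by side, the same request body can be built in Python. This is only a minimal sketch: the helper name `terms_facet_body` is hypothetical and not part of the backoffice code, and it only builds the JSON body; it does not talk to OpenSearch.

```python
import json


def terms_facet_body(field, size=10):
    """Build a terms-aggregation body like the console query above.

    `terms_facet_body` is a hypothetical helper for illustration only.
    The aggregation is named after the base field (without any subfield).
    """
    agg_name = f"{field.split('.')[0]}_facets"
    return {
        "size": 0,  # no hits, only aggregation results
        "aggs": {agg_name: {"terms": {"field": field, "size": size}}},
    }


# The two variants under discussion: with and without the .keyword suffix.
with_suffix = terms_facet_body("workflow_type.keyword")
without_suffix = terms_facet_body("workflow_type")

print(json.dumps(without_suffix, indent=2))
```

Either body can be pasted into the dev console after `GET inspire-backoffice-backend-dev-workflows/_search` to check which field name yields buckets.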

In principle, removing this same .keyword from https://github.com/inspirehep/backoffice/blob/main/backoffice/backoffice/workflows/api/views.py#L287 for the workflow_type and status facets should make it work.

However, this seems to go against what is specified in the documentation, and removing .keyword makes /api/workflows/search stop working in my local setup with the error:

RequestError at /api/workflows/search/
RequestError(400, 'search_phase_execution_exception', 'Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [status] in order to load field data by uninverting the inverted index. Note that this can use significant memory.')

Next steps: verify the version used in the local setup and in QA (https://os-inspire-qa-os.cern.ch)

DonHaul commented 3 months ago

The local version, taken from registry.cern.ch, is 2.9.0. The version used in the cluster is 2.13.0. My guess is that in the newer version it is now okay to omit .keyword.

Next steps:

DonHaul commented 3 months ago

Apparently there's something more. Facets now work on QA but not locally; the same error as above is still occurring.

DonHaul commented 3 months ago

Rebuilding the index locally seems to have fixed the issue.

Using GET inspire-backoffice-backend-dev-workflows/_mapping/field/workflow_type, we get in QA:

"workflow_type": {
          "type": "keyword"
        }

while locally:

"workflow_type": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 256
                }
              }
            },

Running python manage.py opensearch index rebuild makes the local mapping the same as in QA. However, tests will still fail in the GitHub workflows, as the command is not run there.
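The two mappings explain both symptoms: when the field is mapped as `keyword`, there is no `workflow_type.keyword` subfield, so aggregating on it yields empty buckets; when the field is mapped as `text`, aggregating on the bare field raises the fielddata error. A minimal sketch of that decision, assuming the per-field mapping fragments shown above are passed in as plain dicts (the helper `aggregatable_field` is hypothetical and not part of the backoffice code):

```python
def aggregatable_field(name, field_mapping):
    """Return the field path usable in a terms aggregation, or None.

    `field_mapping` is the inner mapping fragment for one field, e.g.
    {"type": "keyword"} or {"type": "text", "fields": {...}}.
    Hypothetical helper, for illustration only.
    """
    field_type = field_mapping.get("type")
    if field_type == "keyword":
        # Keyword fields can be aggregated directly.
        return name
    if field_type == "text":
        # Text fields need a keyword subfield to aggregate on.
        for sub_name, sub_mapping in field_mapping.get("fields", {}).items():
            if sub_mapping.get("type") == "keyword":
                return f"{name}.{sub_name}"
    return None


qa_mapping = {"type": "keyword"}
local_mapping = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}

print(aggregatable_field("workflow_type", qa_mapping))     # workflow_type
print(aggregatable_field("workflow_type", local_mapping))  # workflow_type.keyword
```

With the QA mapping the facet has to use `workflow_type`, and with the stale local mapping it has to use `workflow_type.keyword`, which is why the same view code cannot work against both until the index is rebuilt.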

Next Steps: