buda-base / public-digital-library

http://library.bdrc.io

Wrong etext numbers in filters #953

Open eroux opened 3 weeks ago

eroux commented 3 weeks ago

For some reason, all the "terms" filters work fine within the scope defined by the keywords, but the "range" one searches everything, always giving the same very high counts for "etext_quality" regardless of the keywords.

"range" could be replaced by "filter" and it would work, but can searchkit produce that? Or, if I change "range" to "filter" in the JSON on the API side, can the FE use the results, which look a bit different?

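    "etext_access": {
      "terms": {
        "field": "etext_access",
        "size": 10000
      }
    },
    "etext_quality": {
      "range": {
        "field": "etext_quality",
        "ranges": [
          { "from": 0,    "to": 0.8  },
          { "from": 0.8,  "to": 0.95 },
          { "from": 0.95, "to": 1.01 },
          { "from": 1.99, "to": 2.01 },
          { "from": 2.99, "to": 3.01 },
          { "from": 3.99, "to": 4.01 }
        ]
      }
    }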
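A minimal Python sketch of the idea of replacing "range" with "filter" on the API side, assuming an Elasticsearch/OpenSearch-style `filters` aggregation (one named sub-query per bucket). The function names `bucket_name` and `range_agg_to_filters_agg` are hypothetical and not taken from search_bdrc.py; the `range_<from>_to_<to>` bucket names mirror the ones the API returns later in this thread. Each bucket uses `gte`/`lt`, matching the range aggregation's inclusive `from` and exclusive `to`.

```python
import json

# Ranges copied from the etext_quality aggregation in this issue.
ETEXT_QUALITY_RANGES = [
    {"from": 0, "to": 0.8},
    {"from": 0.8, "to": 0.95},
    {"from": 0.95, "to": 1.01},
    {"from": 1.99, "to": 2.01},
    {"from": 2.99, "to": 3.01},
    {"from": 3.99, "to": 4.01},
]


def bucket_name(r):
    """Hypothetical helper: name a bucket like 'range_0.8_to_0.95'."""
    return f"range_{r['from']}_to_{r['to']}"


def range_agg_to_filters_agg(field, ranges):
    """Hypothetical helper: rewrite a `range` aggregation as a named `filters` one.

    Each bucket becomes a `range` query with gte (inclusive) and lt (exclusive),
    the same bounds semantics as the `range` aggregation's from/to.
    """
    return {
        "filters": {
            "filters": {
                bucket_name(r): {"range": {field: {"gte": r["from"], "lt": r["to"]}}}
                for r in ranges
            }
        }
    }


if __name__ == "__main__":
    agg = range_agg_to_filters_agg("etext_quality", ETEXT_QUALITY_RANGES)
    print(json.dumps({"etext_quality": agg}, indent=2))
```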


eroux commented 3 weeks ago

@berger-n what do you think?

berger-n commented 3 weeks ago

I think if you can replace all these ranges with dedicated filter categories (very_low to very_high, or anything else), it should be quite straightforward for me to handle.

    "etext_quality": [
      { "from":0,    "to":0.8  },
      { "from":0.8,  "to":0.95 },
      { "from":0.95, "to":1.01 },
      { "from":1.99, "to":2.01 },
      { "from":2.99, "to":3.01 },
      { "from":3.99, "to":4.01 }
    ],

roopeux commented 3 weeks ago

@berger-n, the API now ignores the etext_quality aggregation sent by searchkit and replaces it, so the result will look like this:


        "etext_quality": {
          "buckets": {
            "range_0.8_to_0.95": {
              "doc_count": 433
            },
            "range_0.95_to_1.01": {
              "doc_count": 1245
            },
            "range_0_to_0.8": {
              "doc_count": 719
            },
            "range_1.99_to_2.01": {
              "doc_count": 1130
            },
            "range_2.99_to_3.01": {
              "doc_count": 279
            },
            "range_3.99_to_4.01": {
              "doc_count": 5
            }
          }
        },

berger-n commented 3 weeks ago

Thanks @roopeux! Displaying results works again: https://library-dev.bdrc.io/osearch/search?q=par%20khang (provided that the query results are patched in the client before being processed; see the example on Discord)


berger-n commented 3 weeks ago

Just to keep track of it: selecting a value to filter on doesn't work, though (https://library-dev.bdrc.io/osearch/search?q=par%20khang&etext_quality%5B0%5D=6). This is probably because 6 isn't a valid value for etext_quality here, but range_2.99_to_3.01 doesn't seem to be an allowed value string either and makes the server fail, as mentioned earlier on Discord.

roopeux commented 3 weeks ago

@berger-n, would it be better if I handled the whole thing in the API, transparently for searchkit? Searchkit would send and receive JSON in exactly the same format as before.
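If the API handles this transparently, it also has to translate the keyed buckets back into the list shape a range aggregation returns. This sketch assumes the `range_<from>_to_<to>` naming from the response above and a bucket layout of `key`, `from`, `to`, `doc_count`; the exact `key` string searchkit expects is an assumption to check, and `filters_response_to_range_response` is a hypothetical name.

```python
import re

# Hypothetical pattern for bucket names such as "range_0.8_to_0.95".
BUCKET_RE = re.compile(r"range_(-?\d+(?:\.\d+)?)_to_(-?\d+(?:\.\d+)?)")


def filters_response_to_range_response(agg_result):
    """Turn keyed `filters` buckets back into a `range`-style bucket list.

    Input:  {"buckets": {"range_0_to_0.8": {"doc_count": 719}, ...}}
    Output: {"buckets": [{"key": "0.0-0.8", "from": 0.0, "to": 0.8,
                          "doc_count": 719}, ...]}, sorted by "from".
    """
    buckets = []
    for name, body in agg_result["buckets"].items():
        m = BUCKET_RE.fullmatch(name)
        if m is None:
            continue  # skip buckets that do not follow the naming pattern
        lo, hi = float(m.group(1)), float(m.group(2))
        buckets.append({
            "key": f"{lo}-{hi}",  # assumed key format; verify against searchkit
            "from": lo,
            "to": hi,
            "doc_count": body["doc_count"],
        })
    # Restore numeric order; keyed buckets may arrive in lexical key order.
    buckets.sort(key=lambda b: b["from"])
    return {"buckets": buckets}
```

Sorting by `from` matters: in the sample response above, the keyed buckets appear in lexical key order (`range_0.8_to_0.95` before `range_0_to_0.8`).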

berger-n commented 3 weeks ago

@roopeux, yes, I guess you'll have to implement it in the API in order to select the correct value for the filter.
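For the selection side, a hedged sketch of what the API could do: turn the selected bucket names back into `range` queries on the field, and reject values that don't match the bucket naming pattern (such as `6`) instead of forwarding them to the server. `selected_buckets_to_filter` is hypothetical, and where the resulting clause goes (a `bool` `filter` or a `post_filter`) depends on how search_bdrc.py builds the request, which this thread doesn't show.

```python
import re

# Hypothetical pattern for bucket names such as "range_2.99_to_3.01".
BUCKET_RE = re.compile(r"range_(-?\d+(?:\.\d+)?)_to_(-?\d+(?:\.\d+)?)")


def selected_buckets_to_filter(field, selected):
    """Build a bool/should clause matching any of the selected buckets.

    Raises ValueError for values that are not bucket names (e.g. "6").
    """
    clauses = []
    for name in selected:
        m = BUCKET_RE.fullmatch(name)
        if m is None:
            raise ValueError(f"not a {field} bucket name: {name!r}")
        lo, hi = float(m.group(1)), float(m.group(2))
        # gte/lt keeps the same bounds as the aggregation buckets.
        clauses.append({"range": {field: {"gte": lo, "lt": hi}}})
    return {"bool": {"should": clauses, "minimum_should_match": 1}}
```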

roopeux commented 3 weeks ago

Yes, that is the thing I forgot.

berger-n commented 1 week ago

It still seems to be broken when no keyword is used.

roopeux commented 1 week ago

I cannot test this locally, but the latest push of search_bdrc.py might fix this.