buda-base / public-digital-library

http://library.bdrc.io

connect search to API #886

Closed eroux closed 2 months ago

eroux commented 3 months ago

Roope coded a small API in Python to improve some of the search results. To call it, send a regular Elasticsearch JSON query to https://autosuggest.bdrc.io/search (no ES credentials needed); it returns the usual Elasticsearch results JSON. The main trick is to put the search string under a "bdrc-query" key, as exemplified in the README.
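
For illustration, here is a minimal Python sketch of such a call. It assumes the search string sits under a "bdrc-query" key inside the bool query, as in the examples later in this thread; the README remains the authority on the exact placement. build_body and search are hypothetical helper names, not part of the API.

```python
import json
import urllib.request

# Endpoint as given in the comment above.
SEARCH_URL = "https://autosuggest.bdrc.io/search"


def build_body(search_string, filters=None, size=20):
    """Build a regular Elasticsearch request body carrying the search string.

    Assumption: "bdrc-query" lives next to "filter" inside the bool query.
    """
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": list(filters or []),
                "bdrc-query": search_string,
            }
        },
    }


def search(search_string, url=SEARCH_URL):
    """POST the body as JSON and return the parsed Elasticsearch-style response."""
    data = json.dumps(build_body(search_string)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling search("spyod 'jug") would then return the usual hits structure, provided the service is reachable and accepts this body shape.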

berger-n commented 3 months ago

so, I've been able to test it locally:

[image]

there are issues with reusing the data sent to the standard API, though, and I'm not sure where to go from here @eroux @roopeux:

eroux commented 3 months ago

oh I see, thanks! What we need to do in that case is:

"query": {
   "function_score": {
     "script_score": {
       "script": {
         "id": "bdrc-score"
       }
     },
     "query": {
       "bool": {
         "filter": [
           {
             "bool": {
               "should": [
                 {
                   "range": {
                     "etext_quality": {
                       "gte": "3.99",
                       "lte": "4.01"
                     }
                   }
                 }
               ]
             }
           }
         ],
         "bdrc-query": "spyod 'jug",
                (...)

I think... @roopeux can you adjust and make that work in the Python code? Generally speaking, I think the Python code should just look for "bdrc-query" as a key anywhere, not just at a specific JSON path.
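
A minimal sketch of that "look for the key anywhere" idea. This is not the actual code from the autocomplete-prototype repository, and find_key is a hypothetical name:

```python
def find_key(node, key="bdrc-query"):
    """Yield (container, value) for every occurrence of key in nested dicts and lists."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield node, value
            else:
                yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Same shape as the query above, with the filter list left empty for brevity.
request_body = {
    "query": {
        "function_score": {
            "script_score": {"script": {"id": "bdrc-score"}},
            "query": {"bool": {"filter": [], "bdrc-query": "spyod 'jug"}},
        }
    }
}

for container, value in find_key(request_body):
    print(value)  # spyod 'jug
```

Yielding the containing dict along with the value lets the caller rewrite the query in place, wherever the key was found.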

eroux commented 3 months ago

@berger-n it should work now, see https://github.com/buda-base/autocomplete-prototype/commit/71205d450f702b10127137ccfd30b73190e5da0d

berger-n commented 3 months ago

ok thanks! getting this error:

{"error":{"reason":"[1:1493] [bool] unknown field [dis_max]","root_cause":[{"reason":"[1:1493] [bool] unknown field [dis_max]","type":"x_content_parse_exception"}],"type":"x_content_parse_exception"},"status":400}

with the following query:

{
    "function_score": {
        "script_score": {
            "script": {
                "id": "bdrc-score"
            }
        },
        "query": {
            "bool": {
                "filter": [],
                "bdrc-query": "spyod 'jug"
            }
        }
    }
}

here's the full curl request if needed:

curl 'https://autocomplete.bdrc.io/search' \
  -H 'Accept: */*' \
  -H 'Accept-Language: fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7,zh-CN;q=0.6,zh;q=0.5' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Origin: http://localhost:3000' \
  -H 'Pragma: no-cache' \
  -H 'Referer: http://localhost:3000/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: cross-site' \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36' \
  -H 'sec-ch-ua: "Not/A)Brand";v="8", "Chromium";v="126", "Google Chrome";v="126"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "Linux"' \
  --data-raw $'{"from":0,"size":20,"aggs":{"associatedCentury":{"terms":{"field":"associatedCentury","size":20}},"associatedTradition":{"terms":{"field":"associatedTradition","size":20}},"author":{"terms":{"field":"author","size":20}},"etext_access":{"terms":{"field":"etext_access","size":20}},"etext_quality":{"range":{"field":"etext_quality","ranges":[{"from":0,"to":0.8},{"from":0.8,"to":0.95},{"from":0.95,"to":1.01},{"from":1.99,"to":2.01},{"from":2.99,"to":3.01},{"from":3.99,"to":4.01}]}},"inCollection":{"terms":{"field":"inCollection","size":20}},"language":{"terms":{"field":"language","size":20}},"personGender":{"terms":{"field":"personGender","size":20}},"printMethod":{"terms":{"field":"printMethod","size":20}},"scans_access":{"terms":{"field":"scans_access","size":20}},"script":{"terms":{"field":"script","size":20}},"translator":{"terms":{"field":"translator","size":20}},"type":{"terms":{"field":"type","size":20}},"workGenre":{"terms":{"field":"workGenre","size":20}},"workIsAbout":{"terms":{"field":"workIsAbout","size":20}}},"highlight":{"fields":{"prefLabel_bo_x_ewts":{},"altLabel_bo_x_ewts":{},"prefLabel_en":{},"altLabel_en":{},"seriesName_bo_x_ewts":{},"seriesName_en":{},"content_en":{},"comment_bo_x_ewts":{},"comment_en":{}}},"query":{"function_score":{"script_score":{"script":{"id":"bdrc-score"}},"query":{"bool":{"filter":[],"bdrc-query":"spyod \'jug"}}}}}'
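
For context on the error: Elasticsearch's bool query only accepts occurrence keys such as must, filter, should and must_not, so a dis_max key sitting directly under bool is rejected at parse time. If the server swaps "bdrc-query" for a generated dis_max in place, that would produce this kind of message. The sketch below shows one way to avoid it by moving the generated clause into must. It is an illustration under assumptions, not the deployed fix: expand_bdrc_query and make_clause are hypothetical names, and the dis_max body is a stand-in for whatever the API really generates.

```python
import copy


def make_clause(search_string):
    # Hypothetical stand-in for the clause the API builds from the search string.
    return {
        "dis_max": {
            "queries": [
                {"match": {"prefLabel_bo_x_ewts": search_string}},
                {"match": {"altLabel_bo_x_ewts": search_string}},
            ]
        }
    }


def expand_bdrc_query(body):
    """Return a copy of body where each "bdrc-query" key becomes a "must" clause.

    Assumption: the dict holding "bdrc-query" is the body of a bool query.
    """
    body = copy.deepcopy(body)

    def walk(node):
        if isinstance(node, dict):
            if "bdrc-query" in node:
                search_string = node.pop("bdrc-query")
                must = node.get("must", [])
                if isinstance(must, dict):  # a single clause is also valid for "must"
                    must = [must]
                node["must"] = must + [make_clause(search_string)]
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(body)
    return body
```

Applied to the query above, the bool object ends up with "filter" and "must" only, and the dis_max sits inside the must list, where it is a valid query clause.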

roopeux commented 3 months ago

@eroux the bug is from your last push. I'll replace it with mine, which seems to work.

eroux commented 3 months ago

ok

eroux commented 3 months ago

@berger-n the change is deployed, can you give it a try?

roopeux commented 3 months ago

@berger-n in the upcoming API, do not change anything, do not add 'bdrc-query', just send it raw to the API. The API will find the query.

I sent the API to @eroux because I could not push it.