NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org

Add REST API method batch-suggest #664

Closed juhoinkinen closed 1 year ago

juhoinkinen commented 1 year ago

Adds a new /v1/projects/{project_id}/suggest-batch REST API method. Based on the branch of PR #663, implements the REST API part of #579.

The method accepts a JSON request body containing the documents (at most 32), each with an optional document_id field:

{
  "documents": [
    {
      "document_id": "doc-1234",
      "text": "A quick brown fox jumped over the lazy dog."
    }
  ]
}

The limit, threshold and language parameters are optional, as for the regular suggest method, and can be given as URL query parameters:

POST /projects/yso-tfidf-en/suggest-batch?limit=10&threshold=0.2

An example response is:

[
  {
    "results": [
      {
        "label": "Archaeology",
        "notation": "42.42",
        "score": 0.85,
        "uri": "http://example.org/subject1"
      }
    ],
    "document_id": "doc-1234"
  }
]

The document_id is null in the response if the document in the request does not have one. It is similar to the external_id field of the MonkeyLearn classifier and to the index field of AWS Comprehend BatchDetectKeyPhrases.
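As a rough illustration of the request format described above, the following sketch builds the URL and JSON body for a suggest-batch call without sending anything. The helper names `build_suggest_batch_request` and `results_by_document_id` are hypothetical and not part of Annif; the base URL is the local development address used later in this thread.

```python
import json
from urllib.parse import urlencode


def build_suggest_batch_request(base_url, project_id, documents, **params):
    """Return (url, body) for a suggest-batch POST; nothing is sent."""
    # Leave out parameters that were not given so the server defaults apply.
    query = {key: value for key, value in params.items() if value is not None}
    url = f"{base_url}/projects/{project_id}/suggest-batch"
    if query:
        url += "?" + urlencode(query)
    body = json.dumps({"documents": documents})
    return url, body


def results_by_document_id(response):
    """Index a response list by document_id.

    Assumes every document carried a unique document_id; documents
    without one all come back with None and would collide here.
    """
    return {item["document_id"]: item["results"] for item in response}


url, body = build_suggest_batch_request(
    "http://127.0.0.1:5000/v1",
    "yso-tfidf-en",
    [{"document_id": "doc-1234",
      "text": "A quick brown fox jumped over the lazy dog."}],
    limit=10,
    threshold=0.2,
)
```

The resulting `url` and `body` could then be passed to any HTTP client with a `Content-Type: application/json` header.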

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (38ec228) 99.57% compared to head (ca158d8) 99.58%.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##             main     #664    +/-   ##
========================================
  Coverage   99.57%   99.58%
========================================
  Files          88       89     +1
  Lines        6146     6268   +122
========================================
+ Hits         6120     6242   +122
  Misses         26       26
```

| [Impacted Files](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi) | Coverage Δ | |
|---|---|---|
| [annif/\_\_init\_\_.py](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvX19pbml0X18ucHk=) | `90.32% <100.00%> (+0.66%)` | :arrow_up: |
| [annif/openapi/validation.py](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvb3BlbmFwaS92YWxpZGF0aW9uLnB5) | `100.00% <100.00%> (ø)` | |
| [annif/rest.py](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-YW5uaWYvcmVzdC5weQ==) | `97.53% <100.00%> (+0.97%)` | :arrow_up: |
| [tests/conftest.py](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-dGVzdHMvY29uZnRlc3QucHk=) | `100.00% <100.00%> (ø)` | |
| [tests/test\_openapi.py](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-dGVzdHMvdGVzdF9vcGVuYXBpLnB5) | `100.00% <100.00%> (ø)` | |
| [tests/test\_rest.py](https://codecov.io/gh/NatLibFi/Annif/pull/664?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=NatLibFi#diff-dGVzdHMvdGVzdF9yZXN0LnB5) | `100.00% <100.00%> (ø)` | |

osma commented 1 year ago

Would it make sense to limit the number of documents in a single REST call to the minibatch size (currently 32)? After all, there would be little benefit (except maybe avoiding some HTTP overhead) in processing more than one minibatch on the backend side.
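If the per-request limit follows the minibatch size, a client with more documents would have to split them up itself. A minimal sketch of such client-side chunking follows; the names `MAX_BATCH_SIZE` and `split_into_batches` are illustrative and not taken from Annif.

```python
MAX_BATCH_SIZE = 32  # assumed to mirror the minibatch size mentioned above


def split_into_batches(documents, size=MAX_BATCH_SIZE):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


# 70 documents end up in three requests: two full batches and a remainder.
docs = [{"document_id": f"doc-{i}", "text": "Olipa kerran"} for i in range(70)]
batch_sizes = [len(batch) for batch in split_into_batches(docs)]
```

Each batch would then be sent as its own suggest-batch request, so the only extra cost of the limit is the per-request HTTP overhead.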

juhoinkinen commented 1 year ago

I tried to reduce duplication in annif.yaml as much as possible. Unfortunately I did not find a way to define the suggest parameters in only one place, because /suggest takes them in the request body, whereas /suggest-batch takes them as query parameters.

Edit: Defining the suggest parameters in components.parameters works otherwise, but then the types of the variables or their default values do not show in the OpenAPI/Swagger online documentation.

juhoinkinen commented 1 year ago

I reverted two refactoring commits. I think https://github.com/NatLibFi/Annif/commit/067fec38223adb1c1e957fe480ad9e6878e5147c should not have been problematic, but I ran the openapi-diff tool, and it complained about POST /projects/{project_id}/suggest:

==========================================================================
==                            API CHANGE LOG                            ==
==========================================================================
                              Annif REST API                              
--------------------------------------------------------------------------
--                              What's New                              --
--------------------------------------------------------------------------
- POST   /projects/{project_id}/suggest-batch

--------------------------------------------------------------------------
--                            What's Changed                            --
--------------------------------------------------------------------------
- POST   /projects/{project_id}/learn
  Request:
        - Changed application/json
          Schema: Backward compatible
- POST   /projects/{project_id}/suggest
  Request:
        - Changed application/x-www-form-urlencoded
          Schema: Broken compatibility
          Changed property type:  (object -> object)
--------------------------------------------------------------------------
--                                Result                                --
--------------------------------------------------------------------------
                 API changes broke backward compatibility                 
--------------------------------------------------------------------------

After the reverts, the tool still reports a change to application/json in /projects/{project_id}/learn. I think that is wrong, or at least I cannot see any difference there, and in any case the tool says that the change is backward compatible.

Edit: Okay, openapi-diff probably noticed the change in the description of the IndexedDocument schema.

juhoinkinen commented 1 year ago

#682 introduced Schemathesis to automate testing of the actual API, which was previously done with manually written tests. The manually written tests made it possible to test specific things, e.g. which error code a given (malformed) request produces. For example, /suggest-batch now has a limit of 32 documents, but there is no test for it.

Schemathesis uses the examples from the OpenAPI specification and some random inputs in path and query parameters. The requests to /v1/projects/<proj-id>/suggest-batch can be seen by running a local Annif server and Schemathesis from the command line:

st run annif/openapi/annif.yaml -E suggest-batch$ --base-url http://127.0.0.1:5000/v1
Long list of request logs INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/dummy-fi/suggest-batch HTTP/1.1" 200 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/%3F/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/󠀟%40/suggest-batch?language=Æ𲂇%7B HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/𫞠/suggest-batch?limit=9602 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/%0F©𭵡õ%3Fí%1F%11÷¢/suggest-batch?language=%1C&threshold=6.103515625e-05 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/󰾽2kr7C/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/󂝮/suggest-batch?limit=8389248&language=Ȁ&threshold=0.5008241237401309 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/ÄÒ%7D/suggest-batch?limit=25733 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/󎣐%17/suggest-batch?language=&limit=79&threshold=0.2344360919705793 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/Î/suggest-batch?limit=25503 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/JÖ𪾻¥򻠊/suggest-batch?threshold=1.1754943508222875e-38 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/0/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/鼠ä/suggest-batch?language=øx&threshold=1.1754943508222875e-38&limit=21298 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/y/suggest-batch?limit=3844&threshold=0.25238021155102036&language=² HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/󶓅𣫛Y𺰏%60𕖆%5EÎ%3F/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 
16:06:20] "POST /v1/projects/S򷝦/suggest-batch?threshold=1.0&limit=43&language=ç¯%05'􏃲aÿ򧐅'%3A񷡍n HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:20] "POST /v1/projects/򀱟/suggest-batch?threshold=1e-05 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:21] "POST /v1/projects/E/suggest-batch?limit=127&language=%3C񢖬򵦫w񹬠&threshold=0.99999 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:21] "POST /v1/projects/¡򿹤HËýÄüÕ/suggest-batch?limit=3512496318697642229&threshold=0.23034777681204327&language= HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:21] "POST /v1/projects/%04õ/suggest-batch?threshold=1e-05&limit=10162&language= HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:21] "POST /v1/projects/𽺾/suggest-batch?threshold=1.192092896e-07 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:21] "POST /v1/projects/¬ÿ÷򐁖DZ𤂷_¿𦥋7ñ/suggest-batch?threshold=1.1125369292536007e-308&language=Ù»îÛÂ&limit=29145 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/𹳂ÑÒ𑺾&threshold=1.175494351e-38&limit=21033 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/oèÁ/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/㼀/suggest-batch?threshold=0.0 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/􉑟/sugge%1D/suggest-batch HTTP/1.1" 404 -0&language='TTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/𥯆%5CÏ1򺊒%7B/suggest-batch?limit=15&language=󓝮 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/񿄌â%7B¥oÇ_%5D/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/𧂰jÀ9򋇺0/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:22] "POST /v1/projects/Ý%0FVv°P/suggest-batch?limit=118 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 
16:06:22] "POST /v1/projects/a%05F/suggest-batch?language=ª½M§&threshold=1e-05 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/%5D%098+Ó¡Wò/suggest-batch?limit=13346 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/Ý°%5Dò%1B/suggest-batch?language=򤴩nû£񈾌ÖÍ򯥦&limit=3513076579650466351 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/򄝛%05󡹻uggest-batch?threshold=0.9999999999999999&language=򹡺© HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/󅽉𼖿õ&limit=22244 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/%03/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:23] "POST /v1/projects/%5B%0AT/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:24] "POST /v1/projects/򑼿Ã2/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:24] "POST /v1/projects/²B%07%5C/suggest-batch?threshold=2.2250738585072014e-308&limit=105&language=W HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:24] "POST /v1/projects/𜹥/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:24] "POST /v1/projects/𜹥/suggest-batch?limit=847249536&threshold=2.225073858507203e-309&language=Ý%0Ca𦅹a񯔎󮖛 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:24] "POST /v1/projects/°Ò𑫽»%1D񺚋󏍁/suggest-batch?language=%1Dò򥔃󋉩&limit=120 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:24] "POST /v1/projects/%18ú/suggest-batch?limit=61 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:25] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:25] "POST /v1/projects/󣥐򂖃ζ¸/suggest-batch?language=ñ𔩆%11x©𜈈&limit=22711&threshold=1.175494351e-38 HTTP/1.1" 404 - 
INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:25] "POST /v1/projects/񄺮R/suggest-batch?limit=107&language=񹸴󗩙&threshold=1.0 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:25] "POST /v1/projects/%60¥􋱐/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:25] "POST /v1/projects/Q/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/%18/suggest-batch HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:26] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - 
[20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:27] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:28] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:28] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:28] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:28] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - 
[20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:29] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:30] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - 
[20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 - INFO:werkzeug:127.0.0.1 - - [20/Mar/2023 16:06:31] "POST /v1/projects/򲦍Æ6򘮔/suggest-batch?limit=10787 HTTP/1.1" 404 -

But none of the requests gets a 400 response, which is the code for language_not_supported_error and for too many documents.

juhoinkinen commented 1 year ago

I finally managed to drop the unnecessary merge and revert commits. I recreated the PR branch from the current main and then cherry-picked the good commits from a backup PR branch. I don't know why rebasing did not work: no matter what I tried, the branch also ended up with all the commits made to main. Changing the base branch back and forth did not help as it usually does.

But now there is a problem installing a just-released version of a dependency in GH Actions (but not on my laptop):

Unable to find installation candidates for libclang (16.0.0)

juhoinkinen commented 1 year ago

> But now there is a problem installing a just-released version of a dependency in GH Actions (but not on my laptop):
>
> Unable to find installation candidates for libclang (16.0.0)

This seems to have been caused by issue https://github.com/sighingnow/libclang/issues/46, which has since been fixed.

juhoinkinen commented 1 year ago

> 1. I suggested a little change to the wording of the method summary

Done.

> 2. I verified that the method fails if given more than 32 documents, as it should. But the error message is a bit confusing:
{
  "detail": "[{'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'Olipa kerran'}, {'text': 'plaa'}] is too long - 'documents'",
  "status": 400,
  "title": "Bad Request",
  "type": "about:blank"
}

> Basically the "detail" field includes the whole request, which could be extremely long; and it ends with "is too long - 'documents'" which is not that helpful. Is this something we could change or is this coming directly from Connexion so we can't do anything about it? I would like to see a more helpful message, which wouldn't include the whole request body, just a message stating that there were too many documents.

I added CustomRequestBodyValidator to the annif/openapi/validation.py module. It is a subclass of Connexion's RequestBodyValidator, and it overrides the default validate_schema() method so that the "detail" field contains only the validation error: too many items - 'documents'.

This seems to be the recommended way to customize the validation; I found it via https://github.com/spec-first/connexion/issues/558.
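The message rewriting described above can be sketched independently of Connexion. The example below is not the code from annif/openapi/validation.py: `FakeValidationError` is a hypothetical stand-in for a JSON Schema validation error (carrying a message, the name of the failed keyword, and the path to the offending value), and `problem_detail` is an illustrative helper.

```python
class FakeValidationError(Exception):
    """Hypothetical stand-in for a JSON Schema validation error."""

    def __init__(self, message, validator, path):
        super().__init__(message)
        self.message = message
        self.validator = validator  # name of the failed keyword, e.g. "maxItems"
        self.path = path  # location of the offending value in the body


def problem_detail(error):
    """Build a short "detail" string that does not echo the request body."""
    if error.validator == "maxItems":
        # The default message starts with the whole offending array.
        message = "too many items"
    else:
        message = error.message
    location = ".".join(str(part) for part in error.path)
    return f"{message} - '{location}'"


error = FakeValidationError(
    message="[{'text': 'Olipa kerran'}, ...] is too long",
    validator="maxItems",
    path=["documents"],
)
detail = problem_detail(error)
```

Only the maxItems case is rewritten here; other validation errors keep their original message.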

> 3. Related to the above, can we write (for example using schemathesis) unit tests that verify that the API accepts 32 documents, but doesn't accept 33 documents? I'm worried that if we change the implementation later, the limit of 32 will be dropped and it could lead to problems on the backend side.

I added tests for the cases of 32 and 33 documents, and restored the manually written Swagger/OpenAPI tests that I removed in #682. These tests do not rely on Schemathesis but on the app_client fixture, which makes it possible to check which error code is given for which error.
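The shape of such boundary tests can be sketched as follows. This is not the test code added in the PR: `status_for_batch` is a hypothetical stand-in for posting to the endpoint, used here so the example runs without an Annif server or the app_client fixture.

```python
MAX_DOCUMENTS = 32  # limit stated in this PR


def status_for_batch(body):
    """Hypothetical stand-in for the endpoint: return only the HTTP status."""
    if len(body.get("documents", [])) > MAX_DOCUMENTS:
        return 400
    return 200


def make_body(count):
    """Build a request body with `count` identical documents."""
    return {"documents": [{"text": "A quick brown fox"} for _ in range(count)]}


def test_suggest_batch_accepts_32_documents():
    assert status_for_batch(make_body(32)) == 200


def test_suggest_batch_rejects_33_documents():
    assert status_for_batch(make_body(33)) == 400
```

Testing both sides of the limit means a later change that drops or shifts it would fail one of the two tests.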

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No Coverage information
0.0% Duplication