NatLibFi / Annif

Annif is a multi-algorithm automated subject indexing tool for libraries, archives and museums.
https://annif.org
Other
197 stars 41 forks source link

Upgrade to Connexion3 #702

Closed juhoinkinen closed 5 months ago

juhoinkinen commented 1 year ago

Upgrades to Connexion 3, closes #689 and #698.

Connexion 3 introduces many changes compared to Connexion 2, see the documentation here.

Most importantly, after the upgrade to Connexion 3, running Annif with Gunicorn requires the use Uvicorn workers; the workers can be set using the --worker-class/-k option of Gunicorn:

gunicorn --worker-class uvicorn.workers.UvicornWorker "annif:create_app()"

In Dockerfile there is the enviroment variable GUNICORN_CMD_ARGS="--worker-class uvicorn.workers.UvicornWorker", which gives Gunicorn the option by default, so Docker image users do not have to worry about this change.


The original PR discussion below beginning from first draft.

This seems to be surprisingly difficult. Connexion 3 documentation states that there are two modes of running/usage of Connexion:

  1. Using stand-alone Connexion - use the provided Flask app via connexion.FlaskApp
  2. Using Connexion with existing ASGI or WSGI frameworks - use the "middleware" connexion.ConnexionMiddleware to wrap an existing Flask app

Previously Annif has been using the mode 1, and tried to continue with that. But now

Connexion 3.0 needs to be run using an ASGI server instead of a WSGI server. While any ASGI server should work, connexion comes with uvicorn as an extra

And the app needs to be started with

 uvicorn --factory annif:create_app

The command

 annif run

does not know about uvicorn but apparently tries to use plain Flask, and the app won't start.

There are also smaller breaking changes, that needed to be addressed:

sonarcloud[bot] commented 1 year ago

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 4 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

codecov[bot] commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.64%. Comparing base (2e5e987) to head (e1e5d5a).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #702 +/- ## ========================================== - Coverage 99.65% 99.64% -0.02% ========================================== Files 89 89 Lines 6423 6443 +20 ========================================== + Hits 6401 6420 +19 - Misses 22 23 +1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

juhoinkinen commented 9 months ago

I added added annif run CLI command for starting uvicorn dev server as a custom Click command and removed the default run ,shell and routes Flask commands. The cli object used by annif command is still an instance of FlaskGroup, but maybe that could be turned into a custom Click command group. I don't know if this would then uncouple Flask fully from Annif.

Note that the default port for uvicorn is 8000, whereas for Flask it is 5000.

Also the tests/test_openapi.py do not pass due to an error:

AttributeError: module 'anyio' has no attribute 'start_blocking_portal'

but this seems to be some version mismatch of starlette.

And some tests/test_cli.py need to be adjusted.

All in all, maybe there could be an alternative to Connexion, which could allow still using Flask.

sonarcloud[bot] commented 9 months ago

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

7 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

juhoinkinen commented 8 months ago

Also the tests/test_openapi.py do not pass due to an error:

AttributeError: module 'anyio' has no attribute 'start_blocking_portal'

but this seems to be some version mismatch of starlette.

Starlette-testclient has not pinned (or declared at all?) its anyio dependency, and anyio 4.* do not work with it: https://github.com/Kludex/starlette-testclient/issues/5

Schemathesis v3.2.3 takes this into account and requires anyio <4, but for some reason anyio v4.2.0 gets installed.

A workaround is to pin anyio<4.* as a direct dependency.

juhoinkinen commented 8 months ago

When setting CORS in annif/__init__.py with

cxapp.add_middleware(
     CORSMiddleware,
    position=MiddlewarePosition.BEFORE_EXCEPTION,
    allow_origins=["*"],
    allow_methods=["*"],
)

CORS is not working for unit tests:

tests/test_openapi.py:43: AssertionError
...
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/projects] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/projects/{project_id}] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/suggest] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/suggest-batch] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/learn] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_list_projects - assert 0

However, when looking at the response headers in the Firefox network tab, the 'access-control-allow-origin' header is there (at least for /suggest request), so there might be an issue due to starlette-testclient.

juhoinkinen commented 7 months ago

Also the tests/test_openapi.py do not pass due to an error:

AttributeError: module 'anyio' has no attribute 'start_blocking_portal'

but this seems to be some version mismatch of starlette.

Starlette-testclient has not pinned (or declared at all?) its anyio dependency, and anyio 4.* do not work with it: Kludex/starlette-testclient#5

Schemathesis v3.2.3 takes this into account and requires anyio <4, but for some reason anyio v4.2.0 gets installed.

A workaround is to pin anyio<4.* as a direct dependency.

Anyio pinning is not needed anymore, when doing these updates

juhoinkinen commented 7 months ago

When setting CORS in annif/__init__.py with

cxapp.add_middleware(
     CORSMiddleware,
    position=MiddlewarePosition.BEFORE_EXCEPTION,
    allow_origins=["*"],
    allow_methods=["*"],
)

CORS is not working for unit tests:

tests/test_openapi.py:43: AssertionError
...
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/projects] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/projects/{project_id}] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/suggest] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/suggest-batch] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/learn] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_list_projects - assert 0

However, when looking at the response headers in the Firefox network tab, the 'access-control-allow-origin' header is there (at least for /suggest request), so there might be an issue due to starlette-testclient.

Also curl output shows the CORS header to be sent (when using H "Origin: http://example.com"):

curl -H "Origin: http://example.com" -X 'POST'   'http://127.0.0.1:8000/v1/projects/yake-fi/suggest'   -H 'accept: application/json'   -H 'Content-Type: application/x-www-form-urlencoded'   -d 'text=kissa&limit=10&threshold=0&language=' -v
juhoinkinen commented 6 months ago

TODO: Having jsonschema as a direct dependency may not be required after this PR, if the annif/openapi/validation.py will not be using it anymore.

juhoinkinen commented 6 months ago

When setting CORS in annif/__init__.py with

cxapp.add_middleware(
     CORSMiddleware,
    position=MiddlewarePosition.BEFORE_EXCEPTION,
    allow_origins=["*"],
    allow_methods=["*"],
)

CORS is not working for unit tests:

tests/test_openapi.py:43: AssertionError
...
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/projects] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[GET /v1/projects/{project_id}] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/suggest] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/suggest-batch] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_fuzzy[POST /v1/projects/{project_id}/learn] - KeyError: 'access-control-allow-origin'
FAILED tests/test_openapi.py::test_openapi_list_projects - assert 0

However, when looking at the response headers in the Firefox network tab, the 'access-control-allow-origin' header is there (at least for /suggest request), so there might be an issue due to starlette-testclient.

Also curl output shows the CORS header to be sent (when using H "Origin: http://example.com"):

curl -H "Origin: http://example.com" -X 'POST'   'http://127.0.0.1:8000/v1/projects/yake-fi/suggest'   -H 'accept: application/json'   -H 'Content-Type: application/x-www-form-urlencoded'   -d 'text=kissa&limit=10&threshold=0&language=' -v

The CORS testing problem was resolved by removing CORS check from the fuzzying test and (re)creating a dedicated test for it, which adds Origin header to the request:

app_client.headers = {"Origin": "http://somedomain.com"}

But now there is a (new?) problem: the test_openapi_fuzzy discovered that %0A, i.e. encoded newline, in the URL parameters crashes the application. This does not occur with Connexion 2.

Edit: The test cases with %0A in path parameter are now skipped over to keep tests working.

juhoinkinen commented 5 months ago

I tested this PR and here are some findings:

CLI operation

  • The Flask-related commands routes and shell have been dropped. This is good because they're irrelevant for Annif anyway. But the -e, --env-file and -A, --app options that seem to be Flask-specific are still there - could we easily get rid of those too?

Yes, done.

  • I timed a few commands (help, list-projects, suggest) and they don't seem to take longer to run than they used to - some are maybe even a little faster than before. Excellent!

annif run command

  • The console output now looks a bit different than it used to, due to the switch to uvicorn. But I think the new output looks good.

  • Annif now listens on port 8000, when it used to listen on port 5000. I think it would be better to keep the old port, because it's documented for example in the Annif tutorial Web UI and REST API exercises. Simply defining a default value of 5000 for --port should work, right?

Yes, done.

Swagger-UI

  • The category view in /v1/ui now has a different order - "Learning from feedback" comes last. It's better this way!

  • Method parameters are also in a different order - not worse than before, maybe even a bit better. The layout for the "Send empty value" checkbox seems a bit broken, though I'm unsure if we can do anything about it. Here is an example that shows old (main) on the left, new (PR branch) on the right:

image

REST API

  • CORS now works differently than before. I'm not sure if this is bad, just something to note (maybe put in the release notes?).
    • Before: If there's an Origin header in the request, send Access-Control-Allow-Origin: <Origin header value>, otherwise send Access-Control-Allow-Origin: *
    • After: If there's an Origin header in the request, send Access-Control-Allow-Origin: *, otherwise send no such header.
  • The URL /projects/ used to give a 404 response, now it redirects to the correct URL /projects. Again, not necessarily a problem, just different, and perhaps worth putting into the release notes.

~TODO in Relase notes.~ Edit: Added in Relase notes.

  • Successful JSON responses are pretty-printed, but error responses in Problem JSON are not. Previously both kinds of JSON responses were pretty-printed for easier readability. I think readable JSON would be good especially for the error messages, since they are likely to be seen by humans at some point.

Yes, but seems that this cannot be configured, see an old issue: https://github.com/spec-first/connexion/pull/313

Other

  • SonarCloud has a few complaints - are these real problems?

I would say no, I silenced them.

  • CodeClimate also complains of complexity, but it seems like the affected code was already complex, so not really a regression

Yes, and the code in question is to override a function of Connexion, so it is not meaningful to simplify it. I silenced the complain as "wontfix".

  • Codecov complains about L25 in validation.py not being covered by tests, would it be easy to test this? Or add a notation that excludes it from test coverage?

I tried to add test for this line, but it is not detected by CodeCov. I added # pragma: no cover annotation for the line to exclude it.

juhoinkinen commented 5 months ago

From now on when running Annif with Gunicorn one must set the workers to uvicorn with the -k option, that is to use command

gunicorn -k uvicorn.workers.UvicornWorker "annif:create_app()"

This needs to be clearly communicated and updated to docker-compose.yml.

sonarcloud[bot] commented 5 months ago

Quality Gate Passed Quality Gate passed

Issues
0 New issues
1 Accepted issue

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud