geopython / pygeoapi

pygeoapi is a Python server implementation of the OGC API suite of standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. pygeoapi is open source and released under an MIT license.
https://pygeoapi.io
MIT License

update API object when configuration changes #1636

Open tomkralidis opened 2 months ago

tomkralidis commented 2 months ago

Description Ensure that changes to the configuration are visible without requiring a service restart.

Steps to Reproduce

Expected behavior pygeoapi always provides service based on the latest configuration changes

Screenshots/Tracebacks

Environment

Additional context cc @totycro

totycro commented 2 months ago

Some considerations on this:

The config is read in the global scope in e.g. flask_app.py, so every worker process has its own copy.

If there's only one worker process (possibly with threads), it would be possible to change the config in place by doing something like self.config.update(data) when receiving update requests. However, the Docker entrypoint starts gunicorn with 4 processes, so only one of them would effectively receive the update in this case.

To update all workers, a graceful restart is necessary; in gunicorn this can be done with kill -HUP masterpid. To get the masterpid, we can run gunicorn with --pid some-file-path.pid to make it write its pid. This approach seems very feasible, but relies on the pid file setup. Note that other servers like uWSGI also support reloading via the HUP signal, so it's probably reasonable to assume support for this. It is, however, not supported with the dev Flask server.
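A minimal sketch of the pid-file approach described above, assuming gunicorn was started with --pid and that the calling process is allowed to signal the master. The function names and the injectable `kill` parameter are hypothetical and exist only to make the sketch testable; none of this is part of pygeoapi.

```python
import os
import signal
from pathlib import Path


def read_master_pid(pid_file):
    """Return the integer PID stored in a gunicorn-style pid file."""
    return int(Path(pid_file).read_text().strip())


def reload_workers(pid_file, kill=os.kill):
    """Send SIGHUP to the master process named in pid_file.

    `kill` is injectable (hypothetical parameter) so the function can be
    exercised without signalling a real process. SIGHUP is only available
    on Unix-like systems.
    """
    pid = read_master_pid(pid_file)
    kill(pid, signal.SIGHUP)
    return pid
```

An update handler could call reload_workers("some-file-path.pid") after writing the new configuration to disk; the master would then gracefully replace its workers, each of which reads the config afresh.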

Other possibilities are to read the config on each request, which would be really slow. We could also cache it for e.g. 1 minute in each worker, or even read it from some hot cache like Redis (probably overkill).
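The per-worker cache idea could look roughly like the sketch below. `CachedConfig`, its `loader` callable, and the `clock` parameter are hypothetical names for illustration; the loader stands in for whatever reads and parses the config file.

```python
import time


class CachedConfig:
    """Hold a config value and reload it once it is older than ttl seconds."""

    def __init__(self, loader, ttl=60.0, clock=time.monotonic):
        self._loader = loader      # callable returning a fresh config
        self._ttl = ttl            # maximum age in seconds
        self._clock = clock        # injectable for testing
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        # Reload on first use or when the cached copy has expired.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

With this approach each worker picks up changes independently, so for up to ttl seconds different workers may serve different versions of the configuration.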

webb-ben commented 2 months ago

You can use the gunicorn reload function to perform this as described here: https://docs.pygeoapi.io/en/latest/admin-api.html#pygeoapi-hot-reloading-in-gunicorn. The Docker example for pygeoapi admin implements this strategy in the entrypoint, as does the entrypoint in wis2box-api.

There is a security concern for users who do not want to update their configuration which is why it is not in the default Docker image of pygeoapi. If it is their wish to not update the configuration, i.e. the Admin API is not enabled, I do not think pygeoapi should hot-reload changes made to the configuration.

I am partial to some solution that lives below the web framework layer (Flask, Starlette, Django) to avoid all these potential deployment variations.

matthesrieke commented 2 months ago

Maybe it could be an option to consider a database-driven configuration (at least for the resources)? I described something similar a while ago in a feature request (https://github.com/geopython/pygeoapi/issues/1351). Restarting/hot-reloading seems like a workaround to me. I do not see an issue in performance when introducing a lightweight database. This would solve the thread issues as well.
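As a rough illustration of a database-driven resource configuration, the sketch below stores resource definitions as JSON in SQLite. The table layout and function names are assumptions made for this example, not an existing pygeoapi schema or API.

```python
import json
import sqlite3


def init_store(conn):
    """Create the (hypothetical) resources table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resources ("
        "name TEXT PRIMARY KEY, definition TEXT NOT NULL)"
    )
    conn.commit()


def put_resource(conn, name, definition):
    """Insert or overwrite one resource definition, stored as JSON."""
    conn.execute(
        "INSERT OR REPLACE INTO resources (name, definition) VALUES (?, ?)",
        (name, json.dumps(definition)),
    )
    conn.commit()


def load_resources(conn):
    """Return all resources as a dict keyed by resource name."""
    rows = conn.execute("SELECT name, definition FROM resources ORDER BY name")
    return {name: json.loads(definition) for name, definition in rows}
```

If every worker opens the same database file and calls load_resources when it needs the resource list, an update written by one worker is visible to the others without a restart.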

webb-ben commented 2 months ago

Another note: any time the configuration gets updated, the OpenAPI document must also be regenerated and read into memory. Both PYGEOAPI_CONFIG and PYGEOAPI_OPENAPI need to be taken into account.
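One way to keep the two in step within a single process is to rebuild the OpenAPI document from the new configuration and then swap both together. This is a sketch only: `ApiState` and the `build_openapi` callable are hypothetical stand-ins for whatever generates the document from the config.

```python
import threading


class ApiState:
    """Keep a config and its derived OpenAPI document as one snapshot."""

    def __init__(self, config, build_openapi):
        self._build_openapi = build_openapi
        self._lock = threading.Lock()
        self._snapshot = (config, build_openapi(config))

    def update(self, new_config):
        # Build the document first; if this raises, the old pair stays active.
        new_openapi = self._build_openapi(new_config)
        with self._lock:
            self._snapshot = (new_config, new_openapi)

    def snapshot(self):
        """Return the current (config, openapi) pair."""
        with self._lock:
            return self._snapshot
```

Readers always receive a matching pair, so a request never sees a new configuration alongside a stale OpenAPI document. This only covers one process; multiple workers would still need one of the reload strategies discussed earlier in the thread.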