geopython / pygeoapi

pygeoapi is a Python server implementation of the OGC API suite of standards. The project emerged as part of the next generation OGC API efforts in 2018 and provides the capability for organizations to deploy a RESTful OGC API endpoint using OpenAPI, GeoJSON, and HTML. pygeoapi is open source and released under an MIT license.
https://pygeoapi.io
MIT License
488 stars 261 forks source link

Replace flask and django dev servers with gunicorn #1400

Closed ricardogsilva closed 6 months ago

ricardogsilva commented 10 months ago

gunicorn is a pretty robust Python server that can be used to run both wsgi and asgi (with the uvicorn worker) web apps. In addition to being production-ready, it also contains features that make it a nice tool for using in development mode, such as:

Using gunicorn as both the dev and production web server, and regardless of the web framework being flask, starlette or django would tighten up the pygeoapi codebase and provide a more uniform running interface to users. We'd be able to recommend something like this:

export PYGEOAPI_CONFIG=my-config.yml
export PYGEOAPI_OPENAPI=my-pygeoapi-openapi.yml

# for production
pygeoapi serve --flask

# for development
pygeoapi serve --flask --debug

Both commands would run pygeoapi with gunicorn, and pygeoapi serve --debug would take care of turning on the needed gunicorn configuration stuffs for when dev'ing.

Is your feature request related to a problem? Please describe. The current pygeoapi codebase is a bit fragmented when it comes to running pygeoapi for development. The fact that we support three different web frameworks and a different web server for each causes duplication in the code and leads to subtle bugs.

Additionally, the initialization of each of the three web frameworks contains a lot of different places where pygeoapi tries to read values from the environment and also a lot of global variables that deal with keeping state of the pygeoapi configuration.

I've been working on an implementation for this and will be submitting a PR soon.

francbartoli commented 10 months ago

I'm sorry @ricardogsilva, but I have to give my -1 for this addition since I can see value only in certain cases of downstream projects. For example, if I want to run pygeoapi in serverless environments (lambda, az functions, etc) I see the fragmentation beneficial with its high grade of flexibility based on the selected framework.

ricardogsilva commented 10 months ago

@francbartoli

Perhaps I am missing some use case that you care about? For the concrete serverless example, I don't understand how this would be a hindrance - could you please elaborate on the difficulties that you foresee? I'm thinking that any shortcomings can likely be worked out if we discuss it :wink:

I've also submitted #1401 (it is still in draft, but its close to being ready - I'm just sorting out some docker-related stuff), which implements the ideas mentioned in this issue - perhaps you can take a look and test drive it to see whether it is indeed something negative for your use cases?

In my opinion there is nothing in the PR that takes away flexibility from pygeoapi - you can still:

From my point of view, regardless of gunicorn, just the benefit of not initializing pygeoapi config and API at import time, but rather having factory functions that do it explicitly when called is a major improvement to the codebase.

Having said all this, I can always be terribly wrong in my assumptions - Let's please discuss this a bit further to see if we can reach some common understanding about the implications of this proposed change.

francbartoli commented 10 months ago

From my point of view, regardless of gunicorn, just the benefit of not initializing pygeoapi config and API at import time, but rather having factory functions that do it explicitly when called is a major improvement to the codebase.

How is relevant what you are arguing above with this issue/PR, sorry I might be wrong but I can't follow it.

Regarding gunicorn or any other mechanism to wrap the default standard server: I really do not see its value during the development, sorry. I can understand useful to control the workers in production but again for asgi (that’s what I’m mostly interested to) I would certainly defer to the implementer how to optimize the performance in production environments. In cloud environments, i.e. Docker Swarm/Kubernetes, it’s suggested to not use a process manager with workers since that can be more robustly managed at cluster level. Also, FastAPI documentation states here that

Gunicorn by itself is not compatible with FastAPI, as FastAPI uses the newest ASGI standard.

even if Uvicorn has a Gunicorn-compatible worker class that would allow to do what you are proposing.

Overall, to be honest with you, I’m not seeing this as an improvement.

ricardogsilva commented 10 months ago

[apologies for the long wall of text]

If I understand your argument, you are worried that by having gunicorn be the chosen web server for the pygeoapi serve --starlette command it will harm async deployments on k8s because in those cases a bare uvicorn is more efficient - I think this may be a valid argument but a quick search online does not convince me that it is always better to use bare uvicorn in these contexts - notably:

I'm not an expert on this topic though, so despite me finding these counters, I think you make a valid point.

However, I imagine that use cases like:

are more common - the fastapi docs also recommend gunicorn in such scenarios

This is not to dismiss your argument in any way, it is just that I believe (I grant you that this is just an act of faith, I've done 0 benchmarks) that the majority of pygeoapi deployments would likely not see much difference in performance with bare uvicorn vs. gunicorn with uvicorn - Obviously I might be wrong, but see below for less-relevant points[^1].

Additionally, see below for the whole import time side-effects thing [^2]

I guess this issue/PR of mine comes down to me advocating for a bit more consistency in the pygeoapi codebase and coming up with a solution that does that while also enabling what I believe are the more common use cases.

For a high-performance, high-traffic k8s deployment, perhaps it would not be too much trouble to add some custom launcher instead of pygeoapi serve - perhaps by shipping a small python package with a pygeoapi plugin that just contains a custom CLI command - personally, I would prefer this to be a customization that is not on the pygeoapi repo. - I guess this may be the crux of the matter where we seem to not agree here.

I would not mind having a broader and more in-depth discussion about this, if you think it is worth it. I'll finish up my PR (which is almost complete) and then leave it in the good hands of you pygeoapi maintainers to ultimately moderate it :wink:


[^1]: What I think would probably make a big difference in performance would be if pygeoapi would be able to use uvicorn with the --loop uvloop option instead of the current --loop asyncio - uvloop seems to really be the secret sauce to make uvicorn fast, but pygeoapi does not use it yet. I think we need to replace our usage of nest-asyncio with something else in order to be able to unlock --loop uvloop - well, if anything, working on this PR has led me to this realization. But even then, there are other parts of the pygeoapi codebase that need some work in order to really take advantage of an async execution model.

[^2]: With regard to my tirade about how this PR refactors the code to not cause import-time side effects, yes we can agree that it is a minor point in the context of gunicorn vs non-gunicorn and I don't want to get too caught up in that argument. But the point was just that I see it as a major improvement - obviously it could be implemented without this whole gunicorn push too.

github-actions[bot] commented 6 months ago

As per RFC4, this Issue has been inactive for 90 days. In order to manage maintenance burden, it will be automatically closed in 7 days.

github-actions[bot] commented 6 months ago

As per RFC4, this Issue has been closed due to there being no activity for more than 90 days.