PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

uvicorn HTTP server config, e.g. parallel workers #10758

Open · Ox0400 opened 1 year ago

Ox0400 commented 1 year ago

Describe the issue

The Prefect HTTP server is managed using uvicorn; the command line is `uvicorn --app-dir "/usr/local/lib/python3.10/site-packages" --factory prefect.server.api.server:create_app --host 0.0.0.0 --port 4200 --timeout-keep-alive 5`.

`export WEB_CONCURRENCY=10` or `export UVICORN_WORKERS=10`

https://www.uvicorn.org/settings/
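For reference, the CLI invocation above maps onto `uvicorn.run(...)` keyword arguments roughly as sketched below (the fallback to 1 worker when `WEB_CONCURRENCY` is unset is an assumption about uvicorn's default, not taken from Prefect's code):

```python
import os

def run_kwargs(env=None) -> dict:
    """Keyword arguments mirroring the CLI invocation above, suitable for
    uvicorn.run(**run_kwargs()). Passing the app as an import string with
    factory=True is what allows workers > 1, since each worker process
    must re-import the app."""
    env = os.environ if env is None else env
    return {
        "app": "prefect.server.api.server:create_app",
        "factory": True,
        "host": "0.0.0.0",
        "port": 4200,
        "timeout_keep_alive": 5,
        # uvicorn reads WEB_CONCURRENCY itself when workers is not given;
        # resolving it explicitly here just documents the intent.
        "workers": int(env.get("WEB_CONCURRENCY", "1")),
    }
```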

Describe the proposed change

Extra configuration options for the HTTP server.

Additional context

No response

jakekaplan commented 1 year ago

Hi thanks for filing this enhancement request. Could you describe a little more about what you're trying to do? From my understanding the env variables should still be respected on your server. Are you looking to pass them to the prefect server command?

Ox0400 commented 1 year ago

Hi @jakekaplan, thanks for your reply. I'd like the documentation to describe the uvicorn environment variables the server supports, especially the parallel multi-process mode.

The Prefect API server often blocks or returns 500, 502, or timeout errors when it runs in single-process mode with many tasks. Multi-process mode can mitigate this. I also know I can set `PREFECT_CLIENT_MAX_RETRIES`, `PREFECT_API_REQUEST_TIMEOUT`, and `PREFECT_CLIENT_RETRY_EXTRA_CODES`.

Right now `prefect server start` is missing a `--workers N` option (or similar) to support production mode, but we can set the `WEB_CONCURRENCY` or `UVICORN_WORKERS` environment variables instead.

mthanded commented 1 year ago

> Hi thanks for filing this enhancement request. Could you describe a little more about what you're trying to do? From my understanding the env variables should still be respected on your server. Are you looking to pass them to the prefect server command?

The uvicorn docs imply that CLI options and `uvicorn.run` parameters override the environment variables. Given that uvicorn behavior, the environment variables never reach the server.
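That override behavior can be sketched as follows (an illustration of the precedence the uvicorn settings docs describe, not uvicorn's actual code):

```python
def effective_workers(explicit_workers, env):
    """Resolve the worker count the way uvicorn's settings docs describe:
    an explicit --workers / workers= value wins; otherwise the
    UVICORN_WORKERS or WEB_CONCURRENCY environment variables apply;
    otherwise a single process."""
    if explicit_workers is not None:
        return explicit_workers
    for var in ("UVICORN_WORKERS", "WEB_CONCURRENCY"):
        if var in env:
            return int(env[var])
    return 1

# Because `prefect server start` passes its own value through to uvicorn,
# the environment variables end up shadowed.
```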

Ox0400 commented 1 year ago

> Hi thanks for filing this enhancement request. Could you describe a little more about what you're trying to do? From my understanding the env variables should still be respected on your server. Are you looking to pass them to the prefect server command?

It works as in the screenshot below when `WEB_CONCURRENCY` or `UVICORN_WORKERS` is set. Very useful with many flows or tasks.

A new environment variable such as `PREFECT_SERVER_WORKERS` could be added, with the server start command appending `--workers $PREFECT_SERVER_WORKERS` at https://github.com/PrefectHQ/prefect/blob/8f75e225284dfcb376b86dcfbb2c2a5f6e3a565f/src/prefect/cli/server.py#L120-L143, as an alternative to relying on the uvicorn environment variables.
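The proposal could look roughly like this (hypothetical: `PREFECT_SERVER_WORKERS` is not an existing Prefect setting, and the command layout is an assumption based on the invocation earlier in this thread):

```python
import os

def uvicorn_command(env=None) -> list[str]:
    """Build the server command, appending --workers only when the
    hypothetical PREFECT_SERVER_WORKERS variable is set."""
    env = os.environ if env is None else env
    cmd = [
        "uvicorn",
        "--factory", "prefect.server.api.server:create_app",
        "--host", "0.0.0.0",
        "--port", "4200",
    ]
    workers = env.get("PREFECT_SERVER_WORKERS")
    if workers is not None:
        cmd += ["--workers", workers]
    return cmd
```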

image

tjgalvin commented 1 year ago

Hi all -- I recently went down this rabbit hole as well.

My use case is on a SLURM-managed HPC, and I am planning on processing a large all-sky astronomy data set. My pipeline works fine in isolation, but I have begun load testing my self-hosted prefect server by starting ~35 instances of my workflow.

I found that the default prefect server settings were not scaling well. Inspecting the system, I saw a single python process running at 100 percent. This led to API / database calls timing out and a very laggy UI. I also saw some very strange errors from my workers, including tasks that completed successfully (with the message `Finished in state Completed()`). The only thing I can think of is that these state transitions were not recorded correctly in the database or were dropped due to a timed-out interaction with the API.

I did find that WEB_CONCURRENCY helped immediately. But I am still left with the occasional 'weird' completed task restarting. I do also know about

`PREFECT_SQLALCHEMY_POOL_SIZE`
`PREFECT_SQLALCHEMY_MAX_OVERFLOW`

but if I set these too high I start getting errors emitted by the prefect server from asyncio like this

```
sorry, too many clients already
```
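A rough capacity check helps explain this (an assumption about how the settings compose, not documented guidance): each uvicorn worker process holds its own SQLAlchemy pool, so the server can demand up to workers × (pool_size + max_overflow) connections, and that product has to stay below postgres's `max_connections` (default 100):

```python
def peak_db_connections(workers: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections the API server can open across all
    uvicorn worker processes, each with its own SQLAlchemy engine."""
    return workers * (pool_size + max_overflow)

# WEB_CONCURRENCY=10 with PREFECT_SQLALCHEMY_POOL_SIZE=20 and
# PREFECT_SQLALCHEMY_MAX_OVERFLOW=10 can demand up to 300 connections,
# well above postgres's default max_connections of 100.
```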

I am wondering whether there could be some docs with general tips about correctly setting these for a self-hosted prefect server? I would be happy to try to write something, but I would need some guidance and time to test. Maybe a rubber duck session.

Ox0400 commented 1 year ago

> but if I set these too high I start getting errors emitted by the prefect server from asyncio like this
>
> ```
> sorry, too many clients already
> ```

Hi, for postgres you can run `ALTER SYSTEM SET max_connections TO 10240;` and then restart the database.

tjgalvin commented 1 year ago

Ohhh how silly of me! So those too many clients errors were from postgres. Thanks for that!

For others who might be running things in a container like me, the `max_connections` and `shared_buffers` settings helped:

```shell
singularity pull docker://postgres
SINGULARITYENV_POSTGRES_PASSWORD="$POSTGRES_PASS" SINGULARITYENV_POSTGRES_DB="$POSTGRES_DB" SINGULARITYENV_PGDATA="$POSTGRES_SCRATCH/pgdata" \
        singularity run --cleanenv --bind "$POSTGRES_SCRATCH":/var postgres_latest.sif -c max_connections=1000 -c shared_buffers=1024MB
```

Ox0400 commented 1 year ago

> For others who might be running things in a container like me, the max_connections and shared_buffers settings helped

You're welcome. It's my honor to be able to assist you.