Open Ox0400 opened 1 year ago
Hi thanks for filing this enhancement request. Could you describe a little more about what you're trying to do? From my understanding the env variables should still be respected on your server. Are you looking to pass them to the prefect server
command?
Hi @jakekaplan, thanks for your reply. I hope the documentation can add a description of the server's support for uvicorn environment variables, especially the parallel multi-process mode.
The prefect api server always blocks or raises 500/502/timeout errors when started in single-process mode with many tasks; multi-process mode can mitigate this. I also know that I can set `PREFECT_CLIENT_MAX_RETRIES`, `PREFECT_API_REQUEST_TIMEOUT`, and `PREFECT_CLIENT_RETRY_EXTRA_CODES`.
Currently `prefect server start` is missing `--workers N` or other parameters to support a production mode, but we can set the env variables `WEB_CONCURRENCY` or `UVICORN_WORKERS` to get it.
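As a sketch of the resolution order described in this thread (the variable names come from uvicorn's settings docs; the exact precedence here is an assumption based on the discussion, not uvicorn's actual source):

```python
import os

def resolve_workers(cli_workers=None):
    # Illustrative sketch: an explicit --workers CLI flag wins;
    # otherwise uvicorn can read WEB_CONCURRENCY (or UVICORN_WORKERS
    # via its env-prefix mechanism); otherwise fall back to 1.
    if cli_workers is not None:
        return int(cli_workers)
    for var in ("WEB_CONCURRENCY", "UVICORN_WORKERS"):
        value = os.environ.get(var)
        if value:
            return int(value)
    return 1
```

This mirrors the point made below: when the CLI flag is present, the env variables never take effect.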
The uvicorn docs imply that the CLI options or uvicorn.run params override the environment variables. With that uvicorn behavior the environment variables never make it to the server.
It works as shown in the attached screenshot when `WEB_CONCURRENCY` or `UVICORN_WORKERS` is set. Very useful for many flows or tasks.
Could a new env variable `PREFECT_SERVER_WORKERS` be added, and the server start command append `--workers $PREFECT_SERVER_WORKERS` at https://github.com/PrefectHQ/prefect/blob/8f75e225284dfcb376b86dcfbb2c2a5f6e3a565f/src/prefect/cli/server.py#L120-L143, if we don't want to rely on the uvicorn env variables?
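A minimal sketch of this proposal (the `PREFECT_SERVER_WORKERS` name and the command layout are the suggestion from this comment, not existing prefect behavior):

```python
import os

def build_server_command(host="0.0.0.0", port=4200):
    # Proposed change: append --workers to the uvicorn invocation when
    # the (hypothetical) PREFECT_SERVER_WORKERS env variable is set.
    command = [
        "uvicorn",
        "--factory", "prefect.server.api.server:create_app",
        "--host", host,
        "--port", str(port),
    ]
    workers = os.environ.get("PREFECT_SERVER_WORKERS")
    if workers:
        command += ["--workers", workers]
    return command
```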
Hi all -- I recently went down this rabbit hole as well.
My use case is on a SLURM-managed HPC, and I am planning to process a large all-sky astronomy data set. My pipeline works fine in isolation, but I have begun load testing my self-hosted prefect server by starting ~35 instances of my workflow.
I found that the default prefect server settings were not scaling well. Inspecting the system, I saw a single python process running at 100 percent. This led to API / database calls timing out and a very laggy UI. I also saw some very strange errors from my workers, including on tasks that had completed successfully (with the message `Finished in state Completed()`). The only thing I can think of is that these state transitions were not recorded correctly in the database or were dropped during a timed-out interaction with the API.
I did find that `WEB_CONCURRENCY` helped immediately. But I am still left with the occasional 'weird' completed task restarting. I do also know about `PREFECT_SQLALCHEMY_POOL_SIZE` and `PREFECT_SQLALCHEMY_MAX_OVERFLOW`, but if I set these too high I start getting errors emitted by the prefect server from asyncio like this: `sorry, too many clients already`
I am wondering whether there could be some docs with general tips about correctly setting these for a self-hosted prefect server? I would be happy to try to write something, but I would need some guidance and time to test. Maybe a rubber duck session.
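The `too many clients` symptom is consistent with simple connection arithmetic (the per-worker pooling model here is my assumption, not documented prefect behavior): each uvicorn worker process keeps its own SQLAlchemy pool, so the number of connections postgres must allow grows multiplicatively with the worker count.

```python
def required_pg_connections(web_concurrency, pool_size, max_overflow):
    # Rough upper bound: every worker process can open up to
    # pool_size + max_overflow database connections of its own.
    return web_concurrency * (pool_size + max_overflow)

# With 10 workers and SQLAlchemy's stock defaults (pool_size=5,
# max_overflow=10), the server can demand 150 connections, which
# already exceeds postgres's default max_connections of 100.
```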
Hi, you can run `ALTER SYSTEM SET max_connections TO 10240` for postgres, then restart the postgres db.
Ohhh how silly of me! So those `too many clients` errors were from postgres. Thanks for that!
For others who might be running things in a container like me, the max_connections and shared_buffers options helped:

```
singularity pull docker://postgres
SINGULARITYENV_POSTGRES_PASSWORD="$POSTGRES_PASS" SINGULARITYENV_POSTGRES_DB="$POSTGRES_DB" SINGULARITYENV_PGDATA="$POSTGRES_SCRATCH/pgdata" \
singularity run --cleanenv --bind "$POSTGRES_SCRATCH":/var postgres_latest.sif -c max_connections=1000 -c shared_buffers=1024MB
```
You're welcome. It's my honor to be able to assist you.
First check
Describe the issue
The prefect http server is managed using uvicorn; the command is like:

```
uvicorn --app-dir "/usr/local/lib/python3.10/site-packages" --factory prefect.server.api.server:create_app --host 0.0.0.0 --port 4200 --timeout-keep-alive 5
```

with `export WEB_CONCURRENCY=10` or `export UVICORN_WORKERS=10` (see https://www.uvicorn.org/settings/).
Describe the proposed change
Extra configuration options for the HTTP server.
Additional context
No response