PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Faster import time #14423

Open qiangxinglin opened 2 days ago

qiangxinglin commented 2 days ago

First check

Prefect Version

3.x

Describe the current behavior

Set up a very simple task.py, as the tutorial suggests:

import time
from prefect import flow, serve

@flow
def slow_flow(sleep: int = 5):
    print(f"Sleeping for {sleep} seconds.")
    time.sleep(sleep)

if __name__ == "__main__":
    slow_deploy = slow_flow.to_deployment(name="sleeper")
    serve(slow_deploy)

Then deploy it with docker-compose.yml:

services:
  db:
    image: postgres:16.3-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=prefect
    ports:
      - 5432:5432
    volumes:
      - db:/var/lib/postgresql/data
    profiles: [ "server" ]

  prefect:
    image: prefecthq/prefect:3.0.0rc9-python3.10
    command: prefect server start --host 0.0.0.0
    environment:
      - PREFECT_LOGGING_LEVEL=DEBUG
      - PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://postgres:postgres@db:5432/prefect
    ports:
      - 4200:4200
    depends_on:
      - db
    profiles: [ "server" ]

  worker:
    image: prefecthq/prefect:3.0.0rc9-python3.10
    working_dir: /app
    command: python -m task
    volumes:
      - .:/app
    environment:
      - PREFECT_API_URL=http://prefect:4200/api
      - PREFECT_LOGGING_LEVEL=DEBUG
    profiles: [ "worker" ]

volumes:
  db:

Then I wrote a simple test.py to invoke the deployment:

from prefect.deployments import run_deployment

run_deployment('slow-flow/sleeper', timeout=0)

I found that the flow run stays stuck between SCHEDULED and PENDING for a few seconds. After inspecting the source code, I decided to adjust the polling interval in the worker's environment:

- PREFECT_WORKER_QUERY_SECONDS=0   # <- not sure if this setting is useful
- PREFECT_RUNNER_POLL_FREQUENCY=1  # <- this works

After that, the poller seems to work very "hard" to pick up flow runs from the queue and submit them to the workers (local subprocesses).

However, the flow run still stays stuck between PENDING and RUNNING for 1-2 seconds.
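To put numbers on these two gaps, the per-transition latency can be computed from the state timestamps. This is only a minimal sketch: `transition_latencies` is a hypothetical helper, it assumes the timestamps are copied by hand from the UI or the DEBUG logs, and the values in the example are made up for illustration, not measured.

```python
from datetime import datetime


def transition_latencies(timestamps: dict[str, datetime]) -> dict[str, float]:
    """Return the seconds spent between consecutive states, ordered by time."""
    ordered = sorted(timestamps.items(), key=lambda item: item[1])
    return {
        f"{prev_name}->{next_name}": (next_time - prev_time).total_seconds()
        for (prev_name, prev_time), (next_name, next_time) in zip(ordered, ordered[1:])
    }


# Illustrative values only (not real measurements).
example = {
    "SCHEDULED": datetime(2024, 1, 1, 12, 0, 0),
    "PENDING": datetime(2024, 1, 1, 12, 0, 1),
    "RUNNING": datetime(2024, 1, 1, 12, 0, 3),
}

if __name__ == "__main__":
    for transition, seconds in transition_latencies(example).items():
        print(f"{transition}: {seconds:.1f}s")
```

Comparing these numbers before and after changing PREFECT_RUNNER_POLL_FREQUENCY should show whether the polling interval only affects the SCHEDULED to PENDING gap.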

Describe the proposed behavior

The question is:

Example Use

No response

Additional context

Everything I have tried so far is aimed at reducing latency, since most of my flows are short tasks. I don't want to spend a few seconds in the "startup" stage. In celery, the extra overhead is quite small. I want to do similar things as in celery.
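Since each flow run starts in a fresh local subprocess, part of that startup cost is probably import time. A rough way to check is to time a cold import in a new interpreter. This is a sketch, not a benchmark: `cold_import_seconds` and `interpreter_startup_seconds` are hypothetical helper names, and the example uses the stdlib module `json` so it runs anywhere; swapping in `prefect` assumes it is installed in the same environment.

```python
import subprocess
import sys
import time


def cold_import_seconds(module: str) -> float:
    """Time `import <module>` in a fresh interpreter (nothing cached in sys.modules)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
    return time.perf_counter() - start


def interpreter_startup_seconds() -> float:
    """Time a bare interpreter start, used as a baseline to subtract."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    module = "json"  # replace with "prefect" if it is installed
    baseline = interpreter_startup_seconds()
    total = cold_import_seconds(module)
    print(f"interpreter startup: {baseline:.3f}s")
    print(f"import {module}: about {max(total - baseline, 0.0):.3f}s on top of startup")
```

For a per-module breakdown, CPython's `python -X importtime -c "import prefect"` prints the time spent in each imported module.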

zzstoatzz commented 1 day ago

hi @qiangxinglin - thank you for the issue!

we are definitely interested in improving import times wherever we can (cc @aaazzam)

> I want to do similar things as in celery

In the meantime, you may want to check this repo out. We have a bunch of docker compose based demo applications using prefect background tasks (similar to celery).

qiangxinglin commented 1 day ago

@zzstoatzz Thank you for the reply! I tried task_worker/task_runner and have the following questions: