deepset-ai / hayhooks

Deploy Haystack pipelines behind a REST API.
https://haystack.deepset.ai
Apache License 2.0

deploy_utils.py with async execution support #27

Open alex-stoica opened 3 weeks ago

alex-stoica commented 3 weeks ago

This PR updates deploy_utils.py so that the pipeline_run method is executed asynchronously via run_in_threadpool. With this change the server can handle multiple pipeline requests concurrently instead of one after another.
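To illustrate why moving a blocking call to a thread pool helps, here is a minimal sketch that uses only the standard library. It is not the code from this PR: `asyncio.to_thread` stands in for `run_in_threadpool`, and `pipeline_run`, `handle_blocking`, `handle_threaded` and `timed` are hypothetical names made up for the example.

```python
import asyncio
import time


def pipeline_run(name: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a blocking pipeline run."""
    time.sleep(delay)  # simulates blocking work
    return f"{name} done"


async def handle_blocking(name: str) -> str:
    # Calling the blocking function directly stalls the event loop,
    # so concurrent requests end up running one after another.
    return pipeline_run(name)


async def handle_threaded(name: str) -> str:
    # Offloading to a worker thread keeps the event loop free
    # (assumption: run_in_threadpool plays this role in the PR).
    return await asyncio.to_thread(pipeline_run, name)


async def timed(handler, n: int = 4) -> float:
    """Run n simulated requests concurrently and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(f"pipeline_{i}") for i in range(n)))
    return time.perf_counter() - start


blocking_seconds = asyncio.run(timed(handle_blocking))
threaded_seconds = asyncio.run(timed(handle_threaded))
print(f"blocking: {blocking_seconds:.2f}s, threaded: {threaded_seconds:.2f}s")
```

In the blocking variant the four simulated requests take roughly four times the single-request delay, while the threaded variant should finish in about the time of one request.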

I've already tested it locally with the following docker-compose.yml:

services:
  hayhooks:
    image: deepset/hayhooks:main
    ports:
      - "1416:1416"
    volumes:
      - ./pipelines:/opt/pipelines
      - ./custom_components:/opt/custom_components
      - ./my_custom_deploy_utils.py:/opt/venv/lib/python3.12/site-packages/hayhooks/server/utils/deploy_utils.py
    environment:
      - PYTHONPATH=/opt
      - HAYHOOKS_ADDITIONAL_PYTHONPATH=/opt/custom_components

where my_custom_deploy_utils.py is the adjusted file (with run_in_threadpool, etc.).

If you need additional testing resources, I've created a dummy component that just waits, some pipelines, and send_requests_and_save.py, which sends multiple async requests. I sorted the saved results by start time:

import pandas as pd

df = pd.read_csv("output.csv")
df["started_at"] = pd.to_datetime(df["started_at"])
df.sort_values(by="started_at")  # returns a sorted copy; df itself is unchanged

Before the change, the sorted output (image) was synchronous: pipeline 4 starts only after all tasks from pipeline 1 have finished.

With the changes, the sorted output (image) is interleaved, i.e. the requests are handled asynchronously.
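The visual check on the sorted output could also be done programmatically. The sketch below is only an illustration: `is_interleaved` is a hypothetical helper, the pipeline names and timestamps are made-up toy values rather than measured results, and it assumes each record carries a pipeline name alongside its start time.

```python
from itertools import groupby


def is_interleaved(records):
    """Return True if requests from different pipelines alternate in start order.

    records: iterable of (pipeline_name, started_at) pairs (assumed layout).
    """
    # Pipeline names in order of start time
    ordered = [name for name, _ in sorted(records, key=lambda r: r[1])]
    # Collapse consecutive repeats into runs, e.g. A A B B -> A B
    runs = [name for name, _ in groupby(ordered)]
    # Sequential execution yields exactly one run per pipeline
    return len(runs) > len(set(ordered))


# Toy inputs for illustration only
sequential = [("pipeline_1", 0.0), ("pipeline_1", 1.0),
              ("pipeline_4", 2.0), ("pipeline_4", 3.0)]
interleaved = [("pipeline_1", 0.0), ("pipeline_4", 0.1),
               ("pipeline_1", 0.2), ("pipeline_4", 0.3)]

print(is_interleaved(sequential))
print(is_interleaved(interleaved))
```

A single run per pipeline matches the synchronous pattern described above; more runs than pipelines means the start times of different pipelines are mixed.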

test_pipelines.zip