apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Send task to queue in bulk - Celery Executor #8854

Open mik-laj opened 4 years ago

mik-laj commented 4 years ago

Description

Hello, I recently worked on CeleryExecutor. I managed to optimize status retrieval by using bulk operations: instead of fetching the status of each task with a separate query, a single query is sent for all tasks. In many cases this made the process more than 100 times faster. https://github.com/apache/airflow/pull/7542 However, we still send tasks to the queue with a separate request per task, issued from many processes. This is very inefficient because each request pays the network latency on its own. https://github.com/apache/airflow/blob/f1dc2e0b0e358582c1df0cc07a5cc95fa721dc44/airflow/executors/celery_executor.py#L196-L206 It would be nice if all tasks could be sent as one bulk request. For Redis, this means using a pipeline. https://github.com/andymccurdy/redis-py#pipelines
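To illustrate the idea, here is a minimal sketch of per-task sends versus a pipelined send. It is not Airflow or Celery code: `FakeRedis`, `FakePipeline`, `send_one_by_one`, and `send_in_bulk` are hypothetical names, and the fake client only counts round trips in memory so the example runs without a server. The `pipeline()`, `lpush()`, and `execute()` calls mirror the redis-py method names, so a real `redis.Redis` client could presumably be passed to `send_in_bulk` instead. The payloads are plain strings, not Celery's actual message format.

```python
from typing import Dict, Iterable, List, Tuple


class FakePipeline:
    """In-memory stand-in for a redis-py pipeline (hypothetical helper)."""

    def __init__(self, client: "FakeRedis") -> None:
        self._client = client
        self._commands: List[Tuple[str, Tuple[str, ...]]] = []

    def __enter__(self) -> "FakePipeline":
        return self

    def __exit__(self, *exc_info) -> None:
        self._commands.clear()

    def lpush(self, name: str, *values: str) -> "FakePipeline":
        # Only buffered here; nothing is "sent" until execute().
        self._commands.append((name, values))
        return self

    def execute(self) -> List[int]:
        # The whole buffer counts as a single network round trip.
        self._client.round_trips += 1
        results = [self._client._push(name, values) for name, values in self._commands]
        self._commands.clear()
        return results


class FakeRedis:
    """In-memory stand-in for a Redis client that counts round trips (hypothetical helper)."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[str]] = {}
        self.round_trips = 0

    def _push(self, name: str, values: Tuple[str, ...]) -> int:
        queue = self.queues.setdefault(name, [])
        for value in values:
            queue.insert(0, value)  # LPUSH adds at the head of the list
        return len(queue)

    def lpush(self, name: str, *values: str) -> int:
        self.round_trips += 1  # each direct call is its own round trip
        return self._push(name, values)

    def pipeline(self) -> FakePipeline:
        return FakePipeline(self)


def send_one_by_one(client, queue: str, payloads: Iterable[str]) -> List[int]:
    """One request per task: N payloads cost N round trips."""
    return [client.lpush(queue, payload) for payload in payloads]


def send_in_bulk(client, queue: str, payloads: Iterable[str]) -> List[int]:
    """Buffer all pushes in a pipeline and send them in one round trip."""
    with client.pipeline() as pipe:
        for payload in payloads:
            pipe.lpush(queue, payload)
        return pipe.execute()


if __name__ == "__main__":
    tasks = ["task-a", "task-b", "task-c"]
    slow, fast = FakeRedis(), FakeRedis()
    send_one_by_one(slow, "default", tasks)
    send_in_bulk(fast, "default", tasks)
    print(slow.round_trips, fast.round_trips)
```

Both functions leave the queue in the same state; only the number of round trips differs, which is where the latency saving would come from.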

Can it be done easily in Celery?

Best regards, Kamil

mik-laj commented 4 years ago

@auvipy Can you look at it? You're a Celery expert. I think Celery doesn't support it yet, but I might be wrong.

auvipy commented 4 years ago

Celery's Redis support needs more care, actually :) With my current time and other priorities in Celery, I haven't contributed much to the Redis part. I'm more focused on AMQP 1.0, Kafka support, and an asyncio-based worker...

kurtqq commented 2 years ago

This would be a good improvement.