airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

feat: add liveness probe for celery workers #766

Closed nickwood closed 10 months ago

nickwood commented 11 months ago

What issues does your PR fix?

What does your PR do?

Currently celery workers can enter a 'zombie' state where they become disconnected from celery and will no longer pick up new jobs.

This PR adds a liveness probe (enabled by default) to detect such pods so that they can be killed and recreated by k8s.


Talador12 commented 11 months ago

@thesuperzapper this one comes from our org, bringing a change that helped us upstream

nglehuy commented 8 months ago

FYI, for anyone whose Airflow worker liveness probe fails with TypeError: argument of type 'NoneType' is not iterable: this happens because celery inspect cannot see the running worker. Set the environment variable AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL=True (not False) so that celery inspect ping works.
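
To illustrate the failure mode, here is a minimal sketch, not the chart's actual probe script. It assumes the probe tests whether the worker's name appears in the reply from Celery's inspect ping, and that the reply is None when no worker answers (for example, when remote control is disabled). The helper name worker_responded and the worker name celery@worker-0 are hypothetical.

```python
from typing import Optional


def worker_responded(ping_result: Optional[dict], worker_name: str) -> bool:
    """Return True only if `worker_name` appears in a ping reply.

    `ping_result` stands in for the value returned by Celery's
    `app.control.inspect().ping()`. Assumption: it is None when no
    worker replies, otherwise a dict keyed by worker name.
    """
    if ping_result is None:
        # No worker answered: report "not alive" instead of raising.
        return False
    return worker_name in ping_result


# A naive membership test against None raises the TypeError from the report.
naive_check_raised = False
try:
    "celery@worker-0" in None  # type: ignore[operator]
except TypeError:
    naive_check_raised = True

# Shape of a healthy reply (hypothetical worker name).
healthy_reply = {"celery@worker-0": {"ok": "pong"}}
print(worker_responded(healthy_reply, "celery@worker-0"))
print(worker_responded(None, "celery@worker-0"))
```

With a guard like this, a probe would fail cleanly rather than with a traceback, but the worker would still be reported as unhealthy until remote control is enabled.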

Talador12 commented 8 months ago

Could we set this AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL field as the default since the liveness probe is also default behavior?


nglehuy commented 8 months ago

Could we set this AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL field as

It already defaults to 'true' in the Airflow config: https://airflow.apache.org/docs/apache-airflow-providers-celery/stable/configurations-ref.html#worker-enable-remote-control. The FYI is just for anyone who has somehow already set it to false.
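
For anyone unsure whether they have overridden the setting, a rough sketch of an environment check follows. It inspects only the environment variable, not airflow.cfg or any other configuration source, and the function name remote_control_enabled and the set of accepted truthy strings are assumptions for illustration.

```python
import os
from typing import Mapping


def remote_control_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Report whether the env var leaves Celery remote control enabled.

    Falls back to "true" when the variable is unset, matching the
    default described above. Only the environment is inspected.
    """
    raw = environ.get("AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL", "true")
    # Assumption: these spellings are treated as true.
    return raw.strip().lower() in {"true", "t", "1"}


if not remote_control_enabled():
    print("Remote control is disabled; celery inspect ping will get no reply.")
```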