Looks like this might be related to #573, but the fix there did not solve things for me.
Tried upgrading to `apache/airflow:2.2.5-python3.9` like that issue suggests, and to the latest stable Airflow, `apache/airflow:2.3.3-python3.9`, without success.
@agconti as I mentioned in https://github.com/airflow-helm/charts/issues/573#issuecomment-1269271170, it's almost certainly an issue with an incorrect version of the `kombu` or `celery` pip packages.
Your custom container image might not have the right version, because when you run `pip install` these days, pip tries to be smart and might upgrade/downgrade an already installed package to prevent version conflicts. So take a look at your custom image, and see if your pip installs are changing the version of `kombu` or `celery`.
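A quick way to check is to compare the pinned versions in your custom image against the matching base image, something like this (the custom image tag below is just a placeholder for your own):

```bash
# Versions in your custom image (replace the tag with your own image)
docker run --rm my-registry/my-airflow:latest bash -c "pip freeze | grep -iE '^(celery|kombu)=='"

# Versions in the matching official base image
docker run --rm apache/airflow:2.3.3-python3.9 bash -c "pip freeze | grep -iE '^(celery|kombu)=='"
```

If the two outputs differ, one of your `pip install` steps changed them.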
Also, does your environment work with the default `apache/airflow:2.2.5-python3.9` image?
@thesuperzapper thanks for your help! 💖
I agree, I think it's definitely something wrong with `kombu` or `celery`, though I can't imagine what I did in my setup to cause it.
Here's a bit more context:
With the default `apache/airflow:2.2.5-python3.9` image, the same issue happens. This is also the case with `apache/airflow:2.3.3-python3.9`.

Here's the Dockerfile for my custom image:

```dockerfile
FROM apache/airflow:2.3.3-python3.9

# Allows docker to cache installed dependencies between builds
COPY ./requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY . /opt/airflow
```

And its requirements.txt:

```
requests==2.28.1
pandas==1.5.0
pyarrow==9.0.0
apache-airflow[amazon]==2.4.0
```
At any rate, I got around this problem this morning by switching to the KubernetesExecutor to avoid Celery altogether. It's not a fix for the underlying problem, but it works for our use case. I want to be respectful of your volunteer time, so I'll close this issue since it's no longer a problem for us. Hopefully, my debugging trail here can be helpful to anyone else running into a similar issue. Thanks for your help in trying to solve this!
@agconti actually, I think your above Dockerfile has a very clear issue: you should not be changing the version of Airflow by installing `apache-airflow[amazon]==2.4.0` into the `2.3.3-python3.9` container image. This will inevitably cause problems, because pip will upgrade Airflow itself and may change its pinned dependencies, including `celery` and `kombu`.
What's great is that you don't need to install the `amazon` or any other "extra" pip packages, because they are all installed in the official `apache/airflow` images by default, so you can just remove that requirement, and everything should work!
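If you do need extra pip packages on top of the image, one way to stop pip from touching Airflow's own dependencies is to install against the official Airflow constraints file. A sketch, assuming Airflow 2.3.3 on Python 3.9:

```dockerfile
FROM apache/airflow:2.3.3-python3.9

COPY ./requirements.txt requirements.txt

# The constraints file pins every transitive dependency (celery, kombu, etc.)
# to the versions the image was built with, so pip can't silently change them.
RUN pip install -r requirements.txt \
    --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.3.3/constraints-3.9.txt"

COPY . /opt/airflow
```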
EDIT: I highly recommend using `CeleryExecutor` in most situations, because starting a whole Pod for every task is very wasteful (alternatively, use `CeleryKubernetesExecutor` and get the best of both worlds), plus that will let you use my upcoming task-aware autoscaler feature!
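For reference, the executor is selected through the chart's values, roughly like this (a sketch against the user-community chart's `airflow.executor` value; check the values.yaml of your chart version for the exact key):

```yaml
airflow:
  # one of: CeleryExecutor, KubernetesExecutor, CeleryKubernetesExecutor
  executor: CeleryKubernetesExecutor
```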
@thesuperzapper thanks for your help! I didn't realize that the amazon package was already installed. I'll make the update. 💖
Thanks for your recommendation on CeleryExecutor as well!
Checks
- I am using the User-Community Airflow Helm Chart

Chart Version
8.6.1
Kubernetes Version
Helm Version
Description
I've deployed the helm chart according to this project's guides and recommendations. While each pod comes up fine, tasks cannot run. Inspecting the worker pods, the logs reveal they are not able to connect to the embedded Redis instance. After exec'ing into the redis-master pod and running PING, it's clear that Redis is reachable at the connection string the workers are using. However, the workers cannot connect.
Is there a configuration step I'm missing or an incorrect configuration that would cause this?
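The manual check mentioned above looked roughly like this (the pod name assumes the chart's default release naming, and that the embedded Redis container exposes its password as `REDIS_PASSWORD`; yours may differ):

```bash
# Open the Redis master pod and ping Redis directly from inside it
kubectl exec -it airflow-redis-master-0 -- sh -c 'redis-cli -a "$REDIS_PASSWORD" PING'
# Expected reply: PONG
```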
Relevant Logs
Logs for Redis pod. Redis is working and ready:
Logs from worker. It's not able to connect:
Logs from connecting to the Redis master pod, and connecting with redis-cli directly. It works:
Logs from flower. It appears to have connected to Redis fine:
Logs from scheduler, the celery task is timing out:
So it looks like Redis is reachable, but the workers are connecting incorrectly for some reason. Any idea what might be causing this?
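For anyone else debugging this, one way to confirm the exact broker URL a worker actually resolves is Airflow's own config CLI (the worker pod name below is a placeholder):

```bash
# Print the Celery broker URL from inside a worker pod
kubectl exec -it <airflow-worker-pod> -- airflow config get-value celery broker_url
```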
Custom Helm Values