WytzeBruinsma opened 9 months ago
We are seeing this as well, although the time values are a bit longer. Is there a tolerance variable we can set to make the async process a bit less time-sensitive?
@justplanenutz @WytzeBruinsma you should check the code of the trigger you are using; the message is probably correct that something is wrong with it. Did you write it yourself, or are you using one from the official providers?
Also, please note that this is an INFO-level log, so it's probably cosmetic. Are you seeing any issues related to it?
For your reference, here is the code (in Airflow itself) that detects this condition and writes the log:
https://github.com/apache/airflow/blob/2.9.2/airflow/jobs/triggerer_job_runner.py#L557-L582
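Roughly, that check is a watchdog coroutine: it repeatedly yields to the event loop for a fixed interval and measures how late the sleep actually returns; if a trigger blocked the loop, the overrun shows up as the delta. A simplified sketch of the idea (not the exact Airflow code; the 1-second interval and 0.2-second threshold below are illustrative):

```python
import asyncio
import logging
import time

log = logging.getLogger(__name__)

async def block_watchdog(interval: float = 1.0, threshold: float = 0.2) -> None:
    """Illustrative watchdog: sleep for `interval` seconds and compare how long
    the sleep really took. If another coroutine blocked the event loop, the
    sleep returns late and the overrun is logged."""
    while True:
        start = time.monotonic()
        await asyncio.sleep(interval)
        overrun = (time.monotonic() - start) - interval
        if overrun > threshold:
            log.info(
                "Async thread was blocked for %.2f seconds, "
                "likely by a badly-written trigger.",
                overrun,
            )
```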
@thesuperzapper The trigger has been running fine in the past, so we're confident that its code is sound. We normally set our logs at INFO as well; filtering them is not the issue.
We are currently looking for any deltas in the code base that may have aggravated an edge condition.
Our trigger process is running in Kubernetes, and we have collected metrics for CPU and memory usage. We noticed a significant increase in CPU and memory consumption just before the problems started. When we restart the pod, everything is fine again... so maybe a resource leak of some kind?
@justplanenutz In any case, it's very unlikely to be related to this chart.
You should probably raise an issue upstream if you figure out what was causing it, feel free to link it here if you do.
Checks
User-Community Airflow Helm Chart
Chart Version
8.8.0
Kubernetes Version
Helm Version
Description
The Airflow triggerer pod is raising errors and slowing down Airflow processes. The error is:
Triggerer's async thread was blocked for 0.23 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines.
I tried resolving this by increasing the resources, but even after removing all the limits and giving it 10 GB of RAM and plenty of CPU headroom, it still raises this error. I also checked the response times of the Postgres database and couldn't find anything that could slow down the async process and cause this error. Please let me know what other steps I can take to resolve it (a sketch of what PYTHONASYNCIODEBUG=1 reports is included below).
Relevant Logs
Custom Helm Values
No response
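The log message suggests setting PYTHONASYNCIODEBUG=1. Below is a minimal, self-contained illustration of what asyncio's debug mode reports when a coroutine blocks the event loop; it is not tied to any real trigger, and the coroutine name, sleep duration, and threshold are made-up values for the example:

```python
import asyncio
import time

async def badly_written_trigger():
    # A blocking call inside a coroutine stalls the whole event loop;
    # this is the kind of "overrunning coroutine" the log message refers to.
    time.sleep(0.5)

async def main():
    # This threshold only controls asyncio's debug-mode warnings
    # (asyncio's default is 0.1 s); 0.2 s is an arbitrary value here.
    asyncio.get_running_loop().slow_callback_duration = 0.2
    await badly_written_trigger()

if __name__ == "__main__":
    # debug=True has the same effect as running with PYTHONASYNCIODEBUG=1;
    # asyncio then logs a warning such as:
    #   Executing <Task ... coro=<main() ...>> took 0.5xx seconds
    asyncio.run(main(), debug=True)
```

With debug mode on, asyncio names the task that overran, which is usually enough to identify the offending trigger.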