adammarchewka closed this issue 11 months ago.
@adammarchewka I feel like there must be something bad in your dags/ folder. For example, I wonder if one of your DAGs is somehow running its own version of Airflow (by importing or running something within Airflow itself)?
Either way, the first thing to try is spinning up a separate Airflow cluster in another namespace. First, try it without your DAGs; then try it with your DAGs (assuming this is safe to do), and see whether the problems persist in either case.
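The two-step check above could be scripted roughly as follows. This is only a sketch that builds the `helm` command lines (it does not run them). It assumes the chart repository was added under the name `airflow-stable` (`helm repo add airflow-stable https://airflow-helm.github.io/charts`), that DAGs reach the pods through the chart's `dags.gitSync` or `dags.persistence` settings rather than being baked into the image, and that `values-copy.yaml` is a hypothetical copy of your custom values adjusted so the test cluster does not share the production metadata database. The release and namespace name `airflow-debug` is also hypothetical.

```python
import shlex

# Assumption: the repo was added as "airflow-stable"; the version is the one reported in this issue.
CHART = "airflow-stable/airflow"
CHART_VERSION = "8.7.0"


def helm_cmd(action, release, namespace, values_file, with_dags):
    """Build a helm argument list for an isolated test deployment.

    action: "install" for step 1 (no DAGs), "upgrade" for step 2 (with DAGs).
    release, namespace, values_file: hypothetical names chosen by the caller.
    """
    if action not in ("install", "upgrade"):
        raise ValueError(f"unsupported helm action: {action!r}")
    cmd = [
        "helm", action, release, CHART,
        "--namespace", namespace,
        "--version", CHART_VERSION,
        "--values", values_file,
    ]
    if action == "install":
        # Create the separate namespace on first install.
        cmd.append("--create-namespace")
    if not with_dags:
        # --set takes precedence over --values, so these disable the
        # chart's DAG sources even if the values file enables them.
        cmd += [
            "--set", "dags.gitSync.enabled=false",
            "--set", "dags.persistence.enabled=false",
        ]
    return cmd


if __name__ == "__main__":
    step1 = helm_cmd("install", "airflow-debug", "airflow-debug",
                     "values-copy.yaml", with_dags=False)
    step2 = helm_cmd("upgrade", "airflow-debug", "airflow-debug",
                     "values-copy.yaml", with_dags=True)
    for cmd in (step1, step2):
        print(shlex.join(cmd))
```

Step 2 uses `helm upgrade` with the values file passed explicitly and without `--reuse-values`, so Helm should compute values from the chart defaults plus that file, dropping the step-1 `--set` overrides and re-enabling whatever DAG source the values file configures.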
This issue has been automatically marked as stale because it has not had activity in 60 days. It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.
Issues never become stale if any of the following is true:
- the issue has the lifecycle/frozen label
Checks
User-Community Airflow Helm Chart
Chart Version
8.7.0
Kubernetes Version
Helm Version
Description
The Airflow Scheduler has started having issues when it is restarted (either manually or forcefully): some task instances are stuck in the running/queued state after the restart, and the Scheduler somehow loses its reference to them (or fails to re-adopt them), resulting in a critical error about a missing TaskInstance.
Side issue: the Scheduler and Triggerer seem to consume whatever resources we give them, sitting at nearly maximum CPU usage even when little activity is visible in the logs.
Relevant Logs
Custom Helm Values