Open maxcountryman opened 5 years ago
@maxcountryman hm we haven't seen this, which version of Airflow are you running? I assume the problem goes away when airflow_metrics
isn't on the plugin path?
We're using the latest Airflow: 1.10.5. And yes, removing the plugin seems to have stopped the scheduler crashes.
That's peculiar. The only line that touches settings
is here https://github.com/getsentry/airflow-metrics/blob/2f56cc6d0eb125a113f6511153d439f40ddbc50b/airflow_metrics/airflow_metrics/patch_stats.py#L34 and that shouldn't cause settings.Session
to become None
.
Would you mind posting a more verbose stack trace to hopefully shed some more light on this issue?
Edit: I also found this issue https://issues.apache.org/jira/browse/AIRFLOW-2363 and in their case, it seems like a different error caused settings.Session
to become None
Unfortunately that's the only traceback I have and now that we've removed the plugin the scheduler isn't crashing.
We tried adding this again and consistently saw it re-introduce the crashed scheduler issue. This is a complete nonstarter for us so it would be great if someone could look into this more.
Also frustratingly there are no logs coming out of the scheduler process, it just dies.
If there are no logs, would you mind providing some more details regarding how your airflow is set up? If you can reproduce the crash in an environment I can run, that'd be even better. As it stands, there's no way for me to know what the problem is.
Hello Team we are facing a similar issue, we had deployed airflow on a Kubernetes cluster and added this plugin, where the airflow scheduler stopped working.
<TaskInstance: POSTGRESQL_S3_DATA_SYNC_DAG.postgresql_s3_data_sync_task 2020-01-07 01:10:00+00:00 [scheduled]> [2020-01-08 14:51:55,438] {{scheduler_job.py:1112}} INFO - Setting the following 1 tasks to queued state: <TaskInstance: POSTGRESQL_S3_DATA_SYNC_DAG.postgresql_s3_data_sync_task 2020-01-07 01:10:00+00:00 [queued]> [2020-01-08 14:51:55,438] {{scheduler_job.py:1148}} INFO - Sending ('POSTGRESQL_S3_DATA_SYNC_DAG', 'postgresql_s3_data_sync_task', datetime.datetime(2020, 1, 7, 1, 10, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 1) to executor with priority 1 and queue default [2020-01-08 14:51:55,438] {{base_executor.py:59}} INFO - Adding to queue: ['airflow', 'run', 'POSTGRESQL_S3_DATA_SYNC_DAG', 'postgresql_s3_data_sync_task', '2020-01-07T01:10:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/data_pipeline_dags.zip'] [2020-01-08 14:51:55,656] {{scheduler_job.py:1288}} INFO - Executor reports execution of POSTGRESQL_S3_DATA_SYNC_DAG.postgresql_s3_data_sync_task execution_date=2020-01-07 01:05:00+00:00 exited with status success for try_number 1 [2020-01-08 14:51:56,195] {{patch_requests.py:33}} WARNING - Found blacklisted domain: api.datadoghq.com [2020-01-08 14:51:56,199] {{api_client.py:166}} INFO - 202 POST https://api.datadoghq.com/api/v1/series (6.3708ms) [2020-01-08 14:51:56,428] {{datadog_logger.py:41}} INFO - datadog gauge: task.state 1 1 False {'state': 'queued'} [2020-01-08 14:51:56,428] {{datadog_logger.py:41}} INFO - datadog gauge: task.state 2 1 False {'state': 'running'} [2020-01-08 14:51:56,429] {{datadog_logger.py:41}} INFO - datadog gauge: task.state 14 1 False {'state': 'success'} [2020-01-08 14:51:57,531] {{datadog_logger.py:31}} INFO - datadog incr: scheduler_heartbeat 1 1 None [2020-01-08 14:52:03,559] {{datadog_logger.py:31}} INFO - datadog incr: scheduler_heartbeat 1 1 None [2020-01-08 14:52:06,203] {{patch_requests.py:33}} WARNING - Found blacklisted domain: api.datadoghq.com [2020-01-08 14:52:06,210] {{api_client.py:166}} INFO - 202 POST https://api.datadoghq.com/api/v1/series (10.215ms) [2020-01-08 14:52:06,446] {{datadog_logger.py:41}} INFO - datadog gauge: task.state 3 1 False {'state': 'running'} [2020-01-08 14:52:06,446] {{datadog_logger.py:41}} INFO - datadog gauge: task.state 14 1 False {'state': 'success'} [2020-01-08 14:52:09,575] {{datadog_logger.py:31}} INFO - datadog incr: scheduler_heartbeat 1 1 None [2020-01-08 14:52:15,560] {{datadog_logger.py:31}} INFO - datadog incr: scheduler_heartbeat 1 1 None [2020-01-08 14:52:16,213] {{patch_requests.py:33}} WARNING - Found blacklisted domain: api.datadoghq.com
@mikeclarke @Zylphrex would be great if someone could take a look--seems like others are having similar issues.
It seems like our scheduler is crashing since adding this plugin.
Here's a truncated traceback: