apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
36.71k stars 14.21k forks source link

The scheduler does not appear to be running. Last heartbeat was received X minutes ago. #19192

Closed t4n1o closed 2 years ago

t4n1o commented 2 years ago

Apache Airflow version

2.1.4

Operating System

Linux / Ubuntu Server

Versions of Apache Airflow Providers

apache-airflow-providers-ftp==2.0.1 apache-airflow-providers-http==2.0.1 apache-airflow-providers-imap==2.0.1 apache-airflow-providers-postgres==2.3.0

Deployment

Virtualenv installation

Deployment details

Airflow v2.1.4 Postgres 14 LocalExecutor Installed with Virtualenv / ansible - https://github.com/idealista/airflow-role

What happened

image

I run a single BashOperator (for a long running task, we have to download data for 8+ hours initially to download from the rate-limited data source API, then download more each day in small increments).

We're only using 3% CPU and 2 GB of memory (out of 64 GB) but the scheduler is unable to run any other simple task at the same time.

Currently only the long task is running, everything else is queued, even thought we have more resources: image

What you expected to happen

I expect my long running BashOperator task to run, but for airflow to have the resources to run other tasks without getting blocked like this.

How to reproduce

I run a command with bashoperator (I use it because I have python, C, and rust programs being scheduled by airflow). bash_command='umask 002 && cd /opt/my_code/ && /opt/my_code/venv/bin/python -m path.to.my.python.namespace'

Configuration:

airflow_executor: LocalExecutor
airflow_database_conn: 'postgresql+psycopg2://airflow:airflow_pass@localhost:5432/airflow'
airflow_database_engine_encoding: utf-8
airflow_database_engine_collation_for_ids:
airflow_database_pool_enabled: True
airflow_database_pool_size: 3
airflow_database_max_overflow: 10
airflow_database_pool_recycle: 2000
airflow_database_pool_pre_ping: True
airflow_database_schema:
airflow_database_connect_args:
airflow_parallelism: 10
airflow_dag_concurrency: 7
airflow_dags_are_paused_at_creation: True
airflow_max_active_runs_per_dag: 16
airflow_load_examples: False
airflow_load_default_connections: False
airflow_plugins_folder: "{{ airflow_app_home }}/plugins"

# [operators]
airflow_operator_default_owner: airflow
airflow_operator_default_cpus: 1
airflow_operator_default_ram: 512
airflow_operator_default_disk: 512
airflow_operator_default_gpus: 0
airflow_default_queue: default
airflow_allow_illegal_arguments: False

Anything else

This occurs every time consistently, also on 2.1.2

The other tasks have this state: image

When the long-running task finishes, the other tasks resume normally. But I expect to be able to do some parallel execution /w LocalExecutor.

I haven't tried using pgbouncer.

Are you willing to submit PR?

Code of Conduct

deepchand commented 2 years ago

Too much debug logs so tried to filter out only important one, let me know if more info needed

potiuk commented 2 years ago

I think your problem is simply extremely slow connection to your database. 5 seconds to run single query indicate a HUGE problem you have with your database. It should take single mlilliseconds .

THIS IS your problem. Not airflow.

You should fix your DB/connectivity and debug why your database is 1000x slower than it should.