apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Some pool and executor slot/task statsd metrics are always 0 #31132

Open StefanKurek opened 1 year ago

StefanKurek commented 1 year ago

Apache Airflow version

2.6.0

What happened

When queuing up a number of DAG runs on a basic Airflow installation (SQLite with the SequentialExecutor), several statsd metrics reported only 0 values. These included:

pool.queued_slots.
pool.running_slots.
executor.running_tasks

What you think should happen instead

I would expect these metrics to show nonzero values at some point. The web UI shows nonzero running and queued task counts for the default pool for a few minutes while the queued DAG runs are processed.

How to reproduce

I have a fairly simple test DAG:

from airflow import DAG
from airflow.operators.bash import BashOperator  # airflow.operators.bash_operator is deprecated in Airflow 2.x
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2023, 5, 2),
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
    'sla': timedelta(minutes=1),
}

dag = DAG(
    'longtask_dag',
    default_args=default_args,
    description='Example DAG with a task that takes about 2 minutes to execute',
    schedule_interval='*/5 * * * *',
)

t1 = BashOperator(
    task_id='long_running_task',
    bash_command='sleep 120',
    dag=dag
)

t2 = BashOperator(
    task_id='short_running_task',
    bash_command='echo "Short running task"',
    dag=dag
)

t2 >> t1

I then queue up about 20 runs of this DAG in the web UI and monitor Airflow's statsd output (after enabling statsd in the Airflow configuration).
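To look at the raw statsd output without the full Prometheus pipeline, a small UDP listener can stand in for the statsd server. This is a minimal sketch, assuming Airflow's statsd client is pointed at 127.0.0.1:8125 with the default "airflow" prefix (via the [metrics] options statsd_on, statsd_host, statsd_port and statsd_prefix). The helpers parse_statsd_line, is_watched and listen are hypothetical names, not part of Airflow. The sketch parses the plain statsd line format name:value|type and prints only the pool and executor gauges discussed in this issue.

```python
import socket
from typing import Optional, Tuple

# Metric families discussed in this issue; pool metrics carry a pool-name suffix.
WATCHED = (
    "pool.queued_slots",
    "pool.running_slots",
    "pool.open_slots",
    "executor.running_tasks",
    "executor.queued_tasks",
    "executor.open_slots",
)


def parse_statsd_line(line: str, prefix: str = "airflow.") -> Optional[Tuple[str, float, str]]:
    """Parse one 'name:value|type[|...]' statsd line; return None if it is malformed."""
    try:
        name, rest = line.strip().split(":", 1)
        value, metric_type = rest.split("|")[:2]
        if name.startswith(prefix):
            name = name[len(prefix):]  # drop the assumed "airflow." prefix
        return name, float(value), metric_type
    except ValueError:
        return None


def is_watched(name: str) -> bool:
    """True for an exact watched name or a watched name plus a suffix such as a pool name."""
    return any(name == w or name.startswith(w + ".") for w in WATCHED)


def listen(host: str = "127.0.0.1", port: int = 8125) -> None:
    """Print watched metrics as they arrive; runs until interrupted (Ctrl+C)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, _ = sock.recvfrom(65535)
        # A single datagram may carry several newline-separated metrics.
        for line in data.decode("utf-8", errors="replace").splitlines():
            parsed = parse_statsd_line(line)
            if parsed and is_watched(parsed[0]):
                print(*parsed)
```

Call listen() before triggering the runs. Because it binds UDP port 8125, it replaces (rather than runs alongside) a statsd exporter listening on the same port.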

Operating System

macOS Monterey v12.3.1

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-http==4.3.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-sqlite==3.3.2

Deployment

Virtualenv installation

Deployment details

For the most part, I followed the installation and startup steps from this guide:

https://www.redhat.com/en/blog/monitoring-apache-airflow-using-prometheus

Anything else

This problem seems to be consistently reproducible.

I am also seeing similar issues with some related metrics that I have not yet opened an issue for:

pool.open_slots.
executor.open_slots

These always seem to report the maximum number of slots (for the pool or the executor, respectively), even while tasks are running.
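One way to make the open_slots discrepancy concrete is to check captured gauge values against the relation one would expect between them. This is a minimal sketch, assuming that a pool's open slots equal its total slots minus its running and queued slots (ignoring deferred tasks and unlimited pools); that relation is an assumption for illustration, not something stated in this report. PoolSample, expected_open and is_consistent are hypothetical names, and the numbers in the usage example are made up.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    """One snapshot of pool gauge values (hypothetical container for captured metrics)."""
    total_slots: int
    open_slots: int
    running_slots: int
    queued_slots: int


def expected_open(sample: PoolSample) -> int:
    # Assumed relation: running and queued slots are occupied; everything else is open.
    return sample.total_slots - sample.running_slots - sample.queued_slots


def is_consistent(sample: PoolSample) -> bool:
    """True if the reported open_slots matches the assumed relation."""
    return sample.open_slots == expected_open(sample)
```

For example, a snapshot with 128 total slots, 1 running and 19 queued should report 108 open slots; a reported value of 128 while tasks are running, as described above, would make is_consistent return False.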

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 1 year ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.

potiuk commented 1 year ago

Yes. It would be great for someone to track down the exact list and remove the metrics that are always 0; some of them have already been removed or discussed for removal. Would you like to lead this and implement it, @StefanKurek? If not, I will mark it as a "good first issue" and hopefully someone will pick it up.
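Tracking down the exact list could start from statsd samples captured during a test run. This is a minimal sketch, assuming the samples have already been collected as (metric name, value) pairs, for example by logging the statsd datagrams while the test DAG runs; always_zero is a hypothetical helper, not part of Airflow.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def always_zero(samples: Iterable[Tuple[str, float]]) -> List[str]:
    """Return, sorted by name, every metric whose captured values were all 0."""
    seen_nonzero: Dict[str, bool] = defaultdict(bool)
    for name, value in samples:
        # Reading the key first records zero-only metrics as well.
        seen_nonzero[name] = seen_nonzero[name] or value != 0
    return sorted(name for name, nonzero in seen_nonzero.items() if not nonzero)
```

A metric that shows up in this list over a run where the web UI reports running or queued tasks is a candidate for either a fix or removal.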

ItIsOHM commented 1 year ago

Hey there @potiuk, I'd love to contribute as a beginner. Could you please assign this to me and point me to where I should start tackling this issue?

StefanKurek commented 1 year ago

Sorry for the long delay in responding. I almost forgot about this because I only saw the issue with the installation method I described above. FYI, once I installed using Docker, I no longer saw the issue.

ItIsOHM commented 1 year ago


Ah alright, thank you for letting me know :)

potiuk commented 1 year ago

I hope you will be able to reproduce it, @ItIsOHM :) I've assigned you.

potiuk commented 1 year ago

(Or even if not, we might be able to close it once you confirm it's not easily reproducible.)

ItIsOHM commented 1 year ago


Hahaha, I'd like to at least try to fix this if it's suitable for a beginner like me :D Any help would be appreciated!

cjj1120 commented 4 months ago

This issue still persists. I'm able to get metric data for executor.running_tasks, but not for executor.queued_tasks; it's always 0.