apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
DagFileProcessor produces invalid metric tags #30716

Open sungwy opened 1 year ago

sungwy commented 1 year ago

Apache Airflow version

2.6.0b1

What happened

The recently added file_path tag on the dag_processing.processes metric always fails to publish, because the file path delimiter '/' is not a valid character according to stat_name_default_handler:

airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=finish) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
[2023-04-18T12:21:39.738-0400] {stats.py:245} ERROR - Invalid stat name: dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=start.
Traceback (most recent call last):
File "/mnt/c/Users/user/Documents/GitHub/airflow-dir/venv/lib/python3.9/site-packages/airflow/stats.py", line 242, in wrapper
stat = handler_stat_name_func(stat)
File "/mnt/c/Users/user/Documents/GitHub/airflow-dir/venv/lib/python3.9/site-packages/airflow/stats.py", line 210, in stat_name_default_handler
raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=start) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
[2023-04-18T12:21:51.375-0400] {stats.py:245} ERROR - Invalid stat name: dag_processing.processes,file_path=/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py,action=finish.
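
For illustration, the character rule quoted in the error message can be approximated with a few lines of standalone Python. This is a sketch based only on the wording of the message above, not Airflow's actual implementation; `ALLOWED` and `offending_chars` are hypothetical names introduced here.

```python
import string

# Assumption: the allowed set as described by the error message
# ("ASCII alphabets, numbers, or the underscore, dot, or dash characters").
ALLOWED = frozenset(string.ascii_letters + string.digits + "_.-")


def offending_chars(value: str, allowed: frozenset = ALLOWED) -> set:
    """Return the set of characters in `value` that are not in `allowed`."""
    return {ch for ch in value if ch not in allowed}


# The file_path tag value from the log above.
path = "/mnt/c/Users/user/Documents/GitHub/airflow-dir/test_dag.py"
print(offending_chars(path))  # {'/'}
```

Under that assumption, the path separator is the only character in the tag value that the rule rejects, so any absolute file path would trigger the error.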

What you think should happen instead

Although it is not a fatal error, it feels wrong that the default stat name handler cannot support this metric tag out of the box.

We do have the following parameters that allow a user to work around the issue:

  1. stat_name_handler
  2. statsd_disabled_tags

However, I would advocate that we either add '/' to the characters supported by stat_name_default_handler, or sanitize the file_path value so that it uses only supported characters. It would be more intuitive for a new user to have metric tags work correctly with the default configuration, rather than needing to implement their own stat_name_handler to work around the issue.
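
As a rough illustration of the stat_name_handler workaround, a custom handler could replace unsupported characters instead of raising. This is a minimal sketch, assuming the handler is a callable that receives the stat name as a string and returns the name to publish; `sanitizing_stat_name_handler` and `_ALLOWED` are hypothetical names, and keeping ',' and '=' is an assumption made because they delimit the tags in the stat names shown in the logs above.

```python
import string

# Assumption: the characters named in the error message, plus ',' and '='
# so that the tag delimiters seen in the logged stat names survive.
_ALLOWED = frozenset(string.ascii_letters + string.digits + "_.-,=")


def sanitizing_stat_name_handler(stat_name: str) -> str:
    """Replace every unsupported character with an underscore."""
    return "".join(ch if ch in _ALLOWED else "_" for ch in stat_name)


name = "dag_processing.processes,file_path=/opt/airflow/dags/x.py,action=start"
print(sanitizing_stat_name_handler(name))
# dag_processing.processes,file_path=_opt_airflow_dags_x.py,action=start
```

Note that two different paths can map to the same sanitized value (for example `a/b.py` and `a_b.py`), and that this sketch performs no length check, so it trades strictness for not dropping the metric.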

Example metrics:

https://github.com/apache/airflow/blob/main/airflow/dag_processing/manager.py#L998
https://github.com/apache/airflow/blob/main/airflow/dag_processing/processor.py#L767

How to reproduce

Enable stats with:

[metrics]
statsd_on = True
statsd_host = localhost
statsd_port = 8125
statsd_prefix = 
statsd_influxdb_enabled = True

Operating System

Red Hat Enterprise Linux Server 7.6 (Maipo)

Versions of Apache Airflow Providers

No response

Deployment

Virtualenv installation

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

Code of Conduct

Gowthami03B commented 1 year ago

Can I take this one?

potiuk commented 1 year ago

Sure

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has been open for 365 days without any activity. There have been several Airflow releases since the last activity on this issue. Please recheck the report against the latest Airflow version and let us know if the issue is reproducible. The issue will be closed in the next 30 days if no further activity occurs from the issue author.

shalberd commented 1 month ago

@eladkal @potiuk similar error in Airflow 2.8.2 dag-processor pod/container

 {dag_processor_job_runner.py:60} INFO - Starting the Dag Processor Job
[2024-06-18T02:50:07.781+0000] {validators.py:101} ERROR - Invalid stat name: dag_processing.last_duration.random error 2-0424133757V
/python3.8/site-packages/airflow/metrics/validators.py", line 185, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.last_run.seconds_ago.random error 2-0424133757V5) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.

ares-b commented 2 weeks ago

Same error in Airflow 2.9.1

ERROR - Invalid stat name: dag_processing.processes,file_path=/opt/airflow/dags/live/sha/appconf/contracts/contracts.py,action=start.
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 134, in wrapper
    stat = handler_stat_name_func(stat)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 221, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag_processing.processes,file_path=/opt/airflow/dags/live/sha/appconf/contracts/contracts.py,action=start) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.

potiuk commented 2 weeks ago

Yes. It is still waiting for someone to investigate and fix it. It can be anyone, even those who experience it (actually, that would be best, as they could easily test whether it's fixed).