When running models whose names contain multibyte characters, runtime errors occur in Airflow environments where statsd is enabled (MWAA uses statsd to collect metrics in CloudWatch).
To address this, Airflow 2.9 introduced the ability to render tasks using display_name, which allows task names to be rendered separately from their task_id.
It would be ideal to have a mechanism that allows users to set a different task_id while using the model name as the display_name when working with models that contain multibyte characters.
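A rough sketch of what such a mechanism could look like, not a proposal for the actual Cosmos implementation. The helper names (`is_stat_safe`, `ascii_task_id`, `task_names`) and the hash-based fallback are assumptions made for illustration. The allowed character set is the one Airflow's stat-name validator reports (ASCII letters, digits, underscore, dot, dash), and `task_display_name` is the operator argument Airflow 2.9 added for display names.

```python
import hashlib
import re

# Characters Airflow's default stat-name handler accepts:
# ASCII letters, digits, underscore, dot, and dash.
_STAT_SAFE = re.compile(r"[A-Za-z0-9_.\-]+")


def is_stat_safe(name: str) -> bool:
    """Return True if the whole name uses only stat-safe ASCII characters."""
    return _STAT_SAFE.fullmatch(name) is not None


def ascii_task_id(model_name: str, suffix: str = "run") -> str:
    """Hypothetical helper: derive a stat-safe task_id from a model name.

    ASCII-safe names pass through unchanged. Otherwise the unsafe characters
    are dropped and a short hash of the full name is appended so that
    different multibyte names still map to different task_ids.
    """
    if is_stat_safe(model_name):
        base = model_name
    else:
        digest = hashlib.sha256(model_name.encode("utf-8")).hexdigest()[:8]
        kept = re.sub(r"[^A-Za-z0-9_.\-]", "", model_name)
        base = f"{kept}_{digest}" if kept else f"model_{digest}"
    return f"{base}_{suffix}"


def task_names(model_name: str, suffix: str = "run") -> dict:
    """Hypothetical helper: keyword arguments that could be passed to an operator."""
    return {
        "task_id": ascii_task_id(model_name, suffix),
        # Shown in the Airflow UI (2.9+); not used to build stat names.
        "task_display_name": f"{model_name}_{suffix}",
    }
```

With something like this, `task_names("日本語名モデル")` would give an ASCII-only `task_id` for statsd while the UI still shows `日本語名モデル_run`. A user-supplied mapping or callable would probably be preferable to a fixed hash, since hashed IDs are hard to read in logs.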
Use case/motivation
The following error occurs in an MWAA environment:
[2024-10-17, 10:27:03 UTC] {taskinstance.py:2865} INFO - Starting attempt 1 of 1
[2024-10-17, 10:27:03 UTC] {validators.py:135} ERROR - Invalid stat name: dag.run_cosmos_dbt.日本語名モデル_run.queued_duration.
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 132, in wrapper
    stat = handler_stat_name_func(stat)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/airflow/.local/lib/python3.11/site-packages/airflow/metrics/validators.py", line 220, in stat_name_default_handler
    raise InvalidStatsNameException(
airflow.exceptions.InvalidStatsNameException: The stat name (dag.run_cosmos_dbt.日本語名モデル_run.queued_duration) has to be composed of ASCII alphabets, numbers, or the underscore, dot, or dash characters.
Thanks for logging this and proposing a fix, @t0momi219! We're planning to release Cosmos 1.8 in mid-November, and it would be great to have this feature in it.
Related Issue: https://github.com/apache/airflow/issues/18010
Reference: https://airflow.apache.org/docs/apache-airflow/stable/_modules/airflow/example_dags/example_display_name.html
Related issues
No response
Are you willing to submit a PR?