alvaroserper closed this issue 11 months ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
looks like some legacy metric names weren't added to the exemptions list https://github.com/apache/airflow/pull/30873/files#diff-1cca954ec0be1aaf2c212e718c004cb0902a96ac60043bf0c97a782dee52cc32R55
@ferruzzi It looks like those metrics were added to the codebase shortly before the exemptions list was created: https://github.com/apache/airflow/blob/8fdf3582c2967161dd794f7efb53691d092f0ce6/airflow/models/taskinstance.py#L2184
should we add them to the exemption list?

```
r"^dag\.(?P<dag_id>.*)\.(?P<task_id>.*)\.queued_duration$",
r"^dag\.(?P<dag_id>.*)\.(?P<task_id>.*)\.scheduled_duration$",
```

or remove their invocations?
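To illustrate how the exemption list works, here is a minimal sketch of matching metric names against back-compat patterns like the two above. The list name and helper function are illustrative, loosely modeled on Airflow's metric-name validation, not the actual implementation:

```python
import re

# Illustrative back-compat list; the two patterns are the ones discussed above.
BACK_COMPAT_METRIC_NAME_PATTERNS = [
    r"^dag\.(?P<dag_id>.*)\.(?P<task_id>.*)\.queued_duration$",
    r"^dag\.(?P<dag_id>.*)\.(?P<task_id>.*)\.scheduled_duration$",
]

def is_exempt(metric_name: str) -> bool:
    """Return True if the metric name matches a legacy (exempted) pattern."""
    return any(re.match(p, metric_name) for p in BACK_COMPAT_METRIC_NAME_PATTERNS)

print(is_exempt("dag.my_dag.my_task.queued_duration"))  # True
print(is_exempt("dag.my_dag.my_task.run_duration"))     # False
```

A name that matches an exempted pattern would skip the strict (e.g. OTel-compliant) validation that newer metric names must pass.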
Okay, thank you. Will this be solved in v3.0?
> should we add them to the exemption list?
Must have been some awkward timing; adding new non-compliant names should have been prevented by the unit tests. Yeah, I guess in this case, let's add them to the exemption list. Can you cut the PR?
It has already been fixed and will be applied in the next release https://github.com/apache/airflow/pull/34531
I know this isn't really an answer, but the root cause is that when you combine a long `dag_id` and a long `task_id`, the total length of the metric name exceeds OTel's max name length. A temporary workaround would be to use shorter names. Again: I know that's not a satisfactory solution, and the fix has already been applied for the next release, but if you want a temporary band-aid, that's one option.
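The length problem can be sketched as follows. The 63-character cap and the `dag.<dag_id>.<task_id>.<suffix>` name format are assumptions based on the discussion above, not values taken from the Airflow source:

```python
# Assumed OTel instrument-name cap (63 chars) and metric name format.
OTEL_NAME_MAX_LENGTH = 63

def metric_name(dag_id: str, task_id: str, suffix: str = "queued_duration") -> str:
    # Legacy per-task timing metric name, as discussed in this thread.
    return f"dag.{dag_id}.{task_id}.{suffix}"

long_name = metric_name("a" * 30, "b" * 30)
print(len(long_name))                         # 81
print(len(long_name) > OTEL_NAME_MAX_LENGTH)  # True -> rejected as invalid
```

With moderately long IDs the fixed parts of the name (`dag.`, the dots, and `queued_duration`) already consume 21 characters, so the combined IDs only have to reach the low 40s before the cap is exceeded.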
I just double-checked and this fix should be in Airflow 2.7.2, which is currently being voted on and should be out Very Soon :tm: (next week, I think, but don't hold me to that).
2.7.2 was released this morning!
Apache Airflow version
2.7.1
What happened
When running a DAG, an error occurred. The error says that a metric has an invalid name. This causes the DAG's task to be set up for retry; the task then executes again and is marked as success.
What you think should happen instead
A default metric's name should not produce an error that causes a task to retry.
How to reproduce
Enable opentelemetry in airflow.cfg:
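The original issue's config snippet was not captured here; a minimal sketch of the relevant `[metrics]` section (host/port values are illustrative) would look like:

```ini
[metrics]
otel_on = True
otel_host = localhost
otel_port = 8889
```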
Run opentelemetry collector docker:
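The original issue's docker command was also not captured; a sketch of running the collector, assuming a local `otel-collector-config.yaml` and the default OTLP ports (4317 gRPC, 4318 HTTP), might be:

```shell
# Image tag, mounted config path, and port mappings are assumptions;
# adjust to match your collector configuration.
docker run --rm \
  -p 4317:4317 -p 4318:4318 \
  -v "$(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml" \
  otel/opentelemetry-collector:latest
```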
Operating System
Ubuntu 22.04.3 LTS
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct