Open dannyl1u opened 1 month ago
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Hello @potiuk @howardyoo @kaxil, I'm planning on working on this project, and just wanted to confirm that this is indeed something we want for Airflow 3.0 before I begin implementing it.
This sounds great. Could you sketch out the sort of API you are thinking of, and give an example use from somewhere in scheduler job please?
Hi @ashb, could you clarify what you mean by the use in the scheduler job?
The first step is to refactor the metrics code so that https://github.com/apache/airflow/blob/main/airflow/metrics/base_stats_logger.py has a get_name method, which https://github.com/apache/airflow/blob/main/airflow/metrics/datadog_logger.py, https://github.com/apache/airflow/blob/main/airflow/metrics/otel_logger.py, https://github.com/apache/airflow/blob/main/airflow/metrics/statsd_logger.py, etc. inherit and override for their own naming conventions. This will improve the code by removing the need for implementation-specific name validators such as https://github.com/apache/airflow/blob/main/airflow/metrics/otel_logger.py#L128.
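To make the proposal concrete, here is a minimal sketch of how a shared get_name hook might look. Everything in it is an assumption for illustration: the class names, the get_name signature, the "airflow." prefix, and the 63-character limit are hypothetical and are not taken from Airflow's actual loggers.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

# Assumed limit for illustration only; check the backend you target.
ASSUMED_OTEL_NAME_MAX_LENGTH = 63


class SketchBaseStatsLogger(ABC):
    """Hypothetical base logger: callers pass one logical name plus tags."""

    def __init__(self) -> None:
        # Records (final_name, count) pairs instead of sending them anywhere.
        self.emitted: List[Tuple[str, int]] = []

    @abstractmethod
    def get_name(self, metric_name: str, tags: Optional[Dict[str, str]] = None) -> str:
        """Return the backend-specific name for a logical metric."""

    def incr(self, metric_name: str, count: int = 1, tags: Optional[Dict[str, str]] = None) -> None:
        # The naming decision lives in get_name, not at each call site.
        self.emitted.append((self.get_name(metric_name, tags), count))


class SketchStatsdLogger(SketchBaseStatsLogger):
    def get_name(self, metric_name, tags=None):
        # Legacy-style name: tag values are appended, dot-separated,
        # in the order the tags were given.
        return ".".join([metric_name, *(tags or {}).values()])


class SketchOtelLogger(SketchBaseStatsLogger):
    def get_name(self, metric_name, tags=None):
        # Tags stay out of the name; the name is prefixed and truncated.
        return f"airflow.{metric_name}"[:ASSUMED_OTEL_NAME_MAX_LENGTH]
```

With this shape, a call site would emit one logical metric (for example "ti.finish" with tags) and each backend would decide how the name is rendered, which is the part that could replace per-implementation validators.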
cc'ing @ferruzzi in case I'm missing something
Then, once the above is done, we will plan the next step to build an automated system to generate docs based on the actual metrics.
cc @arshiazr
@ferruzzi @howardyoo -> I think you two should be quite a bit involved in the review and ideas here.
Initially, when we implemented OpenTelemetry metrics, we thought we could emit the legacy "statsd" metrics through the OpenTelemetry interface (basically implement or use some kind of OpenTelemetry -> statsd bridge). Because of some implementation details, that turned out to be impossible (or difficult).
However, that can still be explored; maybe it is still a possibility? Implementing our own interface that wraps both the OpenTelemetry metrics and the statsd ones is of course an option, but (at least intuitively, without knowing all the details) it could be that the OpenTelemetry API can be used to emit the legacy statsd events in a "mostly" compatible way.
And implementing it in Airflow 3 also gives us some room for the "mostly" part. If we can get 90% of the backwards-compatible statsd metrics in place and only a "few", "less important" metrics changed to be incompatible, maybe that is the way to go?
Hi Jarek (and Dennis),
Yes, in the beginning we hoped that the way Airflow used statsd could be applied to Airflow OTel in the same way (with the classic statsd metric name carrying much of the context, hence its length). Now it looks like we may have to do it the alternate way, which is the better way for the future (because, to be fair, the way statsd named metrics inherently had problems).

    for state in State.task_states:
        Stats.incr(
            f"ti.finish.{ti.task.dag_id}.{ti.task.task_id}.{state}",
            count=0,
            tags=ti.stats_tags,
        )
        Stats.incr(
            "ti.finish",
            count=0,
            tags={**ti.stats_tags, "state": str(state)},
        )

As for the naming problem, I believe Dennis has added code to support both versions, so even though users get the warning on the OTEL side, we are not losing much information (see the example code above). However, we will get some unwanted, trimmed metrics flowing into OTEL.
The only problem in that case is that these unwanted and unnecessary metrics flow into the OTEL SDK and eventually get trimmed. However, I believe we should be able to fix that by applying a processor to filter them out at the collector level (that's why I believe having a collector in the architecture is important: it can act as another processing layer, separated from the source of telemetry).
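As a rough illustration of the "two versions" behaviour described here, the sketch below records both calls with a stand-in client. RecordingStats and emit_ti_finish are hypothetical names made up for this example; only the two Stats.incr call shapes come from the snippet quoted in this comment.

```python
class RecordingStats:
    """Hypothetical stand-in for a stats client; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def incr(self, stat, count=1, tags=None):
        # Copy the tags so later mutation by the caller does not change the record.
        self.calls.append((stat, count, dict(tags or {})))


def emit_ti_finish(stats, dag_id, task_id, state, stats_tags):
    # Legacy version: the context is encoded in the metric name itself,
    # so the name grows with the DAG and task IDs.
    stats.incr(f"ti.finish.{dag_id}.{task_id}.{state}", count=0, tags=stats_tags)
    # Tagged version: fixed, short name; the context moves into tags.
    stats.incr("ti.finish", count=0, tags={**stats_tags, "state": str(state)})


stats = RecordingStats()
tags = {"dag_id": "example_dag", "task_id": "extract"}
emit_ti_finish(stats, "example_dag", "extract", "success", tags)

legacy_name, tagged_name = stats.calls[0][0], stats.calls[1][0]
print(legacy_name)  # ti.finish.example_dag.extract.success
print(tagged_name)  # ti.finish
```

The legacy name is the one that can exceed a backend's name-length limit and get trimmed, while the tagged name stays constant regardless of how long the DAG and task IDs are.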
Those would be my two cents for now... Any thoughts, Dennis? Maybe I might be missing something.
I have just raised a new PR (https://github.com/apache/airflow/pull/43018). I'd be grateful if you can review it.
cc @ferruzzi
Description
From @ferruzzi:
Use case/motivation
From @ferruzzi: