apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Support of ResourceAttributes and for OTEL metrics #42424

Closed howardyoo closed 1 month ago

howardyoo commented 1 month ago

Description

What this is about

Currently, Airflow's OTEL metrics do not honor OTEL_RESOURCE_ATTRIBUTES. Resource attributes set through this variable could provide additional detail about the metrics data emitted from Airflow.

What is the value

Resource attributes are very helpful for describing what the resource is. Users who deploy Airflow and want specific resource attributes to be available would find it easier to analyze OpenTelemetry metrics. Good examples of resource attributes are:

Use case/motivation

What needs to be done

When OTEL metrics are initialized, check the OTEL_RESOURCE_ATTRIBUTES environment variable and, if it exists, add its contents as the metrics' resource attributes so that they are emitted along with the metrics data.
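
As a rough illustration of the step described above, the sketch below parses the `key1=value1,key2=value2` format used by OTEL_RESOURCE_ATTRIBUTES and merges the result with attributes set in code. It uses only the standard library. The helper names `parse_resource_attributes` and `build_resource_attributes` are hypothetical and are not part of Airflow or the OpenTelemetry SDK. Letting the explicitly passed attributes win over the environment is an assumption of this sketch, not a confirmed Airflow behavior.

```python
import os
from collections.abc import Mapping
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse a 'key1=value1,key2=value2' string into a dict (hypothetical helper)."""
    attributes: dict[str, str] = {}
    for item in raw.split(","):
        # Ignore empty items and items that have no '=' separator.
        if "=" not in item:
            continue
        key, value = item.split("=", 1)
        key = key.strip()
        if key:
            # Values may be percent-encoded, so decode them.
            attributes[key] = unquote(value.strip())
    return attributes


def build_resource_attributes(
    base: Mapping[str, str], environ: Mapping[str, str] = os.environ
) -> dict[str, str]:
    """Merge attributes from the environment with attributes set in code."""
    env_attributes = parse_resource_attributes(environ.get("OTEL_RESOURCE_ATTRIBUTES", ""))
    # Assumption of this sketch: attributes set in code override the environment.
    return {**env_attributes, **base}
```

For example, with `OTEL_RESOURCE_ATTRIBUTES="deployment.environment=staging"` in the environment, `build_resource_attributes({"service.name": "Airflow"})` would return both the service name and the deployment environment.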

Related issues

No response

Are you willing to submit a PR?

Code of Conduct

mxmrlt commented 1 month ago

The same is needed for OTEL traces.

Indeed, we should be able to use the documented OTEL_RESOURCE_ATTRIBUTES environment variable so that we can, for instance, set and use ResourceAttributes.DEPLOYMENT_ENVIRONMENT.

This is mandatory for monitoring to work correctly and consistently (Elastic/Kibana/APM in my case), all the more so when tracing in distributed environments.



# airflow/traces/otel_tracer.py

class OtelTrace:
    """
    Handle all tracing requirements such as getting the tracer, and starting a new span.

    When OTEL is enabled, the Trace class will be replaced by this class.
    """

    def __init__(self, span_exporter: ConsoleSpanExporter | OTLPSpanExporter, tag_string: str | None = None):
        self.span_exporter = span_exporter
        self.span_processor = BatchSpanProcessor(self.span_exporter)
        self.tag_string = tag_string
        self.otel_service = conf.get("traces", "otel_service")

    def get_tracer(
        self, component: str, trace_id: int | None = None, span_id: int | None = None
    ) -> OpenTelemetryTracer | Tracer:
        """Tracer that will use special AirflowOtelIdGenerator to control producing certain span and trace id."""
        resource = Resource(
            attributes={
                HOST_NAME: get_hostname(),
                SERVICE_NAME: self.otel_service,
                # Plus every attribute from OTEL_RESOURCE_ATTRIBUTES, such as 'deployment.environment',
                # as well as anything available in opentelemetry/semconv/resource/__init__.py
            }
        )
        if trace_id or span_id:
            # in case where trace_id or span_id was given
            tracer_provider = TracerProvider(
                resource=resource, id_generator=AirflowOtelIdGenerator(span_id=span_id, trace_id=trace_id)
            )
        else:
            tracer_provider = TracerProvider(resource=resource)
        tracer_provider.add_span_processor(self.span_processor)
        tracer = tracer_provider.get_tracer(component)
        """
        Tracer will produce a single ID value if value is provided. Note that this is one-time only, so any
        subsequent call will produce the normal random ids.
        """
        return tracer

howardyoo commented 1 month ago

Correct. I have recently created a PR that would do the same for OTEL traces, and it was reviewed and approved. So hopefully, these will get into the next Airflow release soon.

Howard

mxmrlt commented 1 month ago

Glad to read it.

I see you're the creator of airflow_otel_provider. Do you know if there is a way to auto-instrument our DAGs so that spans are created and exported for any custom code, such as requests calls or Kafka publishing? OpenTelemetry should permit that if I'm right (https://opentelemetry.io/docs/zero-code/python/), but perhaps I'm not.

This would save us from having to manually declare what you show in the README:

RequestsInstrumentor().instrument(tracer_provider=otel_hook.tracer_provider)

If you have any other advice on this topic please tell me.

Thank you

howardyoo commented 1 month ago

Auto-instrumentation is a tricky(?) area, especially when trying to instrument a complex system like Airflow. (It works extremely well with smaller applications like microservices, which is the area auto-instrumentation usually focuses on.) Technically, I would say that instrumenting the whole of Airflow using OTEL auto-instrumentation would work, but you may have to do it at your own risk.

I would be a little worried if we did that, because it would introduce a huge amount of telemetry data (I know how many database queries Airflow issues just to keep idling), and the potential impact on its performance has not been studied much.

So, when I was implementing AIP-49 (the OTEL traces for Airflow), I purposely scoped out the auto-instrumentation aspects.

However, if there's a good reason / value / need to enable some level of auto-instrumentation for running operators (e.g. Python operators), that may be a good discussion to start.

Generally, the Airflow community welcomes contributions of any type as long as they have enough support (things are voted on and approved) and have been discussed enough. If something sounds like a good idea (or you have found something), please share it with the community, and then it can turn into implementation work!

Regards, Howard


howardyoo commented 1 month ago

Your question kind of gave me some ideas, so maybe the OTEL Airflow provider could help make it easier for users to enable certain instrumentations... but that would be something.
