DataDog / dd-trace-rb

Datadog Tracing Ruby Client
https://docs.datadoghq.com/tracing/
Other
305 stars 369 forks source link

Middleware auto instrumentation #3483

Open beauraF opened 6 months ago

beauraF commented 6 months ago

Hello everyone 👋

Middleware over the long term can have a significant impact on latency. I think it would be a good addition to materialize the time spent on it. Especially, as it's not inferable by making a difference between rack.request and rails.action_controller because of https://github.com/DataDog/dd-trace-rb/issues/2100.

I'm currently trying to implement this in our codebase, and so far I've come up with 3 ideas:

  1. Leverage process_middleware.action_dispatch to wrap middleware call
  2. Leverage process_middleware.action_dispatch and process_action.action_controller to create two spans that sum the before process_action middleware execution time and the after process_action middleware execution time
  3. Leverage process_middleware.action_dispatch and process_action.action_controller to create two (before and after process_action) dedicated span per middleware

For 1. the implementation is simple, but this create a huge pyramid. Flamegraphs become complicated to navigate. And you need to do the difference yourself to infer real execution time. But in opt-in mode it's a start to aggregate some data on a small amount of time.

For 2 and 3, Implementation is a little more complicated, but it's the best I've come up with so far.

I'd love to hear your feedback, suggestions and ideas about all this. 🙏

beauraF commented 6 months ago

We implemented option 2 in our codebase, but instead of using active support event we used two middlewares. A first one insert just after Datadog::Tracing::Contrib::Rack::TraceMiddleware and one at the end of the middleware stack.

Code looks like

module MiddlewareAutoInstrumentation
  BEFORE_SPAN = 'rack.request.middleware.before'
  AFTER_SPAN = 'rack.request.middleware.after'

  class In
    def initialize(app)
      @app = app
    end

    def call(env)
      Thread.current[BEFORE_SPAN] = ::Datadog::Tracing.trace(BEFORE_SPAN)
      @app.call(env)
    ensure
      Thread.current[BEFORE_SPAN].try(:finish)
      Thread.current[AFTER_SPAN].try(:finish)
    end
  end

  class Out
    def initialize(app)
      @app = app
    end

    def call(env)
      Thread.current[BEFORE_SPAN].finish
      result = @app.call(env)
      Thread.current[AFTER_SPAN] = ::Datadog::Tracing.trace(AFTER_SPAN)

      result
    end
  end
end

And look like that in datadog:

image

Let me know if it's something you're interested in, we'd love to have it upstream 🙏