elastic / apm-agent-ruby

https://www.elastic.co/solutions/apm
Apache License 2.0
168 stars 131 forks source link

Puma Instrumentation #1193

Open jclusso opened 2 years ago

jclusso commented 2 years ago

Is there any plan to add Puma instrumentation so we can measure things like queue request time. This is pretty standard in almost every other APM by just adding a header in NGINX.

jclusso commented 2 years ago

@mikker any idea when this will come?

mikker commented 2 years ago

I no longer work for Elastic on the agent. Ping @estolfo (hope you're well!)

estolfo commented 2 years ago

Hi @jclusso, thanks for following-up. We don't have this planned now but I'll update this issue when we do add it to our roadmap.

jaggederest commented 2 years ago

Just to update with an approximate timeline, I think we can fit this into the next quarter or so. Will update in July if that doesn't turn out to be the case 👍

jclusso commented 2 years ago

@jaggederest that would be awesome! Figured this would be worth sharing. AppSignal does it by having you add the following to your NGINX config.

proxy_set_header X-Request-Start "t=${msec}";

https://docs.appsignal.com/ruby/instrumentation/request-queue-time.html#nginx

jclusso commented 2 years ago

@jaggederest I was wondering if in the meantime I could do this myself with custom middleware. I saw someone do it with datadog and some simple middleware. The one thing I don't know and can't figure out is how to report something to the APM as part of the request. I know how to do custom instrumentation but is there any documentation on how to just report something taking a certain amount of time on the request? That is the last piece of the puzzle. Would also be happy to make a pull request if I can get it working.

jaggederest commented 2 years ago

Honestly I'm in the same boat getting familiar with this project. I've implemented middleware for this kind of timing before but I'm not familiar enough with Elastic's internal reporting format to say. I think it will be in the spirit of a Span or Transaction adjustment. I encourage you to take a crack at it and see what you discover, but don't spin your wheels for too long if you get stuck. Very happy to accept PRs, even half baked ones are much appreciated, we can work through it if you get part of the way.

jclusso commented 2 years ago

Been playing around with something like This. I got this from someone's example of one for Datadog. I'm not sure if I should make it it's own transaction or span. What I have realized is I think the Span class needs to be modified to allow you to set duration unless I've missed something. Gonna keep playing around with this.

class QueueTimeMiddleware

  def initialize(app)
    @app = app
  end

  def call(env)
    start = env['HTTP_X_REQUEST_START'].to_f
    wait = env['puma.request_body_wait']
    current = Time.now.to_f

    ElasticAPM.start_span 'puma', 'queue_time'
    span = ElasticAPM.current_span
    duration = (current - wait - start).to_i
    span.duration = duration
    ElasticAPM.end_span

    @status, @headers, @response = @app.call(env)
    [@status, @headers, self]
  end

  def each(&block)
    @response.each(&block)
  end

end
jaggederest commented 2 years ago

Not that I strictly recommend this, but there's always instance_variable_set or instance_eval to dig into the span duration. You'll need to do that directly after the end_span call, and it may result in strange things in the UI, so very experimental. Over time I will probably add a method to allow you to pass in a historical start to start_span but I don't have firm ideas about what that will look like at the moment. Thanks for playing with it, this will be useful feedback.

jclusso commented 2 years ago

@jaggederest I almost feel like you just need another method entirely that is called like track_span that takes duration. It seems odd that you'd allow someone to start / end a span in future where you can then change the start time. I also imagine there are other good use cases for just being able to submit time for an already measured span.

jaggederest commented 2 years ago

Development note:

steve21168 commented 1 year ago

+1 for this feature

jclusso commented 1 year ago

@jaggederest curious when this will be available since it says it's for 8.4 and we're on 8.5.2 now.

estolfo commented 1 year ago

Hi @jclusso we don't have this planned right now but looking at your comment here, if you'd like to open a pull request, I'm happy to review it and work with you on it.

jclusso commented 1 year ago

@estolfo I'm not really sure where it belongs and there is nothing really like it to model off of. Can you advise? We currently have this working where we just send it to StatsD. I'm just not sure how to integrate this.

class PumaLatencyMiddleware

  def initialize(app)
    @app = app
  end

  def call(env)
    start   = env['HTTP_X_REQUEST_START']&.sub('t=', '').to_f * 1000
    wait    = env['puma.request_body_wait']
    current = Time.now.to_f * 1000

    StatsD.histogram(
      'puma.queue_time',
      (current - wait - start).to_i
    ) unless start == 0

    @app.call(env)
  end

end
jclusso commented 1 year ago

Hey, checking in on this again regarding my last comment.

niltonvasques commented 1 year ago

Up! Queue time is one of the most important metrics to check currently servers capacity. Is there a way to report a fixed time in a span?

niltonvasques commented 1 year ago

@jclusso I figured out a way to register the queue time.

Maybe it could also be useful for you: https://github.com/niltonvasques/apm-agent-ruby/commit/c66819bd21d92755037213f5ef0a269ecc7e2a9c

I will not open a pull request for that, because this strategy seems way too specific. Here in the gem a more general solution would makes more sense.

image

image

The only thing that I was not able to display, it is the internal transaction chart. For some reason the span type is not showed there and I don't know why:

image

jclusso commented 1 year ago

@niltonvasques this looks awesome. It seems like it's grouping it in the controller on the Time spent by span chart based on the color. I'm going to dig into this a little more and see if I can't take what you've done and get a PR working for this.

jclusso commented 10 months ago

@estolfo / @jaggederest was wondering if there is any news on support for this or a timeline? While the code @niltonvasques wrote works, I don't see how his implementation could be used to build latency alerts and the like.