discourse / prometheus_exporter

A framework for collecting and aggregating prometheus metrics
MIT License
537 stars 155 forks source link

Redis instrumentation can cause infinite stack errors when combined with other instrumentation. #159

Closed treyhyde closed 2 years ago

treyhyde commented 3 years ago

When combined with the opentelemetry ruby gem, this instrumentation will cause an infinite stack and failure.

According to some, instrumenting with prepend would help various instrumentation play nicer together. Alternatively or additionally, ability to disable various instrumentation (Redis, Pg, etc) would be of benefit. In this exact case, this exporter is the more important of the two modules... but of the redis, db, etc instrumentation, the opentelemetry version would be my preference.

See https://github.com/open-telemetry/opentelemetry-ruby-contrib/issues/662 https://github.com/open-telemetry/opentelemetry-ruby/issues/352 https://github.com/elastic/apm-agent-ruby/issues/379#issuecomment-717341248

etc

SamSaffron commented 3 years ago

It is tricky, we went through a similar drill with rack-mini-profiler. prepend helps if everyone uses prepend but it can break stuff as well in other edge cases.

My call here would be just to add switches to disable/enable bits of instrumentation you want and opt for the style of patching you prefer.

Happy to accept a PR here if you want to work on it.

https://github.com/MiniProfiler/rack-mini-profiler/pull/444

https://github.com/MiniProfiler/rack-mini-profiler/issues/437

Note instrumentation is already optional in that you have to require 'prometheus_exporter/instrumentation' but we could have a second require 'prometheus_exporter/instrumentation_prepend' or something like that for cross compatibility.

treyhyde commented 3 years ago

Yeah, an option would be great. The instrumentation is super valuable but I would prefer some finer grained controls rather than all on or all off. Of course, I wouldn't care as much if it worked for my use case rather than pursue the micro-optimization and loss of metrics. Really, the prepend option (theoretically) solves all my problems.

Thanks team, we love this gem.

ball-hayden commented 3 years ago

prometheus_exporter conflicting with elastic-apm is causing us grief - we're likely going to have to drop one or the other unless there's a sensible workaround (which would be a shame, because they both bring different value).

I guess a workaround for the timebeing is to monkeypatch (ironically using prepend) https://github.com/discourse/prometheus_exporter/blob/e9511e40f47a8aaa4e94469955af802590133971/lib/prometheus_exporter/middleware.rb to not bind to Redis?

SamSaffron commented 3 years ago

no need to monkey patch, you can just disable automatic instrumentation and write these lines yourself:

https://github.com/discourse/prometheus_exporter/blob/e9511e40f47a8aaa4e94469955af802590133971/lib/prometheus_exporter/middleware.rb#L14-L26

I am fine with a finer grained "how to monkey patch various things, switch" but I would like whoever is going to work on this to start off with a quick API spec.

bcharna commented 3 years ago

I'm experimenting with opentelemetry-ruby and prometheus_exporter and faced the same issue.

As discussed in depth in this New Relic blog post, a workaround is to have Module#prepend happen after alias_method when two libraries instrument code. Since opentelemetry-ruby uses Module#prepend, its instrumentation needs to be initialized after prometheus_exporter's like so:

class Application < Rails::Application
  OpenTelemetry::SDK.configure do |c|
    # Cannot instrument Rails in `config.after_initialize` as it modifies middleware;
    # would trigger a `can't modify frozen Array` error.
    c.use 'OpenTelemetry::Instrumentation::Rails'
  end

  config.after_initialize do
    OpenTelemetry::SDK.configure do |c|
      c.use 'OpenTelemetry::Instrumentation::PG'
      c.use 'OpenTelemetry::Instrumentation::Redis'
    end
  end
end

This works without issue in development, but have not tested in production.

@SamSaffron I may be interested in coming up with a PR to help with this issue depending on my bandwidth. You mentioned the idea of supporting require 'prometheus_exporter/instrumentation_prepend', and I'm assuming that that file would ensure that a special version of MethodProfiler.patch is called that uses Module#prepend instead of alias_method. Is that roughly what you had in mind?

SamSaffron commented 3 years ago

I guess start off with implementing a patch to MethodProfiler that adds a prepend kwarg:

def self.patch(klass, methods, name, prepend: false)

Then we change these lines so we support the special directive https://github.com/discourse/prometheus_exporter/blob/master/lib/prometheus_exporter/middleware.rb#L13-L27

 MethodProfiler.patch(Redis::Client, [:call, :call_pipeline], :redis, prepend: config[:instrument] == :prepend)
slhck commented 3 years ago

For someone who is not very experienced with the inner workings of this gem or elastic-apm, what is a simple solution to this error? I cannot boot Sidekiq now with APM 4.3 and Prometheus Exporter 0.8.1 due to a similar bug.

cdesch commented 3 years ago

@SamSaffron Where exactly am should I be putting the MethodProfiler.patch(Redis::Client, [:call, :call_pipeline], :redis, prepend: config[:instrument] == :prepend) to workaround this issue with ElasticAPM? Does it go into the config/initializers/sidekiq.rb or the config/initializers/prometheus.rb ?

In this spot? config/initializers/prometheus.rb

unless Rails.env == "test"
  require 'prometheus_exporter/middleware'

  # This reports stats per request like HTTP status and timings
  Rails.application.middleware.unshift PrometheusExporter::Middleware
end

or here config/initializers/sidekiq.rb

Sidekiq.configure_server do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") {'redis://redis:6379/1'}}
  config.server_middleware do |chain|
    require 'prometheus_exporter/instrumentation'
    chain.add PrometheusExporter::Instrumentation::Sidekiq
  end
  config.death_handlers << PrometheusExporter::Instrumentation::Sidekiq.death_handler
end

Sidekiq.configure_client do |config|
  config.redis = { url: ENV.fetch("REDIS_URL") {'redis://redis:6379/1'} }
end
slhck commented 2 years ago

@cdesch Did you ever solve this? I would like to update to Elastic APM 4.x at some point, but I have no idea where to monkey patch it.

cdesch commented 2 years ago

@slhck I don't recall if I did or not. I think do think I got it working with Elastic APM. Unfortunately, I've moved on to a new project.

bcharna commented 2 years ago

hey @slhck, I added a PR to support choosing the style of method patching which is outlined in the README here. So you should no longer need to do any monkey patching.

@SamSaffron, mind closing this issue?

SamSaffron commented 2 years ago

sure, closing this off!

Cheers

slhck commented 2 years ago

Just wanted to come back here and say that this change in the prometheus_exporter.rb initializer did the job:

-  Rails.application.middleware.unshift PrometheusExporter::Middleware
+  Rails.application.middleware.unshift PrometheusExporter::Middleware, instrument: :prepend

This enabled me to launch Elastic APM 4.x. alongside this Gem.