discourse / prometheus_exporter

A framework for collecting and aggregating prometheus metrics
MIT License

Puma: getting only worker related metrics #239

Open vitobotta opened 2 years ago

vitobotta commented 2 years ago

Hi!

I have configured the Puma instrumentation as per the README but I only see the worker related metrics. I would need to see puma_request_backlog for example but it doesn't show up in Prometheus. What am I missing? Thanks!

P.S. This is in my puma.rb:

on_worker_boot do
  if Rails.env.production?
    require 'prometheus_exporter/instrumentation'

    PrometheusExporter::Instrumentation::ActiveRecord.start(
      custom_labels: { type: "puma_worker" }, #optional params
      config_labels: [:database, :host] #optional params
    )

    PrometheusExporter::Instrumentation::Process.start(type: "web")
    PrometheusExporter::Instrumentation::Puma.start
  end
end

NickLarsenNZ commented 1 year ago

I have the opposite problem. I'm getting the main metrics, but not the worker metrics.

My puma.rb looks like yours, but I also have some configuration in an initializer to run the server and start the process for the "main" metrics:

initializers/prometheus.rb

if Rails.env == "production"
    require 'prometheus_exporter/server'
    require 'prometheus_exporter/client'
    require 'prometheus_exporter/instrumentation'
    require 'prometheus_exporter/middleware'

    # This reports stats per request like HTTP status and timings
    Rails.application.middleware.unshift PrometheusExporter::Middleware

    server = PrometheusExporter::Server::WebServer.new bind: 'localhost', port: ENV.fetch("METRICS_PORT") { 9090 }
    server.start

    # # wire up a default local client
    PrometheusExporter::Client.default = PrometheusExporter::LocalClient.new(collector: server.collector)

    # this reports basic process stats like RSS and GC info
    PrometheusExporter::Instrumentation::Process.start(type: "main")

end

Where do you see your web worker metrics? Do you also run the metrics server like I do?

vitobotta commented 1 year ago

Hi @NickLarsenNZ and sorry for the delay, I never received a notification of your reply and I only noticed it now because I am still investigating this issue. I see only some metrics in Prometheus, specifically workers, old workers and booted workers. The one I am most interested in is the backlog and I never got that metric to show up. I am not sure if I am still missing some configuration....

In the initializer I have this:

if Rails.env.production?
  require 'prometheus_exporter/middleware'
  require 'prometheus_exporter/server'
  require 'prometheus_exporter/client'
  require 'prometheus_exporter/instrumentation'

  Rails.application.middleware.unshift PrometheusExporter::Middleware
end

I'm gonna add the start line in there as you do, perhaps that will give me the backlog?

vitobotta commented 1 year ago

It seems I can get the request backlog if I use before_fork instead of on_worker_boot, which one are you using?

sosedoff commented 1 year ago

@vitobotta If you run Puma in clustered mode, you need to place your instrumentation code in the after_worker_boot block.
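
One way to picture why the choice of hook matters is sketched below. It assumes, without confirmation from this thread, that in clustered mode Puma.stats returns JSON with cluster-wide counters plus a worker_status list whose last_status entries carry the per-worker thread pool numbers, and that those entries are only filled in when the stats are read in the master process. The payload, its values, and the helper summarize_puma_stats are hypothetical; the real collector in prometheus_exporter may work differently.

```ruby
require "json"

# Hypothetical payload, shaped like the JSON that Puma.stats is ASSUMED to
# return in clustered mode. All numbers are made up for illustration.
SAMPLE_CLUSTER_STATS = <<~STATS
  {
    "workers": 2, "phase": 0, "booted_workers": 2, "old_workers": 0,
    "worker_status": [
      { "index": 0,
        "last_status": { "backlog": 3, "running": 5, "pool_capacity": 2, "max_threads": 5 } },
      { "index": 1, "last_status": {} }
    ]
  }
STATS

# Hypothetical helper: keeps the cluster-wide counters and sums the
# per-worker values. Workers with an empty last_status contribute nothing.
def summarize_puma_stats(json)
  stats = JSON.parse(json)
  summary = {
    "workers" => stats["workers"],
    "booted_workers" => stats["booted_workers"],
    "old_workers" => stats["old_workers"]
  }
  stats.fetch("worker_status", []).each do |worker|
    (worker["last_status"] || {}).each do |key, value|
      summary[key] = summary.fetch(key, 0) + value
    end
  end
  summary
end

puts summarize_puma_stats(SAMPLE_CLUSTER_STATS)
```

If every last_status were empty, the summary would contain only workers, booted_workers and old_workers, which would match the symptom described at the top of this thread.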

NickLarsenNZ commented 1 year ago

For anyone trying to get this working in both standalone and clustered mode, here is what I did. I set it up months ago and have not tested it properly, but maybe it helps.

config/puma.rb

  if ENV.fetch("PROMETHEUS_EXPORTER_ENABLED", "false").downcase == "true"
    # Puma is a bit weird with configuration between standalone and clustered mode.
    # When workers is 0, then the main process does the threading.
    # But there are no hooks for configuring that. Maybe we should force at least
    # one worker to make use of the after_worker_boot hook

    # we have to wrap this if workers are 0 or undefined, else we miss the following metrics:
    #    HELP puma_running_threads Number of puma threads currently running.
    #    TYPE puma_running_threads gauge
    #    puma_running_threads{phase="0",app="rails-metrics"} 10
    #    puma_request_backlog{phase="0",app="rails-metrics"} 0
    #    puma_thread_pool_capacity{phase="0",app="rails-metrics"} 10
    #    puma_max_threads{phase="0",app="rails-metrics"} 10
    if @options.fetch(:workers) { 0 } == 0
      PrometheusExporter::Instrumentation::Process.start(type: "main")
      # E, [2022-09-06T12:13:59.369144 #76882] ERROR -- : PrometheusExporter::Instrumentation::Puma Prometheus Exporter Failed To Collect Stats undefined method `stats' for nil:NilClass
      # Issue: https://github.com/puma/puma/issues/1230
      # PR: https://github.com/puma/puma/pull/2709
      # Still seems to work, so leaving in here.
      PrometheusExporter::Instrumentation::Puma.start(frequency: 1)
      PrometheusExporter::Instrumentation::ActiveRecord.start(
        custom_labels: { type: "puma_standalone_mode" }, #optional params
        config_labels: [:database, :host] #optional params
      )
    end

    after_worker_boot do
      PrometheusExporter::Instrumentation::Process.start(type: "web")

      PrometheusExporter::Instrumentation::ActiveRecord.start(
        custom_labels: { type: "puma_clustered_mode" }, #optional params
        config_labels: [:database, :host] #optional params
      )
      # if this is started outside after_worker_boot, then some metrics disappear
      if !PrometheusExporter::Instrumentation::Puma.started?
        PrometheusExporter::Instrumentation::Puma.start(frequency: 1)
      end
    end

  end

config/prometheus.rb

Note: I'm unsure whether it's OK to call PrometheusExporter::Instrumentation::Process.start again here when it's already called in puma.rb. But I think it is needed to get Delayed Job metrics (in which case you need to avoid running the local server and send to a separate prometheus_exporter process instead).

    if ENV.fetch("PROMETHEUS_EXPORTER_ENABLED", "false").downcase == "true"
        require 'prometheus_exporter/server'
        require 'prometheus_exporter/client'
        require 'prometheus_exporter/instrumentation'
        require 'prometheus_exporter/middleware'

        # This reports stats per request like HTTP status and timings
        Rails.application.middleware.unshift PrometheusExporter::Middleware

        # You can run the local server
        if ENV.fetch("PROMETHEUS_EXPORTER_LOCAL_SERVER_ENABLED", "false").downcase == "true"
            server = PrometheusExporter::Server::WebServer.new(
                bind: ENV.fetch("PROMETHEUS_EXPORTER_HOST") { PrometheusExporter::DEFAULT_BIND_ADDRESS },
                port: ENV.fetch("PROMETHEUS_EXPORTER_PORT") { PrometheusExporter::DEFAULT_PORT }
            )
            server.start

            # wire up a default local client
            PrometheusExporter::Client.default = PrometheusExporter::LocalClient.new(collector: server.collector)
        end

        PrometheusExporter::Metric::Base.default_labels = { "app" => "cymonz-web" }

        # this reports basic process stats like RSS and GC info
        PrometheusExporter::Instrumentation::Process.start(type: "main")
    end
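
Both files above gate everything on the same environment check. A minimal sketch of that check as a helper follows; the name flag_enabled? and the optional env argument are assumptions added for illustration and are not part of prometheus_exporter or Puma.

```ruby
# Hypothetical helper mirroring the check used in the configs above:
# ENV.fetch(name, "false").downcase == "true"
# The env argument defaults to ENV and exists only so the helper can be
# exercised with a plain Hash.
def flag_enabled?(name, env = ENV)
  env.fetch(name, "false").downcase == "true"
end

puts flag_enabled?("PROMETHEUS_EXPORTER_ENABLED", { "PROMETHEUS_EXPORTER_ENABLED" => "TRUE" })
puts flag_enabled?("PROMETHEUS_EXPORTER_ENABLED", { "PROMETHEUS_EXPORTER_ENABLED" => "1" })
puts flag_enabled?("PROMETHEUS_EXPORTER_ENABLED", {})
```

With this check, only the string "true" in any letter case enables the exporter. Values such as "1" or "yes" leave it disabled, which is worth keeping in mind when no metrics show up at all.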

achilles-tee commented 10 months ago

After testing in my environment, I found that with after_worker_boot I can only see the master's ruby_rss. To see the workers' ruby_rss, try on_worker_boot instead.
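
This observation is consistent with after_worker_boot running in the master process and on_worker_boot running inside each forked worker, which is an assumption about Puma's hooks worth checking against its documentation. The sketch below uses plain Ruby and fork (so it needs a platform that supports forking) to show the underlying point: code that reports on "the current process" describes whichever process executes it. The helper pid_reported_from_fork is hypothetical.

```ruby
# Forks a child, has the child report its own PID through a pipe, and
# returns [pid_reported_by_child, child_pid_as_seen_by_parent].
def pid_reported_from_fork
  reader, writer = IO.pipe
  child_pid = fork do
    reader.close
    # Runs inside the child, like a hook that executes in a worker.
    writer.puts Process.pid
    writer.close
    exit!(0) # leave immediately without running at_exit handlers
  end
  writer.close
  reported = reader.gets.to_i
  reader.close
  Process.wait(child_pid)
  [reported, child_pid]
end

reported, child_pid = pid_reported_from_fork
# Runs in the parent, like a hook that executes in the master.
puts "parent sees itself as #{Process.pid}"
puts "child reported #{reported} (parent knows it as #{child_pid})"
```

A per-process collector such as the RSS and GC reporter behaves the same way: started from a hook that runs in the master, it can only describe the master.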

I amended @NickLarsenNZ's version slightly. config/puma.rb:

require 'prometheus_exporter/instrumentation'

  if @options.fetch(:workers) { 0 } == 0
    PrometheusExporter::Instrumentation::Process.start(type: "master")
    PrometheusExporter::Instrumentation::Puma.start(frequency: 1)

    PrometheusExporter::Instrumentation::ActiveRecord.start(
      custom_labels: { type: "puma_single_mode" }, #optional params
      config_labels: [:database, :host] #optional params
    )
  else
    PrometheusExporter::Instrumentation::Process.start(type: "main")
    PrometheusExporter::Instrumentation::Puma.start(frequency: 1)
  end  

  on_worker_boot do
    PrometheusExporter::Instrumentation::Process.start(type: "web")

    PrometheusExporter::Instrumentation::ActiveRecord.start(
      custom_labels: { type: "puma_clustered_mode" }, #optional params
      config_labels: [:database, :host] #optional params
    )
    # if this is started outside after_worker_boot, then some metrics disappear
    if !PrometheusExporter::Instrumentation::Puma.started?
      PrometheusExporter::Instrumentation::Puma.start(frequency: 1)
    end
  end

The reason I use type "master" in single mode and "main" in cluster mode is the Grafana dashboard I use: its web worker panels query for master|web|puma_master|puma_worker. I didn't want the master to be counted as a worker in cluster mode, but I still wanted to see the master's other metrics in the dashboard. In single mode, the master can be counted as a web worker.
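
The effect of that label choice can be checked with a small sketch. Prometheus regex matchers are anchored to the whole label value, so the Ruby pattern below uses \A and \z to mimic that. The constant and method names are hypothetical and only illustrate the dashboard query described above.

```ruby
# Ruby equivalent of a fully anchored Prometheus matcher such as
# type=~"master|web|puma_master|puma_worker".
WEB_WORKER_TYPES = /\A(?:master|web|puma_master|puma_worker)\z/

# Returns true when a process with this type label would be picked up
# by the dashboard's web worker query.
def counted_as_web_worker?(type)
  WEB_WORKER_TYPES.match?(type)
end

%w[master main web].each do |type|
  puts "#{type}: #{counted_as_web_worker?(type)}"
end
```

So a cluster-mode master labelled "main" stays out of the web worker panels, while a single-mode process labelled "master" is included.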

mildred commented 10 months ago

I could get it to work. For those interested:

config/puma.rb

```rb
# frozen_string_literal: true

require 'dotenv'
require 'env_bang'
require "#{File.absolute_path(__dir__)}/env_dotenv.rb"
require "#{File.absolute_path(__dir__)}/env_puma.rb"

# Puma can serve each request in a thread from an internal thread pool.
# The `threads` method setting takes two numbers: a minimum and maximum.
# Any libraries that use thread pools should be configured to match
# the maximum value specified for Puma. Default is set to 5 threads for minimum
# and maximum; this matches the default thread size of Active Record.
#
max_threads_count = ENV!['RAILS_MAX_THREADS']
min_threads_count = ENV!['RAILS_MIN_THREADS']
threads min_threads_count, max_threads_count

# Specifies the `port` that Puma will listen on to receive requests; default is 3000.
#
port ENV.fetch('PORT', 3000)

# Specifies the `environment` that Puma will run in.
#
environment ENV!['RAILS_ENV']

# Specifies the `pidfile` that Puma will use.
pidfile ENV.fetch('PIDFILE', 'tmp/pids/server.pid')

# Specifies the number of `workers` to boot in clustered mode.
# Workers are forked web server processes. If using threads and workers together
# the concurrency of the application would be max `threads` * `workers`.
# Workers do not work on JRuby or Windows (both of which do not support
# processes).
#
# workers ENV.fetch("WEB_CONCURRENCY") { 2 }
num_workers = ENV!['RAILS_NUM_WORKERS']
$puma_cluster_mode = num_workers > 1
workers num_workers if $puma_cluster_mode

# Use the `preload_app!` method when specifying a `workers` number.
# This directive tells Puma to first boot the application and load code
# before forking the application. This takes advantage of Copy On Write
# process behavior so workers use less memory.
preload_app! if $puma_cluster_mode

# Allow puma to be restarted by `rails restart` command.
plugin :tmp_restart

def control_app(listen: true)
  return unless ENV![:PUMA_CONTROL]

  listen = false if ENV![:PUMA_CONTROL_LISTEN].nil? || ENV![:PUMA_CONTROL_LISTEN] == ''
  if listen
    activate_control_app(ENV![:PUMA_CONTROL_LISTEN],
                         { auth_token: ENV![:PUMA_CONTROL_TOKEN], no_token: ENV![:PUMA_CONTROL_TOKEN_DISABLE] })
  else
    activate_control_app
  end
  plugin :yabeda
end

def start_prometreus(worker_name)
  pid = Process.pid
  PrometheusExporter::Metric::Base.default_labels = { 'worker' => worker_name, 'pid' => pid }
  PrometheusExporter::Instrumentation::ActiveRecord.start(
    custom_labels: { worker: worker_name, 'pid' => pid }, # optional params
    config_labels: %i[host] # optional params
  )
  # this reports basic process stats like RSS and GC info
  PrometheusExporter::Instrumentation::Process.start(labels: { 'worker' => worker_name, 'pid' => pid })
  PrometheusExporter::Instrumentation::Puma.start(labels: { 'worker' => worker_name, 'pid' => pid }) #unless PrometheusExporter::Instrumentation::Puma.started?
end

if $puma_cluster_mode
  control_app
  on_worker_boot do |i|
    $puma_cluster_mode = true
    $puma_worker_index = i
    # control_app(listen: false)
  end
  after_worker_boot do |i|
    require 'prometheus_exporter'
    require 'prometheus_exporter/metric'
    require 'prometheus_exporter/instrumentation'
    start_prometreus("worker#{i}")
  end
else
  control_app
  on_booted do
    require 'prometheus_exporter'
    require 'prometheus_exporter/metric'
    require 'prometheus_exporter/instrumentation'
    start_prometreus('main')
  end
end
```

Note: Yabeda and the control app do not work, and they are disabled by default behind an environment variable flag. The control app is useful for getting thread stack traces in dev, but Yabeda is probably of no use.

highlights:

It all seems to work. The docs probably need improving, though.