discourse / prometheus_exporter

A framework for collecting and aggregating prometheus metrics
MIT License
539 stars 156 forks source link

Delayed Job metrics don't get published if using LocalClient #245

Closed NickLarsenNZ closed 2 years ago

NickLarsenNZ commented 2 years ago

I'm having trouble exposing delayed_job metrics. I would like to use the built in client (LocalClient) and a server, rather than running the server in a separate process.

Here is how I've setup:

Dummy articles controller to trigger a delayed job.

class ArticlesController < ApplicationController
  DummyJob = Struct.new(:text) do
    def perform
      puts text
    end
  end

  def index

    Delayed::Job.enqueue DummyJob.new('lorem ipsum...')
  end
end

config/initializers/prometheus.rb

unless Rails.env == "test"
    require 'prometheus_exporter/server'
    require 'prometheus_exporter/client'
    require 'prometheus_exporter/instrumentation'
    require 'prometheus_exporter/middleware'

    # This reports stats per request like HTTP status and timings
    Rails.application.middleware.unshift PrometheusExporter::Middleware

    server = PrometheusExporter::Server::WebServer.new bind: 'localhost', port: ENV.fetch("METRICS_PORT") { 9090 }
    server.start

    # wire up a default local client
    PrometheusExporter::Client.default = PrometheusExporter::LocalClient.new(collector: server.collector)
    PrometheusExporter::Metric::Base.default_labels = { "app" => "rails-metrics" }

    # this reports basic process stats like RSS and GC info
    PrometheusExporter::Instrumentation::Process.start(type: "main")
end

Run Delayed Job

METRICS_PORT=9090 rvm 2.7 do rake jobs:work

Run the web server:

METRICS_PORT=9091 rvm 2.7 do bundle exec rails s

Trigger the delayed job by browsing the app

http://localhost:3000/articles/index

I see console output from Delayed Job showing the job completed

[Worker(host:pathogen pid:6458)] Starting job worker
[Worker(host:pathogen pid:6458)] Job ArticlesController::DummyJob (id=8) RUNNING
lorem ipsum...
[Worker(host:pathogen pid:6458)] Job ArticlesController::DummyJob (id=8) COMPLETED after 0.0072
[Worker(host:pathogen pid:6458)] 1 jobs processed at 9.7363 j/s, 0 failed

Then check both apps metrics ports

No delayed_job metrics appear (I do see other metrics).

It works with the external prometheus_exporter

Now, if I remove the server and client from config/initializers/prometheus.rb, and run the exporter as a separate process:

rvm 2.7 do bundle exec prometheus_exporter -b 0.0.0.0
2022-09-06 14:27:04 +0200 Starting prometheus exporter on 0.0.0.0:9394

When I visit http://localhost:9394/metrics, I can see the delayed_job metrics:

# HELP ruby_delayed_job_duration_seconds Total time spent in delayed jobs.
# TYPE ruby_delayed_job_duration_seconds counter
ruby_delayed_job_duration_seconds{queue_name="",job_name=""} 4.300009459257126e-05

# HELP ruby_delayed_jobs_total Total number of delayed jobs executed.
# TYPE ruby_delayed_jobs_total counter
ruby_delayed_jobs_total{queue_name="",job_name=""} 1

# HELP ruby_delayed_failed_jobs_total Total number failed delayed jobs executed.
# TYPE ruby_delayed_failed_jobs_total counter

# HELP ruby_delayed_jobs_max_attempts_reached_total Total number of delayed jobs that reached max attempts.
# TYPE ruby_delayed_jobs_max_attempts_reached_total counter

# HELP ruby_delayed_job_duration_seconds_summary Summary of the time it takes jobs to execute.
# TYPE ruby_delayed_job_duration_seconds_summary summary
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",quantile="0.99"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",quantile="0.9"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",quantile="0.5"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",quantile="0.1"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",quantile="0.01"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary_sum{queue_name="",job_name=""} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary_count{queue_name="",job_name=""} 1
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",status="success",quantile="0.99"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",status="success",quantile="0.9"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",status="success",quantile="0.5"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",status="success",quantile="0.1"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary{queue_name="",job_name="",status="success",quantile="0.01"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary_sum{queue_name="",job_name="",status="success"} 4.300009459257126e-05
ruby_delayed_job_duration_seconds_summary_count{queue_name="",job_name="",status="success"} 1

# HELP ruby_delayed_job_attempts_summary Summary of the amount of attempts it takes delayed jobs to succeed.
# TYPE ruby_delayed_job_attempts_summary summary
ruby_delayed_job_attempts_summary{queue_name="",job_name="",quantile="0.99"} 1.0
ruby_delayed_job_attempts_summary{queue_name="",job_name="",quantile="0.9"} 1.0
ruby_delayed_job_attempts_summary{queue_name="",job_name="",quantile="0.5"} 1.0
ruby_delayed_job_attempts_summary{queue_name="",job_name="",quantile="0.1"} 1.0
ruby_delayed_job_attempts_summary{queue_name="",job_name="",quantile="0.01"} 1.0
ruby_delayed_job_attempts_summary_sum{queue_name="",job_name=""} 1.0
ruby_delayed_job_attempts_summary_count{queue_name="",job_name=""} 1

# HELP ruby_delayed_jobs_enqueued Number of enqueued delayed jobs.
# TYPE ruby_delayed_jobs_enqueued gauge
ruby_delayed_jobs_enqueued{queue_name=""} 1

# HELP ruby_delayed_jobs_pending Number of pending delayed jobs.
# TYPE ruby_delayed_jobs_pending gauge
ruby_delayed_jobs_pending{queue_name=""} 0

I notice that these metrics were published even before I stopped and started delayed_job with the exporter server and client removed - so I assume the metrics were populated from code running from the rails server.

NickLarsenNZ commented 2 years ago

I have managed to get metrics to appear in the right place using the LocalClient. I had to set PROMETHEUS_EXPORTER_PORT to be the same as METRICS_PORT.

PROMETHEUS_EXPORTER_PORT=9090 METRICS_PORT=9090 rvm 2.7 do rake jobs:work

I think that means I shouldn't set a server port in the initializer, and control both the server and client port through PROMETHEUS_EXPORTER_PORT.

I have to run, but I'll try it tonight, and if that works, I'll comment and close the issue.

NickLarsenNZ commented 2 years ago

It seems PROMETHEUS_EXPORTER_PORT is only for the Client. The server needs to have the port configured, or it falls back to PrometheusExporter::DEFAULT_PORT.

Client:

Note: localhost is used instead of PrometheusExporter::DEFAULT_BIND_ADDRESS.

https://github.com/discourse/prometheus_exporter/blob/cef352980020cb9d764b429191aa941c9ce93d9e/lib/prometheus_exporter/client.rb#L60-L61

Server:

https://github.com/discourse/prometheus_exporter/blob/27a768932b81bb6308be761468dbb16c6c55ab0c/lib/prometheus_exporter/server/web_server.rb#L13-L14

I think the Server should read the env var like the Client does, but until that happens, here is how I work around it in my initializer:

server = PrometheusExporter::Server::WebServer.new(
    bind: ENV.fetch("PROMETHEUS_EXPORTER_HOST") { PrometheusExporter::DEFAULT_BIND_ADDRESS }, 
    port: ENV.fetch("PROMETHEUS_EXPORTER_PORT") { PrometheusExporter::DEFAULT_PORT }
)
server.start

PrometheusExporter::Client.default = PrometheusExporter::LocalClient.new(collector: server.collector)