discourse / prometheus_exporter

A framework for collecting and aggregating prometheus metrics
MIT License

Add delayed_jobs_ready to DelayedJobs plugin and collect_by_queue option for GoodJob plugin #302

Open benngarcia opened 5 months ago

benngarcia commented 5 months ago

This PR grew out of metric gathering for our auto-scaling needs: we're migrating hosting platforms and are mid-migration between queue libraries.

This PR adds a new metric to the DelayedJobs plugin - "delayed_jobs_ready". It counts all of the jobs whose run_at < now(), i.e. jobs that are due to run. We needed this metric rather than queued or pending, since those include all of our jobs, some of which are scheduled days, weeks, or months out.
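To illustrate the "ready" definition above, here is a minimal plain-Ruby sketch. `DelayedJobRow` and `ready_count` are hypothetical names used only for illustration; they are not taken from the PR or from the gem, and the real plugin would query the database rather than an in-memory array.

```ruby
# Hypothetical stand-in for a delayed_jobs row; only the columns used here.
DelayedJobRow = Struct.new(:queue, :run_at, keyword_init: true)

# Counts jobs that are due, i.e. whose run_at is strictly before `now`.
def ready_count(jobs, now: Time.now)
  jobs.count { |job| job.run_at < now }
end

now = Time.utc(2024, 1, 1, 12, 0, 0)
jobs = [
  DelayedJobRow.new(queue: "default", run_at: now - 60),           # due a minute ago
  DelayedJobRow.new(queue: "default", run_at: now + 86_400 * 30),  # scheduled a month out
  DelayedJobRow.new(queue: "mailers", run_at: now + 3_600)         # scheduled an hour out
]

puts ready_count(jobs, now: now)
```

A plain count over all rows would report 3 here, which is why a "ready" count is the more useful auto-scaling signal when many jobs are scheduled far in the future.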

This PR also adds the ability to view GoodJob metrics sliced by queue, similar to the DelayedJobs plugin. The motivation is straightforward: scaling queue workers based on how many jobs are enqueued in a given queue can be beneficial.
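As a rough sketch of what slicing by queue means, the snippet below groups unfinished jobs by queue name in plain Ruby. `GoodJobRow` and `enqueued_by_queue` are hypothetical names for illustration only, and treating a nil `finished_at` as "still enqueued" is an assumption of this sketch, not a statement about how the PR computes its metrics.

```ruby
# Hypothetical stand-in for a good_jobs row; only the columns used here.
GoodJobRow = Struct.new(:queue_name, :finished_at, keyword_init: true)

# Returns a hash of queue name => number of unfinished jobs in that queue.
def enqueued_by_queue(jobs)
  jobs
    .reject(&:finished_at)          # drop jobs that already finished
    .group_by(&:queue_name)         # bucket the rest by queue
    .transform_values(&:size)       # replace each bucket with its size
end

rows = [
  GoodJobRow.new(queue_name: "default", finished_at: nil),
  GoodJobRow.new(queue_name: "default", finished_at: nil),
  GoodJobRow.new(queue_name: "mailers", finished_at: nil),
  GoodJobRow.new(queue_name: "mailers", finished_at: Time.utc(2024, 1, 1))
]

p enqueued_by_queue(rows)
```

Each key in the resulting hash could then be reported as a queue label on the gauge, so that workers for one queue can be scaled independently of the others.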

lauer commented 5 months ago

One question: how would you handle the fact that the number of jobs returned can decrease because of clean-up scripts? I am using the GoodJob part now, but since it is a total count from the DB, the number is really unusable when the clean-up script is running alongside it.

benngarcia commented 5 months ago

> One question: how would you handle the fact that the number of jobs returned can decrease because of clean-up scripts? I am using the GoodJob part now, but since it is a total count from the DB, the number is really unusable when the clean-up script is running alongside it.

I'm not sure I fully understand the question/problem statement here - any clarification would be great :D

If you're asking about GoodJob's automatic clean-up, which deletes jobs after X amount of time (default 2 weeks), you can either disable the clean-up and leave the records in the DB, or implement some GoodJob on-delete hook to increment a counter somewhere. Though I'm not sure that's within the scope of the prometheus_exporter gem, or my PR, so maybe I'm misunderstanding the question 😅
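For what it's worth, the counter idea could look roughly like the plain-Ruby sketch below. `JobLedger` and all of its methods are hypothetical; nothing here is GoodJob or prometheus_exporter API, and it only illustrates how "live rows plus deleted rows" stays monotonic while a clean-up runs.

```ruby
# Hypothetical in-memory stand-in for a jobs table plus a deletion counter.
class JobLedger
  attr_reader :deleted_total

  def initialize
    @rows = []
    @deleted_total = 0
  end

  def enqueue(id)
    @rows << id
  end

  # Plays the role of the clean-up script: removes the given ids and
  # bumps the counter once for every row actually removed.
  def cleanup(ids)
    removed = @rows & ids
    @rows -= removed
    @deleted_total += removed.size
  end

  # What a plain COUNT(*) over the table would report.
  def live_count
    @rows.size
  end

  # Monotonic total: rows still present plus rows already cleaned up.
  def cumulative_total
    live_count + deleted_total
  end
end

ledger = JobLedger.new
(1..5).each { |id| ledger.enqueue(id) }
ledger.cleanup([1, 2])

puts ledger.live_count
puts ledger.cumulative_total
```

After the clean-up the live count drops from 5 to 3, but the cumulative total stays at 5, which is the property a Prometheus counter needs.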