akoutmos / prom_ex

An Elixir Prometheus metrics collection library built on top of Telemetry with accompanying Grafana dashboards
MIT License

[BUG] Oban metric [:prom_ex, :plugin, :oban, :queue, :length, :count] is not fetching queue states when length is 0 #202

Open linqueta opened 1 year ago

linqueta commented 1 year ago

Describe the bug While building a chart at my company to show when a queue drops to zero executing jobs, I found that the Oban plugin does not report a value when the count for a queue state reaches 0.

Running the same query that is used to fetch queues grouped by state, I saw a pattern like this:

Example with two queues: create_order and deliver_order

After an order is created, the create_order job completes and triggers a deliver_order job, so the query returns:

[ 
  {"create_order", "executing", 102},
  {"create_order", "completed", 16087},
  {"create_order", "discarded", 3030},
  {"deliver_order", "executing", 2535},
  {"deliver_order", "discarded", 116}
]

After some time my program finishes all the creation jobs, and the query no longer returns a row for create_order in the executing state:

[ 
  {"create_order", "completed", 16187},
  {"create_order", "discarded", 3030},
  {"deliver_order", "executing", 2535},
  {"deliver_order", "discarded", 116}
]

I want to know when the queue create_order reaches 0 in the state executing, but that is not possible: this metric is implemented as last_value (implementation), so once the row disappears from the query result the metric keeps the value from the last polling round that included it (e.g. 5 seconds before the creation finished the value was 10).
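
The stale reading can be reproduced without PromEx or Oban. The snippet below is only a minimal sketch: `LastValueDemo` and its map-based store are hypothetical stand-ins for a last_value metric, not the plugin's actual code. It assumes each polling round overwrites only the {queue, state} pairs that appear in that round's query result.

```elixir
defmodule LastValueDemo do
  # Hypothetical stand-in for a last_value store: a map keyed by {queue, state}.
  # A polling round only overwrites the keys present in that round's rows.
  def apply_round(store, rows) do
    Enum.reduce(rows, store, fn {queue, state, count}, acc ->
      Map.put(acc, {queue, state}, count)
    end)
  end
end

round_1 = [
  {"create_order", "executing", 102},
  {"create_order", "completed", 16087}
]

# Once no job is executing, the query returns no row for that state.
round_2 = [
  {"create_order", "completed", 16187}
]

store =
  %{}
  |> LastValueDemo.apply_round(round_1)
  |> LastValueDemo.apply_round(round_2)

# The executing entry was never overwritten, so it still holds the old count.
IO.inspect(store[{"create_order", "executing"}])
# The completed entry was present in both rounds and is up to date.
IO.inspect(store[{"create_order", "completed"}])
```

Here the executing entry still reads 102 after the second round, while the completed entry has moved to 16187.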

To Reproduce Steps to reproduce the behavior:

  1. Add the Oban plugin to PromEx
  2. Create an application with at least one queue (the default queue is enough)
  3. Create a job for this queue that sleeps for 5 seconds and trigger it once
  4. After 15 seconds (so that at least one polling window has passed), check /metrics for the queue default in the state executing; it shows the value 1 even though the job is already in the state completed

Expected behavior I expected that, for every possible Oban job state ([:scheduled, :available, :executing, :retryable, :cancelled, :completed, :discarded]), the PromEx Oban plugin would set the metric to 0 when the state is not found in the database, so that the last_value metric does not keep a stale value.

linqueta commented 1 year ago

Suggesting a possible solution:

After fetching all queue states from the database, we can use the function Oban.states() to set, for each queue, a value of 0 for every state that was not found. For example, for the queue create_order we would set the state executing to 0 once no job is executing.
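
A rough sketch of that zero-filling step, written as plain Elixir so it runs without Oban or PromEx. The module `ZeroFill`, the function `fill/2`, and the explicit list of queue names are hypothetical. The state list is hard-coded from the states named in this issue, as an assumption; a real change would take it from Oban instead.

```elixir
defmodule ZeroFill do
  # Job states as listed in this issue, hard-coded as an assumption so the
  # sketch runs without Oban installed.
  @states ~w(scheduled available executing retryable cancelled completed discarded)

  # rows: {queue, state, count} tuples as returned by the grouped query.
  # Returns one tuple per {queue, state} pair, using 0 for pairs missing from rows.
  def fill(rows, queues) do
    counts = Map.new(rows, fn {queue, state, count} -> {{queue, state}, count} end)

    for queue <- queues, state <- @states do
      {queue, state, Map.get(counts, {queue, state}, 0)}
    end
  end
end

rows = [
  {"create_order", "completed", 16187},
  {"create_order", "discarded", 3030},
  {"deliver_order", "executing", 2535},
  {"deliver_order", "discarded", 116}
]

filled = ZeroFill.fill(rows, ["create_order", "deliver_order"])

# The missing create_order/executing pair is now reported with a count of 0.
IO.inspect(Enum.find(filled, &match?({"create_order", "executing", _}, &1)))
```

The queue names are passed in explicitly because a queue with no jobs in any state would not appear in the query result at all, so it could not be recovered from the rows alone.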

I can open a PR if this sounds reasonable.