AndyKwiat / sloqueue

SLO-based queues

For hackdays I created a queue simulator and experimented with different implementations of queues based on aging. Although I tried a bunch of different solutions, this document outlines what I think is the most promising one.

The Problem

Right now we have many queues with different priorities, as Kir explained in his JobsDB Project. Tuning the performance characteristics of each queue is not trivial, since there are multiple types of job workers, each working on multiple job queues.

What we really want is to tell the queue system each job's SLO: how long it is acceptable for that job to wait in the queue. These could be similar to the Jobs SLOs we currently monitor, for example:

| Job Type | SLO    |
|----------|--------|
| payment  | 5 sec  |
| default  | 30 sec |
| webhook  | 5 min  |

Algorithm (SLO based queues)

SLOs: payment = 5000 ms, default = 30000 ms, webhook = 300000 ms

In the above example, payment wait times rise to about 5 seconds, default wait times rise to about 30 seconds, and webhook jobs are sacrificed until the high load ends.
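To make the idea concrete, here is a minimal sketch of one way an SLO-driven queue could pick the next job. It assumes earliest-deadline-first ordering, where a job's deadline is its enqueue time plus its SLO; this is only one possible reading of "aging" and is not necessarily what the simulator implements. The `SloQueue` class, its `push` and `pop` methods, and the `enqueued_at_ms` parameter are hypothetical names invented for illustration. The SLO values are the ones listed above.

```python
import heapq
import itertools

# SLOs in milliseconds, taken from the example above.
SLO_MS = {"payment": 5_000, "default": 30_000, "webhook": 300_000}


class SloQueue:
    """Hypothetical queue that serves the job closest to breaching its SLO.

    Assumption: ordering is earliest-deadline-first, with
    deadline = enqueue time + SLO for the job type.
    """

    def __init__(self, slo_ms):
        self._slo_ms = slo_ms
        self._heap = []
        # Monotonic counter so jobs with equal deadlines stay in FIFO order.
        self._seq = itertools.count()

    def push(self, job_type, enqueued_at_ms, payload=None):
        deadline = enqueued_at_ms + self._slo_ms[job_type]
        heapq.heappush(self._heap, (deadline, next(self._seq), job_type, payload))

    def pop(self):
        # Returns the job whose deadline is soonest.
        deadline, _, job_type, payload = heapq.heappop(self._heap)
        return job_type, payload, deadline

    def __len__(self):
        return len(self._heap)


queue = SloQueue(SLO_MS)
queue.push("webhook", enqueued_at_ms=0)       # deadline 300000
queue.push("default", enqueued_at_ms=1_000)   # deadline 31000
queue.push("payment", enqueued_at_ms=20_000)  # deadline 25000

order = [queue.pop()[0] for _ in range(len(queue))]
print(order)  # ['payment', 'default', 'webhook']
```

Under this assumption there are no fixed priorities: a webhook that has already waited close to its 5 minutes is served ahead of a payment that has only just arrived, because its deadline comes first.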

Benefits