Open thechile opened 7 years ago
Maybe the "max-queue-length exceeded" behavior could be softened to tail-drop instead of suicide?
Regarding the OP though, I think you should increase max-queue-length. 5000 is just a conservative guess; it should be calibrated to however many queries you guesstimate you can drain in 2 seconds or so. Using overload-queue-length is also a good idea for real workloads; it's effectively a head-drop, which has better pushback signalling properties than tail-drop (or suicide).
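To make the calibration rule and the two drop policies concrete, here is a minimal sketch in Python. It is a generic illustration of tail-drop versus head-drop on a bounded queue, not PowerDNS's actual implementation; the function names and the 2500 qps drain rate are made up for the example.

```python
from collections import deque


def calibrate_max_queue_length(drain_rate_qps: float, budget_seconds: float = 2.0) -> int:
    """Size the queue to what the backend can drain within the time budget."""
    return int(drain_rate_qps * budget_seconds)


def enqueue_tail_drop(queue: deque, item, limit: int) -> bool:
    """Tail-drop: refuse the newest arrival when the queue is full."""
    if len(queue) >= limit:
        return False  # the new item is dropped, older items keep waiting
    queue.append(item)
    return True


def enqueue_head_drop(queue: deque, item, limit: int):
    """Head-drop: evict the oldest waiting item to make room for the new one.

    Returns the evicted item, or None if nothing had to be dropped.
    """
    dropped = None
    if len(queue) >= limit:
        dropped = queue.popleft()  # the oldest item is the one that gets dropped
    queue.append(item)
    return dropped


# Hypothetical backend draining 2500 queries per second:
print(calibrate_max_queue_length(2500))
```

With head-drop, the queries that get discarded are the ones that have already waited longest, so the clients most likely to have given up are the ones that lose out.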
This might appear to hurt a synthetic benchmark, because you'll get a bunch of dropped queries during warm-up. This points to a deeper issue with your benchmark, though. Unless you're specifically trying to measure how the software behaves during warm-up, the benchmark should discard any results from the warm-up phase and only measure the steady state.
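As a rough sketch of what discarding the warm-up phase could look like when post-processing benchmark output: the sample format (elapsed seconds, measured value) and the function name are assumptions for illustration, not something resperf emits in this form.

```python
from statistics import mean


def steady_state_mean(samples, warmup_seconds):
    """Average only the samples taken after the warm-up window.

    samples: iterable of (elapsed_seconds, value) tuples, e.g. answered qps.
    """
    steady = [value for elapsed, value in samples if elapsed >= warmup_seconds]
    if not steady:
        raise ValueError("no samples left after discarding warm-up")
    return mean(steady)


# Made-up numbers: the first two seconds are dominated by cold caches.
samples = [(0, 100), (1, 400), (2, 1000), (3, 1000)]
print(steady_state_mean(samples, warmup_seconds=2))
```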
Thanks. I did look at overload-queue-length for its head-drop properties, but it was unclear what happens once it is configured and there is an actual problem with the backend, so that the large queue size persists. It would perhaps be good if I could configure max-queue-length=5000 and overload-queue-length=3000 and have an overload-queue-full-duration option, in ms, that specifies how long the overload-queue-length option is in effect before overflowing to the value specified in max-queue-length. Then again, I think it would also be good to have a max-queue-full-duration option.
Then I could use something like this:
overload-queue-length=5000 # If queue reaches this value then serve from packet cache only
overload-queue-full-duration=5000 # .. but only for this duration(ms) before overflowing to max-queue-length
max-queue-length=50000 # Allow backend queue to reach this value
max-queue-full-duration=10000 # .. but only for this duration(ms) before killing pdns process
It would also help if there were a configurable, e.g. max-queue-length-action, so that either tail-drop or suicide could be specified.
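Roughly, the behaviour being asked for could be sketched like this in Python. Everything here is hypothetical: the class, its parameters and the returned strings only mirror the proposed max-queue-full-duration and max-queue-length-action options, which do not exist in PowerDNS today.

```python
from typing import Optional


class QueueWatchdog:
    """Only act once the queue has stayed above the limit for a given duration."""

    def __init__(self, limit: int, full_duration_ms: int, action: str = "suicide"):
        if action not in ("suicide", "tail-drop"):
            raise ValueError("action must be 'suicide' or 'tail-drop'")
        self.limit = limit                        # proposed max-queue-length
        self.full_duration_ms = full_duration_ms  # proposed max-queue-full-duration
        self.action = action                      # proposed max-queue-length-action
        self._exceeded_since_ms: Optional[int] = None

    def check(self, queue_length: int, now_ms: int) -> str:
        """Return 'ok', 'grace', or the configured action."""
        if queue_length <= self.limit:
            self._exceeded_since_ms = None  # queue recovered, reset the timer
            return "ok"
        if self._exceeded_since_ms is None:
            self._exceeded_since_ms = now_ms  # first moment over the limit
        if now_ms - self._exceeded_since_ms >= self.full_duration_ms:
            return self.action
        return "grace"  # over the limit, but not for long enough yet


watchdog = QueueWatchdog(limit=50000, full_duration_ms=10000)
print(watchdog.check(60000, now_ms=0))      # burst starts
print(watchdog.check(60000, now_ms=10000))  # still over the limit 10 s later
```

A duration of 0 in this sketch degenerates to acting as soon as the limit is exceeded, so the same code path could cover both the immediate and the delayed behaviour.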
thanks.
I would like to request a new feature flag with regards to the way max-queue-length works.
At the moment I'm performing some load tests on pdns and pdns-recursor. When testing recursor via resperf I quickly hit problems with the max-queue-length default of 5000. This is on a CentOS 7 server, so what happens is that within 2 seconds of starting the load test, pdns exits due to the backlog of MySQL backend traffic, at which point the systemd unit immediately restarts the service. If I bump the value of max-queue-length to 50,000 then it works better, inasmuch as the qsize-q value jumps up to 30,000 for 1 or 2 seconds and then recovers to 0, allowing the load test to complete. So I am aware that DNS traffic will at times be bursty and might overwhelm the backend, but rather than relying on the queue size alone, it would help to also consider how long the queue size has been high.
I know the overload-queue-length option was added, but I would prefer to be able to specify a millisecond threshold for how long max-queue-length has to be exceeded before the rather brutal killing of the process happens. Thank you.