Open balrawi-figma opened 1 year ago
@htuch @mattklein123 @zuercher @alyssawilk @wbpcode for any comments
I think configurable queuing would be a completely reasonable feature. I'd say the first step would be to factor out the existing queue into an extension with clean APIs, defaulting to the current mechanism, then add an extension point and add your algorithms as an extension, potentially in contrib if you can't find a sponsor per the extension guidelines.
@balrawi-figma I can help out with the effort.
Title: Support a configurable queue for pending requests to use adaptive LIFO and CoDel
Description: As far as I'm aware, Envoy currently has no way to use different queuing strategies for pending requests. This issue asks for a way to configure the pending requests queue in Envoy.
Why is this important? In an overloaded situation, using a simple FIFO queue impacts the p50 and p75 latencies significantly (some data from tests comparing different strategies is shared below).
Using controlled delay (aka CoDel) has been a standard practice in the Linux kernel. Combining CoDel with an adaptive LIFO queue is also a well-proven strategy for mitigating request delays during an overloaded situation. Below are some numbers I collected using a draft PR; they shed some light on the improvements in p50 and p75 request latencies across different configurations.
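To make the idea concrete, here is a minimal sketch of one common formulation of CoDel combined with adaptive LIFO. It is not the draft PR's implementation, and every name in it (`AdaptiveQueue`, `push`, `pop`, `overloaded`) is hypothetical. The assumption is: if the queue has not been empty within the last `interval`, new requests get the short `target` timeout and the queue is served newest-first; otherwise requests get the longer `interval` timeout and the queue is served FIFO. The 5ms and 100ms defaults mirror the values used in Test 3.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Pending:
    request_id: str
    deadline: float  # absolute time (seconds) after which the request is shed


class AdaptiveQueue:
    """Illustrative pending-request queue: CoDel-style timeouts + adaptive LIFO."""

    def __init__(self, target=0.005, interval=0.100):
        self.target = target        # timeout applied while the queue is "standing"
        self.interval = interval    # timeout applied under normal load
        self._items = deque()
        self._last_empty = 0.0      # last time the queue was observed empty

    def overloaded(self, now):
        # Assumption: "overloaded" means the queue has not drained for a full interval.
        return bool(self._items) and (now - self._last_empty) > self.interval

    def push(self, request_id, now):
        if not self._items:
            self._last_empty = now
        timeout = self.target if self.overloaded(now) else self.interval
        self._items.append(Pending(request_id, now + timeout))

    def pop(self, now):
        """Return (request_id or None, ids shed because their deadline passed)."""
        shed = []
        lifo = self.overloaded(now)  # adaptive LIFO: newest first when overloaded
        while self._items:
            item = self._items.pop() if lifo else self._items.popleft()
            if item.deadline < now:
                shed.append(item.request_id)  # waited too long, fail fast
                continue
            if not self._items:
                self._last_empty = now
            return item.request_id, shed
        self._last_empty = now
        return None, shed
```

Under light load this behaves like the plain FIFO queue. Once the queue stops draining, the newest requests are served first and stale ones are shed quickly, which is what keeps the median latency low at the cost of failing the oldest waiters.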
Relevant Links:
Related issue: https://github.com/envoyproxy/envoy/issues/9606
A draft PR for a possible implementation: https://github.com/envoyproxy/envoy/pull/28982
Testing
Setup: Test against an upstream with a controlled, random 'sleep' of between 0 and 3 seconds per request, to simulate an overloaded situation that saturates upstream capacity.
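For anyone who wants to reproduce a similar setup, a slow upstream along these lines can be sketched with the Python standard library. This is an assumption about what such a test server could look like, not the server used for the numbers in this issue. The names `pick_delay`, `SlowHandler`, and `make_server` are hypothetical, and the sketch does not model any concurrency limit on the upstream.

```python
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_DELAY_SECONDS = 3.0  # upper bound of the random sleep described in the setup


def pick_delay(rng=random):
    """Pick a uniformly random service time between 0 and MAX_DELAY_SECONDS."""
    return rng.uniform(0.0, MAX_DELAY_SECONDS)


class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        delay = pick_delay()
        time.sleep(delay)  # simulate a slow backend
        body = f"slept {delay:.3f}s\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def make_server(port=0):
    """Create (but do not start) the slow upstream on localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), SlowHandler)
```

Calling `make_server(8080).serve_forever()` would start the upstream on an arbitrarily chosen port 8080, which an Envoy cluster could then point at.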
Test 1: Existing FIFO queue in Envoy:
Test 2: Adaptive LIFO queue in Envoy (per PR):
Test 3: Adaptive LIFO queue + CoDel in Envoy (per PR):
CoDel is configured with a 5ms target delay and a 100ms interval.