RTT fairness vs. conventional ECT flows

chromi commented 5 years ago

One of the major theoretical concerns with TCP Prague, with respect to deployability in the public Internet, is what happens when the bottleneck is a conventional single-queue AQM supporting ECN. While such an AQM can probably still control a DCTCP-style ECN response sufficiently well, the sustained marking rates required to do so would cause any conventional CC algo sharing that link to collapse to minimum cwnd.

To mitigate this, the "TCP Prague Requirements" draft mentions a need to detect such conventional AQMs and switch to a conventional ECN response. However, I do not see any mechanism to do so in this implementation (only a fallback to conventional TCP if AccECN isn't negotiated, which is a completely different concern). Nor have I seen a detailed description of how such single-queue AQM detection would work.

Is there a plan to address this shortcoming?

oliviertilmans commented 5 years ago

As mentioned during tsvwg/ietf104, this was a low-priority issue as we found no traces of ECN-enabled single-queue AQMs (ignoring PIE2), either when doing measurements or during talks with operators. If such an AQM did exist at the bottleneck of a path, then scalable CCs (i.e., those responding linearly to CE marks) would indeed starve non-scalable CCs (i.e., those responding quadratically).

We have a couple of plans to detect this, and will implement and test them over the coming weeks as time permits. In no particular order, here are example signals that the bottleneck of a path may not be L4S-aware; the outcome should be stored in the kernel route cache:

RTT increase proportional to cwnd increase, esp. during slow start. Paced Chirping is a good candidate to help measure that more accurately, with no CEs. This indicates queue buildup, which would always trigger marks from an L4S AQM, whereas other queues happily build up and increase delay.
No significant RTT difference when toggling between ECT(0) and ECT(1). The whole point of L4S is to isolate latency between scalable CCs (i.e., those that can saturate links without requiring a queue to be full) and other CCs (that require full buffers to saturate links), while keeping throughput fair. Not observing a latency difference means the isolation is not achieved, i.e., that we're mixed with classic flows. One use for this would be to tweak congestion avoidance to toggle ECT usage for successive windows. Note that this would cause suboptimal performance of TCP Prague with a PIE2 AQM (but as we're speaking here of detecting non-existent cases, facing PIE2 in the wild is even more hypothetical).
Repeated losses preceded by few CEs (if any). Note that pure loss without any CE causes a Reno-like response, unconditionally. Again, L4S CEs indicate a queue building up, where a linear decrease should be sufficient. Facing losses afterwards indicates that we're not getting such linear feedback (i.e., the AQM expects a quadratic response). This signal is however tedious to work with, as it would cause L4S flows to back off 'too late'.
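
To make the first signal concrete, here is a minimal Python sketch of how "RTT increase proportional to cwnd increase, with no CEs" might be turned into a test. It is not taken from the TCP Prague code; the function names, the sample format and the 0.5 threshold are assumptions made for illustration only.

```python
def queue_growth_ratio(samples):
    """Fraction of the relative cwnd growth that shows up as relative RTT growth.

    samples: list of (cwnd_packets, rtt_seconds) pairs, oldest first.
    Near 1.0: RTT grows in step with cwnd (a standing queue absorbs the window).
    Near 0.0: RTT stays flat while cwnd grows.
    """
    (cwnd_first, rtt_first), (cwnd_last, rtt_last) = samples[0], samples[-1]
    cwnd_growth = cwnd_last / cwnd_first - 1.0
    if cwnd_growth <= 0:
        return 0.0  # cwnd did not grow, so the signal says nothing
    rtt_growth = rtt_last / rtt_first - 1.0
    return max(0.0, rtt_growth / cwnd_growth)


def looks_like_unmarked_queue(samples, ce_marks, threshold=0.5):
    """Hypothetical rule: delay tracks cwnd and not a single CE mark was seen."""
    return ce_marks == 0 and queue_growth_ratio(samples) >= threshold


# Illustrative, made-up samples: RTT doubling with cwnd vs. RTT held nearly flat.
fifo_like = [(10, 0.020), (20, 0.040), (40, 0.080)]
aqm_like = [(10, 0.020), (20, 0.021), (40, 0.022)]
print(looks_like_unmarked_queue(fifo_like, ce_marks=0))
print(looks_like_unmarked_queue(aqm_like, ce_marks=0))
```

A real detector would need filtered RTT samples and a policy for how long the verdict is cached per route; the sketch only shows the shape of the comparison.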

chromi commented 5 years ago

The trouble is that even if ECN-enabled single-queue AQMs are presently uncommon, we don't want to unilaterally exclude the possibility of this functionality being turned on in the near future. With that in mind, let me outline what I think the weaknesses of the above approaches are:

> RTT increase proportional to cwnd increase

This sounds like it would detect a dumb FIFO with reasonable reliability, but TCP Prague already responds "correctly" to tail drops and is thus no less safe than other TCPs with this type of queue. Thus it is not necessary to specifically detect that type of bottleneck.

In a DualQ environment, cwnd increase beyond BDP would presumably cause spilling into the Classic queue, which works the same as a standard AQM. But presumably you distinguish the case where high marking rates occur, so as to avoid a false positive.

But a reasonably aggressive conventional AQM would also produce high marking rates very soon after a queue was established. This would include an instance of Codel configured for a "metro" or "LAN" environment with a shorter-than-standard 'interval' parameter. It's not clear how you would distinguish this case from a DualQ middlebox.

> No significant RTT difference when toggling between ECT(0) and ECT(1)

Surely this would also be true for a DualQ middlebox with no other load? Then you'd only see a lower marking rate on ECT(0) packets, if you had already reached saturation.

> Repeated losses preceded by few CEs (if any).

Conventional AQMs can already ramp up CE marking to high rates in many cases. I understand DCTCP's steady state is at just 2 marks per RTT, which is reached by Codel after just two intervals (if 'interval' is about the same as the path RTT), and generally it would be hard for a Reno-linear increase function to fill a big queue before Codel ramped up far enough to exert that level of control.
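
For reference, the ramp described above can be sketched from CoDel's control law as given in RFC 8289, where the gap after the count-th mark is interval / sqrt(count). The sketch below assumes the queue stays above target the whole time and ignores CoDel's shortcut for resuming from a previous count; the function name is made up for illustration.

```python
from math import sqrt


def codel_mark_times(interval, n_marks):
    """Times of successive CoDel marks, relative to the first one.

    Assumes the sojourn time stays above target, so CoDel remains in its
    marking state and schedules each next mark interval / sqrt(count) later.
    """
    times = [0.0]
    for count in range(1, n_marks):
        times.append(times[-1] + interval / sqrt(count))
    return times


# With interval normalised to 1 (i.e. roughly one path RTT), print the
# spacing between consecutive marks to see how quickly it shrinks.
times = codel_mark_times(1.0, 6)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
for t, gap in zip(times[1:], gaps):
    print(f"mark at {t:.3f} intervals, gap since previous {gap:.3f}")
```

The spacing falls to half an interval (two marks per interval) by the fifth mark, which is the order of magnitude the argument relies on.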

But at this level of marking, Classic ECN flows are not steady-state, but halving their cwnd every RTT. That's the core problem.
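
A toy Python model of that asymmetry, under strong simplifying assumptions: each flow is simulated on its own and is handed the same fixed number of CE marks every round trip regardless of its window, the classic flow halves on any marked round, and the scalable flow uses a DCTCP-style EWMA of the marked fraction. The names, the gain of 1/16 and the starting values are illustrative choices, not measurements.

```python
def run(rounds=50, marks_per_rtt=2, cwnd0=100.0, min_cwnd=2.0, gain=1 / 16):
    """Return (classic_cwnd, scalable_cwnd) after `rounds` round trips."""
    classic = scalable = cwnd0
    alpha = marks_per_rtt / cwnd0  # start the EWMA at the observed fraction
    for _ in range(rounds):
        # Classic ECN response: any mark in a round trip halves cwnd.
        if marks_per_rtt > 0:
            classic = max(min_cwnd, classic / 2)
        else:
            classic += 1
        # DCTCP-style response: reduce by alpha/2, where alpha tracks the
        # fraction of marked packets, then apply the usual +1 per round trip.
        marked_fraction = min(1.0, marks_per_rtt / scalable)
        alpha = (1 - gain) * alpha + gain * marked_fraction
        if marks_per_rtt > 0:
            scalable = max(min_cwnd, scalable * (1 - alpha / 2))
        scalable += 1
    return classic, scalable


classic_cwnd, scalable_cwnd = run()
print(f"classic cwnd: {classic_cwnd:.1f}, scalable cwnd: {scalable_cwnd:.1f}")
```

In this model the scalable flow's reduction of about one packet per round trip is cancelled by its additive increase, while the classic flow is pinned at the minimum window after a handful of rounds.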

In summary, I have no confidence in the above measures from a theoretical standpoint. I was hoping to see a practical implementation and test results which showed them actually working.

oliviertilmans commented 5 years ago

Sorry for the late reply, I currently have to work on topics unrelated to L4S or transport as a whole. Quick thoughts:

On 03/26/2019 13:28, chromi wrote:

> RTT increase proportional to cwnd increase
>
> This sounds like it would detect a dumb FIFO with reasonable reliability, but TCP Prague already responds "correctly" to tail drops and is thus no less safe than other TCPs with this type of queue. Thus it is not necessary to specifically detect that type of bottleneck.

My (personal) opinion on this is the opposite. I think tracking the evolution of some timings (be it RTT, inter-packet gaps) is extremely important, not only to detect a dumb FIFO (or a slightly smarter one that would apply classic CEs), but also to improve TCP Prague's performance during slow-start (e.g., paced chirping) and minimize overshoot (which is imho even more important than 'safety' fallbacks). BBRv2 seems to be doing some flavor of this, but I haven't had any opportunity yet to look at more than what was said at ICCRG.

Roughly, I believe that the RTT probing phase of BBR(v2) is very close to what we need in combination with paced chirping.

> In a DualQ environment, cwnd increase beyond BDP would presumably cause spilling into the Classic queue, which works the same as a standard AQM.

I am not sure I fully understood your point there. If the cwnd increase exceeds the BDP, then a queue is building up, i.e., the AQM applies marks. I do not see why it would spill to the Classic queue: the DualQ coupling ensures that any extra queue buildup in the Classic queue translates to increased marking in the L4S one (in addition to marks set by the L4S AQM).
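
As a rough illustration of the coupling referred to here, the sketch below follows the relationship given in the DualQ Coupled AQM specification: a base probability p' driven by the Classic queue is squared for Classic traffic and multiplied by a coupling factor k for L4S traffic, and the L4S queue applies the larger of that coupled value and its own native marking. The function and parameter names are invented for this example, and k = 2 is the specification's suggested default rather than anything stated in this thread.

```python
def dualq_probabilities(p_base, p_l4s_native=0.0, k=2.0):
    """Return (classic_probability, l4s_marking_probability).

    p_base:       base AQM output driven by Classic queue delay (p')
    p_l4s_native: marking decided by the L4S queue's own shallow-threshold AQM
    k:            coupling factor
    """
    p_classic = p_base ** 2            # squared for quadratic-response flows
    p_coupled = min(1.0, k * p_base)   # linear for scalable flows
    p_l4s = max(p_l4s_native, p_coupled)
    return p_classic, p_l4s


# Classic queue pressure alone raises L4S marking, even with an empty L4S queue.
print(dualq_probabilities(0.1))
# The L4S queue's own marking wins when it is the larger of the two.
print(dualq_probabilities(0.1, p_l4s_native=0.5))
```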

> No significant RTT difference when toggling between ECT(0) and ECT(1)
>
> Surely this would also be true for a DualQ middlebox with no other load? Then you'd only see a lower marking rate on ECT(0) packets, if you had already reached saturation.

Yes, and this lower marking for ECT(0) would then build up a queue, and hence eventually increase the RTT/make inter-packet gaps invisible.


chromi commented 5 years ago

Well, I think the proof of the pudding will be in the eating - and we can't eat it until it's been baked.
