dndx / phantun

Transforms UDP stream into (fake) TCP streams that can go through Layer 3 & Layer 4 (NAPT) firewalls/NATs.
Apache License 2.0

undesirable packet reordering #77

Open marcbou opened 2 years ago

marcbou commented 2 years ago

When running iperf3 -u (-b100m) over a WireGuard tunnel over Phantun, I am seeing a significant number of out-of-order packets. This is problematic because reordering can significantly degrade the performance of certain protocols, notably TCP, and even of UDP-based applications.
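One simple way to quantify reordering in a UDP test is to count packets whose sequence number is lower than the highest one already received. The sketch below assumes each packet carries an increasing sequence number; the function name `count_out_of_order` is hypothetical, and this accounting is not necessarily identical to how iperf3 computes its own out-of-order figure.

```rust
/// Counts packets that arrive with a sequence number lower than the highest
/// sequence number seen so far (one common definition of "out of order").
fn count_out_of_order(received: &[u64]) -> usize {
    let mut highest: Option<u64> = None;
    let mut late = 0;
    for &seq in received {
        match highest {
            // Arrived after a later packet: counted as reordered.
            Some(h) if seq < h => late += 1,
            // First packet, a new maximum, or a duplicate of the maximum.
            _ => highest = Some(seq),
        }
    }
    late
}

fn main() {
    let arrivals = [0, 1, 3, 2, 4, 6, 5, 7];
    println!(
        "{} of {} packets arrived out of order",
        count_out_of_order(&arrivals),
        arrivals.len()
    );
}
```

With this definition, a single packet overtaken by its successor counts once, so the arrival order above yields 2 reordered packets.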

Any parallel processing or queuing of packets should be done in a way that avoids reordering within a flow.

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35247.pdf
https://www.linuxzen.com/notes/notes/20220416073757-multi_queue_nics/

dndx commented 2 years ago

Could you share your setup? I did not observe significant packet reordering during my tests using iperf3. What is the resource usage while Phantun is under load?

Handsome1080P commented 2 years ago

Could you share your setup? I did not observe significant packet reordering during my tests using iperf3. What is the resource usage while Phantun is under load?

Just ping directly through the tunnel and you can see a significant number of out-of-order packets.

dndx commented 2 years ago

I did some debugging, and it is true that Phantun may deliver packets out of order on multi-core systems due to its multi-threaded nature.
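A toy model helps show why multiple workers can reorder a single flow, and why balanced workers mostly do not. This is not Phantun's actual scheduler: it assumes packets of one flow arrive at a fixed interval, are dispatched round-robin across workers, and that each worker processes its queue in FIFO order with a fixed per-packet service time. The function `delivery_order` is hypothetical.

```rust
/// Returns the order in which sequence numbers leave the workers.
/// `interval`: time between packet arrivals; `service[w]`: per-packet cost on worker `w`.
fn delivery_order(num_packets: u64, interval: u64, service: &[u64]) -> Vec<u64> {
    let mut worker_free = vec![0u64; service.len()]; // when each worker becomes idle
    let mut done: Vec<(u64, u64)> = Vec::new(); // (completion_time, seq)
    for seq in 0..num_packets {
        let w = (seq as usize) % service.len(); // round-robin dispatch (assumption)
        let arrival = seq * interval;
        let start = arrival.max(worker_free[w]); // wait if the worker is busy
        let finish = start + service[w];
        worker_free[w] = finish;
        done.push((finish, seq));
    }
    // Packets are emitted in completion order; ties are broken by sequence number.
    done.sort();
    done.into_iter().map(|(_, seq)| seq).collect()
}

fn main() {
    println!("1 worker:             {:?}", delivery_order(6, 1, &[3]));
    println!("2 balanced workers:   {:?}", delivery_order(6, 1, &[1, 1]));
    println!("2 unbalanced workers: {:?}", delivery_order(6, 1, &[3, 1]));
}
```

In this model a single worker always preserves order, balanced workers happen to preserve it as well, and a slower worker lets packets handled by the faster one overtake it (for example `[1, 0, 3, 2, 5, 4]`).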

However, in my tests this generally does not cause performance issues for TCP applications, because Phantun keeps the cores very balanced, so out-of-order packets are still delivered to the TCP stack quickly enough not to trigger retransmissions.

There is no simple way to mitigate this without disabling multi-core support (or ensure packets from the same stream are always handled by a single core only). I might add a config option that lets users specify how many threads Phantun should use (currently this is set automatically to equal the number of workers). If out-of-order delivery is really a big issue, reducing the number of cores used to 1 should mitigate it. However, this will obviously affect Phantun's performance too, so it is not something I want to enable by default at the moment.
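Such an option could resolve to a concrete worker count roughly as sketched below. This is not an existing Phantun setting: the `configured` parameter and `resolve_worker_count` are hypothetical, and "auto" is assumed to mean one worker per CPU as reported by `std::thread::available_parallelism`.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical setting: `None` (or 0) means "auto", i.e. one worker per available CPU.
/// `Some(1)` would give single-threaded, order-preserving processing.
fn resolve_worker_count(configured: Option<usize>) -> usize {
    match configured {
        Some(n) if n > 0 => n,
        // Fall back to 1 if the platform cannot report its parallelism.
        _ => thread::available_parallelism()
            .map(NonZeroUsize::get)
            .unwrap_or(1),
    }
}

fn main() {
    println!("auto:   {} worker(s)", resolve_worker_count(None));
    println!("pinned: {} worker(s)", resolve_worker_count(Some(1)));
}
```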

wangyu- commented 1 year ago

There are definitely UDP-based protocols that are not resistant to reordering (e.g., at one of my previous companies, to get better performance while keeping the protocol simple, they chose to assume that reordering is rare).

ensure packets from the same stream are always handled by a single core only

This is a very practical solution. At least there can be an option to toggle on this behavior.
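A minimal sketch of per-flow core affinity, assuming the flow key is the client's outer UDP address (`SocketAddr`) and using std threads with `mpsc` channels rather than Phantun's actual async runtime. `worker_for` and `process` are hypothetical names. Every packet of a flow hashes to the same worker, and each worker drains its queue in FIFO order, so packets within a flow are never reordered relative to each other.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;
use std::sync::mpsc;
use std::thread;

/// Pick a worker for a flow by hashing its key (here: the client's UDP address).
fn worker_for(flow: &SocketAddr, workers: usize) -> usize {
    let mut h = DefaultHasher::new();
    flow.hash(&mut h);
    (h.finish() % workers as u64) as usize
}

/// Dispatch (flow, seq) packets to per-worker queues and collect the output.
fn process(packets: Vec<(SocketAddr, u32)>, workers: usize) -> Vec<(SocketAddr, u32)> {
    let (out_tx, out_rx) = mpsc::channel();
    let mut senders = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..workers {
        let (tx, rx) = mpsc::channel::<(SocketAddr, u32)>();
        let out = out_tx.clone();
        handles.push(thread::spawn(move || {
            for pkt in rx {
                // Real work (encapsulation, checksums, ...) would happen here.
                out.send(pkt).unwrap();
            }
        }));
        senders.push(tx);
    }
    drop(out_tx); // only the workers hold output senders now
    for pkt in packets {
        senders[worker_for(&pkt.0, workers)].send(pkt).unwrap();
    }
    drop(senders); // close the worker queues so the threads exit
    for h in handles {
        h.join().unwrap();
    }
    out_rx.into_iter().collect()
}

fn main() {
    let flows: Vec<SocketAddr> = ["10.0.0.1:51820", "10.0.0.2:51820", "10.0.0.3:40000"]
        .iter()
        .map(|s| s.parse().unwrap())
        .collect();
    let mut packets = Vec::new();
    for seq in 0..1000u32 {
        for f in &flows {
            packets.push((*f, seq));
        }
    }
    let out = process(packets, 4);
    for f in &flows {
        let seqs: Vec<u32> = out.iter().filter(|(a, _)| a == f).map(|&(_, s)| s).collect();
        assert!(seqs.windows(2).all(|w| w[0] < w[1]));
        println!("{f}: {} packets, in order", seqs.len());
    }
}
```

Packets of different flows may still interleave in the output, which is harmless. The trade-off is that one flow can never use more than one worker's throughput.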

dndx commented 1 year ago

@wangyu-

ensure packets from the same stream are always handled by a single core only

This is possible, but for some protocols like WireGuard, there is no way to access info regarding the encapsulated flow's tuple, thus very difficult to make the correct decision.

wangyu- commented 1 year ago

for some protocols like WireGuard, there is no way to access info regarding the encapsulated flow's tuple, thus very difficult to make the correct decision.

Is wireguard different?

wireguard---(UDP)--->phantun-----(tun)-------------(tun)---->phantun---(UDP)--->wireguard

WireGuard talks to Phantun via UDP, the same way other UDP programs do.

dndx commented 1 year ago

@wangyu- It is hard to differentiate different flows within a WireGuard connection, as flow information is not exposed by WireGuard. From the outside, there is only a single UDP stream that contains all the data.

wangyu- commented 1 year ago

Yes, that's true. Sorry, I misunderstood what you meant by "access info regarding the encapsulated flow".

I didn't suggest handling each stream inside WireGuard with a single core. I mean handling each WireGuard stream with a single core. For example, if you have 2 WireGuard clients connecting to the same server via Phantun, then there are 2 streams.

It does decrease performance in WireGuard's case.

But it solves the out-of-order problem for setups like ss (UDP) over Phantun or kcptun over Phantun without compromising much performance. For ss, each inner UDP connection is exposed as a separate outer UDP connection. kcptun supports M:N multiplexing, so it exposes N streams to Phantun.
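The difference between these cases can be sketched by counting how many workers a set of outer UDP flows would occupy under per-flow affinity. This assumes the same hypothetical rule of hashing the outer source address to a worker; the addresses and ports are illustrative only, and which workers are chosen depends on the hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical affinity rule: hash the outer UDP source address to a worker.
fn worker_for(flow: &SocketAddr, workers: usize) -> usize {
    let mut h = DefaultHasher::new();
    flow.hash(&mut h);
    (h.finish() % workers as u64) as usize
}

/// Number of distinct workers that a set of outer UDP flows would keep busy.
fn workers_used(flows: &[SocketAddr], workers: usize) -> usize {
    flows
        .iter()
        .map(|f| worker_for(f, workers))
        .collect::<HashSet<usize>>()
        .len()
}

fn main() {
    let client = IpAddr::V4(Ipv4Addr::new(192, 0, 2, 10));
    // One WireGuard peer: all inner traffic shares a single outer UDP flow.
    let wireguard = vec![SocketAddr::new(client, 51820)];
    // kcptun with N = 8 outer streams (illustrative source ports).
    let kcptun: Vec<SocketAddr> = (40000..40008).map(|p| SocketAddr::new(client, p)).collect();
    println!("WireGuard peer uses {} of 4 workers", workers_used(&wireguard, 4));
    println!("kcptun (8 streams) uses {} of 4 workers", workers_used(&kcptun, 4));
}
```

A single WireGuard peer always maps to exactly one worker, while multiplexed tools such as kcptun can still spread their streams across several.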

At least there can be an option to toggle on this behavior.

An option to toggle this would be good: users could get both ordering and performance for programs like ss/kcptun, and for WireGuard, users could choose between ordering and performance.

wangyu- commented 1 year ago

access info regarding the encapsulated flow's tuple

This might be a bit ambiguous; there are two possible ways to interpret it:

  1. the user's flow encapsulated inside WireGuard (what you actually meant)
  2. the WireGuard flow encapsulated inside Phantun (what I thought you meant)

dndx commented 1 year ago

Yes, for multiple WireGuard tunnels, CPU core affinity can indeed solve the reordering issue. However, it is difficult to enable this kind of behavior by default, primarily because the single-flow CPU limitation is still a major concern. The easiest way seems to be to provide an option to disable multi-core processing in Phantun and rely on users to start multiple instances in single-threaded mode, but the demand has not been very strong and I am not actively working on this.