Open marcbou opened 2 years ago
Could you share your setup? I did not observe significant packet reordering during my tests using iperf3. What is the resource usage while Phantun is under load?
Just run ping through the tunnel directly, and you can see significant amounts of out-of-order packets.
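The reordering that ping or iperf3 reports can be quantified with a small counter (an illustration only, not Phantun code): each packet carries a sequence number, and any packet that arrives with a lower number than the highest seen so far was overtaken in transit.

```rust
// Illustration: count out-of-order arrivals the way tools like
// iperf3 report them. `seqs` holds sequence numbers in arrival order.
fn count_reordered(seqs: &[u64]) -> usize {
    let mut max_seen = 0u64;
    let mut reordered = 0;
    for &s in seqs {
        if s < max_seen {
            // This packet was overtaken by a later one.
            reordered += 1;
        } else {
            max_seen = s;
        }
    }
    reordered
}

fn main() {
    // In-order delivery: nothing reordered.
    assert_eq!(count_reordered(&[1, 2, 3, 4, 5]), 0);
    // Two packets overtaken by later ones, as a multi-threaded
    // forwarder might produce.
    assert_eq!(count_reordered(&[1, 3, 2, 5, 4]), 2);
    println!("ok");
}
```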
I did some debugging, and it is true that Phantun might deliver packets out of order on multi-core systems due to its multi-threaded nature.
However, during my tests, this generally did not cause performance issues for TCP applications, because Phantun keeps the cores well balanced, so out-of-order packets are still delivered to the TCP stack quickly enough not to cause retransmissions.
There is no simple and easy way to mitigate this without disabling multi-core support (or ensure packets from the same stream are always handled by a single core only). I might add a config option to allow users to specify how many threads Phantun should use (currently it is determined automatically to be equal to the number of workers). If out-of-order delivery is really a big issue, then decreasing the number of cores used to 1 should mitigate it. However, this will obviously affect the performance of Phantun too, so it is not something I want to enable by default at the moment.
There are definitely UDP-based protocols that are not reorder-resistant. (E.g., at one of my previous companies, for better performance while keeping the protocol simple, they chose to assume reordering is rare.)
ensure packets from the same stream are always handled by a single core only
This is a very practical solution. At least there can be an option to toggle on this behavior.
@wangyu-
ensure packets from the same stream are always handled by a single core only
This is possible, but for some protocols like WireGuard, there is no way to access info regarding the encapsulated flow's tuple, thus very difficult to make the correct decision.
for some protocols like WireGuard, there is no way to access info regarding the encapsulated flow's tuple, thus very difficult to make the correct decision.
Is wireguard different?
wireguard---(UDP)--->phantun-----(tun)-------------(tun)---->phantun---(UDP)--->wireguard
wireguard talks with phantun via UDP, the same way other UDP programs do
@wangyu- It is hard to differentiate different flows within a WireGuard connection, as flow information is not exposed by WireGuard. From the outside, there is only a single UDP stream that contains all the data.
Yes, that's true. Sorry, I misunderstood the "access info regarding the encapsulated flow" you mentioned.
I didn't suggest handling each stream inside wireguard with a single core. I meant handling each wireguard stream with a single core. For example, if you have 2 wg clients connecting to the same server via phantun, then that's 2 streams.
It does decrease performance for wireguard's case.
But it solves the out-of-order problem for setups like ss (UDP) over phantun or kcptun over phantun without compromising much performance. For ss, each inner UDP connection exposes an outer UDP connection. For kcptun, which supports M:N multiplexing, it exposes N streams to phantun.
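A minimal sketch of the "one wireguard stream per core" idea (the names here are made up, not Phantun's API): hash the outer UDP source address to a worker index, so every packet of one client lands on the same core, while different clients can still be spread across cores.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

// Sketch only (not Phantun's API): pick a worker by hashing the
// outer UDP source address, so one wireguard stream always maps to
// the same worker.
fn worker_for_flow(src: SocketAddr, n_workers: usize) -> usize {
    let mut h = DefaultHasher::new();
    src.hash(&mut h);
    (h.finish() as usize) % n_workers
}

fn main() {
    let a: SocketAddr = "203.0.113.1:51820".parse().unwrap();
    let b: SocketAddr = "203.0.113.2:51820".parse().unwrap();
    // The same flow always lands on the same worker...
    assert_eq!(worker_for_flow(a, 4), worker_for_flow(a, 4));
    // ...while a second client is scheduled independently.
    let _ = worker_for_flow(b, 4);
    println!("ok");
}
```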
At least there can be an option to toggle on this behavior.
an option to toggle this
would be good; users can get both ordering and performance for programs like ss/kcptun. And for wireguard, users can choose between ordering and performance.
access info regarding the encapsulated flow's tuple
This might be a bit ambiguous; there are 2 possible ways to interpret this:
1. handling each flow encapsulated inside the wireguard tunnel with a single core;
2. handling each wireguard stream (the outer UDP connection) with a single core.
Yes, for multiple WireGuard tunnels, CPU core affinity can indeed solve the reordering issue. However, it is difficult to enable this kind of behavior by default, primarily because the single-flow CPU limitation is still a major concern. The easiest way seems to be providing an option to disable multi-core processing in Phantun and relying on the user to start multiple instances in single-thread mode, but the need has not been super strong and I am not actively working on this.
When running iperf3 -u (-b100m) over a wireguard tunnel over phantun, I am seeing significant amounts of out-of-order packets, which is problematic as it can significantly degrade the performance of certain protocols, notably TCP, and even UDP-based applications.
any parallel processing/queuing of packets should be done so as to avoid reordering within flows.
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35247.pdf https://www.linuxzen.com/notes/notes/20220416073757-multi_queue_nics/
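The rule quoted above, parallelize across flows but never within one, can be sketched as per-flow dispatch to worker threads over channels (a toy model, not Phantun code): each flow hashes to exactly one worker, so its packets keep their order, while distinct flows still run concurrently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::thread;

// Toy model (not Phantun code): a flow always hashes to the same
// worker, so intra-flow order is preserved even though different
// flows are processed in parallel.
fn worker_index(flow: u32, n_workers: usize) -> usize {
    let mut h = DefaultHasher::new();
    flow.hash(&mut h);
    (h.finish() as usize) % n_workers
}

fn main() {
    const WORKERS: usize = 2;
    let mut txs = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..WORKERS {
        let (tx, rx) = mpsc::channel::<(u32, u32)>(); // (flow id, seq)
        txs.push(tx);
        handles.push(thread::spawn(move || {
            let mut last: HashMap<u32, u32> = HashMap::new();
            for (flow, seq) in rx {
                // Within one worker, each flow's packets arrive in
                // the order they were enqueued.
                let prev = last.insert(flow, seq);
                assert!(prev.map_or(true, |p| p < seq));
            }
        }));
    }
    // Interleave traffic from three flows.
    for seq in 0..1000u32 {
        for flow in 0..3u32 {
            txs[worker_index(flow, WORKERS)].send((flow, seq)).unwrap();
        }
    }
    drop(txs); // close the channels so the workers exit
    for h in handles {
        h.join().unwrap();
    }
    println!("no intra-flow reordering observed");
}
```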