celzero / firestack

Userspace wireguard and network monitor
https://rethinkdns.com/app
Mozilla Public License 2.0

Very high TCP retransmissions (not a bug, but interesting) #58

Closed. Lanius-collaris closed this issue 4 months ago

Lanius-collaris commented 4 months ago

It is not a bug.

Rethink firewall mode, no WireGuard.

initForwarders = 3, maxForwarders = 9:

emulator64_x86_64:/data/local/tmp $ ./iperf3.17.1 -c 10.64.0.5                 
Connecting to host 10.64.0.5, port 5201
[  5] local 10.111.222.1 port 34498 connected to 10.64.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  28.2 MBytes   237 Mbits/sec  756   2.83 KBytes       
[  5]   1.00-2.00   sec  41.8 MBytes   350 Mbits/sec  1138   2.83 KBytes       
[  5]   2.00-3.00   sec  43.0 MBytes   360 Mbits/sec  1456   2.83 KBytes       
[  5]   3.00-4.00   sec  41.8 MBytes   351 Mbits/sec  1182   2.83 KBytes       
[  5]   4.00-5.00   sec  41.0 MBytes   344 Mbits/sec  1040   2.83 KBytes       
[  5]   5.00-6.00   sec  42.4 MBytes   356 Mbits/sec  1245   2.83 KBytes       
[  5]   6.00-7.00   sec  42.5 MBytes   356 Mbits/sec  1403   2.83 KBytes       
[  5]   7.00-8.00   sec  43.8 MBytes   367 Mbits/sec  1536   2.83 KBytes       
[  5]   8.00-9.00   sec  43.8 MBytes   367 Mbits/sec  1599   4.24 KBytes       
[  5]   9.00-10.00  sec  42.4 MBytes   356 Mbits/sec  1479   5.66 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   410 MBytes   344 Mbits/sec  12834             sender
[  5]   0.00-10.00  sec   409 MBytes   343 Mbits/sec                  receiver

iperf Done.

Note: iperf3 reads the total retransmission count from the tcp_info struct (https://github.com/esnet/iperf/blob/master/src/tcp_info.c#L114-#L126). On Linux, the definition of tcp_info is in /usr/include/linux/tcp.h.
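
For reference, here's a minimal Go sketch of reading the same counter from an established connection on Linux, using golang.org/x/sys/unix; it's illustrative only (the iperf3 server address is just the one from the runs above).

// Minimal sketch (Linux-only): read tcp_info's total retransmissions for an
// established TCP connection; this is the counter iperf3 reports as "Retr".
package main

import (
	"fmt"
	"net"

	"golang.org/x/sys/unix"
)

func totalRetrans(c *net.TCPConn) (uint32, error) {
	raw, err := c.SyscallConn()
	if err != nil {
		return 0, err
	}
	var info *unix.TCPInfo
	var optErr error
	if err := raw.Control(func(fd uintptr) {
		// getsockopt(fd, IPPROTO_TCP, TCP_INFO, ...)
		info, optErr = unix.GetsockoptTCPInfo(int(fd), unix.IPPROTO_TCP, unix.TCP_INFO)
	}); err != nil {
		return 0, err
	}
	if optErr != nil {
		return 0, optErr
	}
	return info.Total_retrans, nil
}

func main() {
	conn, err := net.Dial("tcp", "10.64.0.5:5201") // iperf3 server from the runs above
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	fmt.Println(totalRetrans(conn.(*net.TCPConn)))
}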

It's interesting that setting a smaller maxForwarders reduces TCP retransmissions (but increases RTT; see the Cwnd column). Although I'm not sure if it's prone to "deadlock".

maxForwarders = 2:

emulator64_x86_64:/data/local/tmp $ ./iperf3.17.1 -c 10.64.0.5
Connecting to host 10.64.0.5, port 5201
[  5] local 10.111.222.1 port 43058 connected to 10.64.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   140 MBytes  1.17 Gbits/sec  385    180 KBytes       
[  5]   1.00-2.00   sec   145 MBytes  1.21 Gbits/sec  305    411 KBytes       
[  5]   2.00-3.00   sec   141 MBytes  1.18 Gbits/sec  585   42.4 KBytes       
[  5]   3.00-4.00   sec   144 MBytes  1.20 Gbits/sec  270   90.5 KBytes       
[  5]   4.00-5.00   sec   142 MBytes  1.19 Gbits/sec  279   50.9 KBytes       
[  5]   5.00-6.00   sec   140 MBytes  1.18 Gbits/sec  323   29.7 KBytes       
[  5]   6.00-7.00   sec   145 MBytes  1.22 Gbits/sec  407   58.0 KBytes       
[  5]   7.00-8.00   sec   143 MBytes  1.20 Gbits/sec  291   38.2 KBytes       
[  5]   8.00-9.00   sec   142 MBytes  1.19 Gbits/sec  452    182 KBytes       
[  5]   9.00-10.00  sec   144 MBytes  1.21 Gbits/sec  260    324 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.39 GBytes  1.20 Gbits/sec  3557             sender
[  5]   0.00-10.01  sec  1.39 GBytes  1.19 Gbits/sec                  receiver

iperf Done.

maxForwarders = 1:

emulator64_x86_64:/data/local/tmp $ ./iperf3.17.1 -c 10.64.0.5
Connecting to host 10.64.0.5, port 5201
[  5] local 10.111.222.1 port 39976 connected to 10.64.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   161 MBytes  1.35 Gbits/sec   37    489 KBytes       
[  5]   1.00-2.00   sec   161 MBytes  1.35 Gbits/sec    8    492 KBytes       
[  5]   2.00-3.00   sec   160 MBytes  1.34 Gbits/sec    0    696 KBytes       
[  5]   3.00-4.00   sec   160 MBytes  1.34 Gbits/sec    7    597 KBytes       
[  5]   4.00-5.00   sec   160 MBytes  1.34 Gbits/sec    3    594 KBytes       
[  5]   5.00-6.00   sec   157 MBytes  1.32 Gbits/sec    4    587 KBytes       
[  5]   6.00-7.00   sec   155 MBytes  1.30 Gbits/sec   10    576 KBytes       
[  5]   7.00-8.00   sec   162 MBytes  1.36 Gbits/sec    3    571 KBytes       
[  5]   8.00-9.00   sec   162 MBytes  1.36 Gbits/sec   14    561 KBytes       
[  5]   9.00-10.00  sec   160 MBytes  1.35 Gbits/sec    2    557 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.56 GBytes  1.34 Gbits/sec   88             sender
[  5]   0.00-10.00  sec  1.56 GBytes  1.34 Gbits/sec                  receiver

iperf Done.

Note: the Retr field can be 0 with some apps (e.g. Intra). I guess they ACK all segments. lol

ignoramous commented 4 months ago

(thank you)

It might as well be a bug (though I doubt it triggers at all for non-local traffic?).

The moral of this story: we don't need multiple goroutines acting on one channel (as we do today), but multiple channels (carefully muxing/demuxing TCP streams and ICMP/UDP packets onto them), each pinned to exactly one consumer goroutine (as done by Google: https://github.com/google/gvisor/commit/19c7ca8c3bd9dd8bfe8c657845d79c752c9f3ff6).

We might want to prioritise backporting their processors.go logic if retransmissions are higher with regular traffic too (i.e., internet-bound TCP streams).
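
To sketch the idea (illustrative only, not gvisor's or firestack's actual code): hash each flow onto exactly one per-goroutine queue, so packets of a single stream are never reordered by the dispatcher, unlike N forwarders competing for one shared channel.

// Sketch: per-processor channels with the processor chosen by a flow hash, so
// a given (src, dst, proto) always lands on the same goroutine. Names and the
// packet struct here are hypothetical.
package main

import (
	"fmt"
	"hash/maphash"
)

type packet struct {
	src, dst string // e.g. "ip:port"; real code hashes the raw tuple fields
	proto    uint8
	payload  []byte
}

type dispatcher struct {
	seed   maphash.Seed
	queues []chan packet // one channel per processor goroutine
}

func newDispatcher(nProcessors int, handle func(packet)) *dispatcher {
	d := &dispatcher{seed: maphash.MakeSeed(), queues: make([]chan packet, nProcessors)}
	for i := range d.queues {
		q := make(chan packet, 512)
		d.queues[i] = q
		go func() { // exactly one consumer per queue
			for pkt := range q {
				handle(pkt)
			}
		}()
	}
	return d
}

// deliver picks the processor from a hash of the flow identifiers; per-flow
// packet order is preserved because a flow never switches goroutines.
func (d *dispatcher) deliver(p packet) {
	var h maphash.Hash
	h.SetSeed(d.seed)
	h.WriteString(p.src)
	h.WriteString(p.dst)
	h.WriteByte(p.proto)
	d.queues[h.Sum64()%uint64(len(d.queues))] <- p
}

func main() {
	done := make(chan struct{})
	d := newDispatcher(9, func(p packet) {
		fmt.Println("handled", p.src, "->", p.dst)
		close(done)
	})
	d.deliver(packet{src: "10.111.222.1:34498", dst: "10.64.0.5:5201", proto: 6})
	<-done
}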

Although I'm not sure if it's prone to "deadlock".

netstack "deadlock"s can happen with any number of forwarders, but higher their count, the less likely netstack "deadlock"s (actually, it stalls).

maxForwarders can be 0. If there's just one forwarder, that is, if initForwarders=1 and maxForwarders=0 (both can also be set to 1), this stall happens occasionally (but it is not rare). With initForwarders=3, I have never hit a stall. maxForwarders=9 is more of an insurance than a guarantee.

Lanius-collaris commented 4 months ago

I think the "buffer" is too shallow to make congestion control work. When forwarding TCP, inner TCP connections and outer TCP connections are piped.

I doubt it triggers at all for non-local traffic?

No matter how slow the outer link is, the inner link will always be fast.

ignoramous commented 4 months ago

I think the "buffer" is too shallow to make congestion control work.

Sorry, shallow buffer set where? In netstack.go?

https://github.com/celzero/firestack/blob/2d2c9b8ef06b5a85b677100536ea87c1188d0022/intra/netstack/netstack.go#L114-L122
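
For context, a hedged sketch of where TCP buffer sizes can be tuned on a gvisor netstack; the values and wiring here are illustrative and may not match what firestack's netstack.go actually sets.

// Sketch: configure per-endpoint TCP send/receive buffer ranges on a minimal
// gvisor stack. Shallow maxima cap in-flight data and, with it, the congestion
// window. Values below are made up for illustration.
package main

import (
	"log"

	"gvisor.dev/gvisor/pkg/tcpip"
	"gvisor.dev/gvisor/pkg/tcpip/network/ipv4"
	"gvisor.dev/gvisor/pkg/tcpip/stack"
	"gvisor.dev/gvisor/pkg/tcpip/transport/tcp"
)

func main() {
	// Build a minimal stack with IPv4 + TCP.
	s := stack.New(stack.Options{
		NetworkProtocols:   []stack.NetworkProtocolFactory{ipv4.NewProtocol},
		TransportProtocols: []stack.TransportProtocolFactory{tcp.NewProtocol},
	})

	// Min, Default, Max in bytes.
	snd := tcpip.TCPSendBufferSizeRangeOption{Min: 4 << 10, Default: 1 << 20, Max: 4 << 20}
	rcv := tcpip.TCPReceiveBufferSizeRangeOption{Min: 4 << 10, Default: 1 << 20, Max: 4 << 20}
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &snd); err != nil {
		log.Fatalf("set send buf: %v", err)
	}
	if err := s.SetTransportProtocolOption(tcp.ProtocolNumber, &rcv); err != nil {
		log.Fatalf("set recv buf: %v", err)
	}
}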

Lanius-collaris commented 4 months ago

Sorry, shallow buffer set where? In netstack.go?

I don't know; there isn't a real router with a queue between the two endpoints of the inner TCP connection. Maybe the forwarders act like a shallow buffer. 🤕

ignoramous commented 4 months ago

I don't know; there isn't a real router with a queue between the two endpoints of the inner TCP connection.

Oh okay. Yeah, there's no real buffer there, just a splice/copy from one end of the pipe to the other: https://github.com/celzero/firestack/blob/7306ed7024fc8e8d9847f81cea42b827f0a5877a/intra/common.go#L25-L42
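
Roughly, the splice/copy pattern looks like this (a sketch of the general shape, not the exact common.go code):

// Sketch: relay bytes in both directions between the "inner" (netstack) side
// and the "outer" (real network) side with io.Copy; there is no queue in the
// middle beyond io.Copy's transfer buffer. Addresses below are illustrative.
package main

import (
	"io"
	"log"
	"net"
	"sync"
)

func pipe(inner, outer net.Conn) {
	var wg sync.WaitGroup
	relay := func(dst, src net.Conn) {
		defer wg.Done()
		io.Copy(dst, src) // blocks until src hits EOF or errors
		if tc, ok := dst.(*net.TCPConn); ok {
			tc.CloseWrite() // propagate EOF to the other side
		}
	}
	wg.Add(2)
	go relay(outer, inner)
	go relay(inner, outer)
	wg.Wait()
}

func main() {
	// Illustrative stand-in for the proxy: accept locally, dial the iperf3
	// server from the runs above, and splice the two connections together.
	ln, err := net.Listen("tcp", "127.0.0.1:8080")
	if err != nil {
		log.Fatal(err)
	}
	for {
		in, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(inner net.Conn) {
			defer inner.Close()
			outer, err := net.Dial("tcp", "10.64.0.5:5201")
			if err != nil {
				return
			}
			defer outer.Close()
			pipe(inner, outer)
		}(in)
	}
}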

Maybe the forwarders act like a shallow buffer

I've removed the channel and the forwarders acting on it, and brought in Google's changes (from upstream), which deal in a list of packet buffers pre-sharded by 5-tuple (src, dst, proto) onto any one of the preset 9 processors.

Lanius-collaris commented 4 months ago

Fixed? 🤗

emulator64_x86_64:/data/local/tmp $ ./iperf3.17.1 -c 10.64.0.5
Connecting to host 10.64.0.5, port 5201
[  5] local 10.111.222.1 port 47092 connected to 10.64.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   169 MBytes  1.42 Gbits/sec   13    515 KBytes       
[  5]   1.00-2.00   sec   172 MBytes  1.44 Gbits/sec    2    522 KBytes       
[  5]   2.00-3.00   sec   170 MBytes  1.43 Gbits/sec    2    530 KBytes       
[  5]   3.00-4.00   sec   172 MBytes  1.45 Gbits/sec    1    526 KBytes       
[  5]   4.00-5.00   sec   168 MBytes  1.41 Gbits/sec    3    510 KBytes       
[  5]   5.00-6.00   sec   173 MBytes  1.45 Gbits/sec    1    526 KBytes       
[  5]   6.00-7.00   sec   172 MBytes  1.44 Gbits/sec    1    536 KBytes       
[  5]   7.00-8.00   sec   170 MBytes  1.43 Gbits/sec    5    542 KBytes       
[  5]   8.00-9.00   sec   174 MBytes  1.46 Gbits/sec    5    547 KBytes       
[  5]   9.00-10.00  sec   174 MBytes  1.46 Gbits/sec    6    561 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.68 GBytes  1.44 Gbits/sec   39             sender
[  5]   0.00-10.00  sec  1.67 GBytes  1.44 Gbits/sec                  receiver

iperf Done.

ignoramous commented 4 months ago

Thanks for confirming! (:

It's a happy coincidence that Google pushed out changes to the exact same part of the code to improve perf, so we ended up getting those fixes for free.