Ysurac / openmptcprouter

OpenMPTCProuter is an open source solution to aggregate multiple internet connections using Multipath TCP (MPTCP) on OpenWrt
https://www.openmptcprouter.com/
GNU General Public License v3.0

1 WAN is faster than 2-WAN aggregation? Why? #2365

Closed hle5128 closed 1 year ago

hle5128 commented 2 years ago

Expected Behavior

1) Aggregating 2 WANs (master/enabled) should be faster than a single WAN (master) alone

Current Behavior

1) WAN A alone, omr-test-speed eth1:

root@OpenMPTCProuter:~# omr-test-speed eth1
Select best test server...
host: scaleway.testdebit.info - ping: 119
host: bordeaux.testdebit.info - ping: 151
host: aix-marseille.testdebit.info - ping: 142
host: lyon.testdebit.info - ping: 159
host: lille.testdebit.info - ping: 140
host: paris.testdebit.info - ping: 139
host: appliwave.testdebit.info - ping: 157
host: speedtest.frankfurt.linode.com - ping: 152
host: speedtest.tokyo2.linode.com - ping: 225
host: speedtest.singapore.linode.com - ping: 293
host: speedtest.newark.linode.com - ping: 81
host: speedtest.atlanta.linode.com - ping: 71
host: speedtest.dallas.linode.com - ping: 71
host: speedtest.fremont.linode.com - ping: 112
host: speed.hetzner.de - ping: 147
host: ipv4.bouygues.testdebit.info - ping: 134
host: par.download.datapacket.com - ping: 144
host: nyc.download.datapacket.com - ping: 67
host: ams.download.datapacket.com - ping: 151
host: fra.download.datapacket.com - ping: 141
host: lon.download.datapacket.com - ping: 149
host: mad.download.datapacket.com - ping: 167
host: prg.download.datapacket.com - ping: 203
host: sto.download.datapacket.com - ping: 162
host: vie.download.datapacket.com - ping: 165
host: war.download.datapacket.com - ping: 173
host: atl.download.datapacket.com - ping: 72
host: chi.download.datapacket.com - ping: 70
host: lax.download.datapacket.com - ping: 95
host: mia.download.datapacket.com - ping: 54
host: nyc.download.datapacket.com - ping: 65
host: speedtest.milkywan.fr - ping: 165
Best server is http://mia.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 40 9536M   40 3825M    0     0  20.1M      0  0:07:52  0:03:09  0:04:43 17.8M

2) WAN B alone, omr-test-speed eth2:


root@OpenMPTCProuter:~# omr-test-speed eth2
Select best test server...
host: scaleway.testdebit.info - ping: 153
host: bordeaux.testdebit.info - ping: 156
host: aix-marseille.testdebit.info - ping: 147
host: lyon.testdebit.info - ping: 153
host: lille.testdebit.info - ping: 145
host: paris.testdebit.info - ping: 152
host: appliwave.testdebit.info - ping: 154
host: speedtest.frankfurt.linode.com - ping: 158
host: speedtest.tokyo2.linode.com - ping: 233
host: speedtest.singapore.linode.com - ping: 304
host: speedtest.newark.linode.com - ping: 82
host: speedtest.atlanta.linode.com - ping: 91
host: speedtest.dallas.linode.com - ping: 86
host: speedtest.fremont.linode.com - ping: 133
host: speed.hetzner.de - ping: 155
host: ipv4.bouygues.testdebit.info - ping: 158
host: par.download.datapacket.com - ping: 167
host: nyc.download.datapacket.com - ping: 94
host: ams.download.datapacket.com - ping: 179
host: fra.download.datapacket.com - ping: 152
host: lon.download.datapacket.com - ping: 139
host: mad.download.datapacket.com - ping: 190
host: prg.download.datapacket.com - ping: 191
host: sto.download.datapacket.com - ping: 199
host: vie.download.datapacket.com - ping: 163
host: war.download.datapacket.com - ping: 186
host: atl.download.datapacket.com - ping: 79
host: chi.download.datapacket.com - ping: 113
host: lax.download.datapacket.com - ping: 101
host: mia.download.datapacket.com - ping: 67
host: nyc.download.datapacket.com - ping: 80
host: speedtest.milkywan.fr - ping: 143
Best server is http://mia.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  2 9536M    2  216M    0     0  16.2M      0  0:09:46  0:00:13  0:09:33 17.8M

3) WAN A + B, omr-test-speed:

root@OpenMPTCProuter:~# omr-test-speed
Select best test server...
host: scaleway.testdebit.info - ping: 147
host: bordeaux.testdebit.info - ping: 174
host: aix-marseille.testdebit.info - ping: 166
host: lyon.testdebit.info - ping: 174
host: lille.testdebit.info - ping: 151
host: paris.testdebit.info - ping: 150
host: appliwave.testdebit.info - ping: 160
host: speedtest.frankfurt.linode.com - ping: 161
host: speedtest.tokyo2.linode.com - ping: 251
host: speedtest.singapore.linode.com - ping: 358
host: speedtest.newark.linode.com - ping: 76
host: speedtest.atlanta.linode.com - ping: 94
host: speedtest.dallas.linode.com - ping: 113
host: speedtest.fremont.linode.com - ping: 155
host: speed.hetzner.de - ping: 157
host: ipv4.bouygues.testdebit.info - ping: 145
host: par.download.datapacket.com - ping: 142
host: nyc.download.datapacket.com - ping: 85
host: ams.download.datapacket.com - ping: 153
host: fra.download.datapacket.com - ping: 154
host: lon.download.datapacket.com - ping: 152
host: mad.download.datapacket.com - ping: 160
host: prg.download.datapacket.com - ping: 164
host: sto.download.datapacket.com - ping: 210
host: vie.download.datapacket.com - ping: 189
host: war.download.datapacket.com - ping: 172
host: atl.download.datapacket.com - ping: 96
host: chi.download.datapacket.com - ping: 104
host: lax.download.datapacket.com - ping: 153
host: mia.download.datapacket.com - ping: 108
host: nyc.download.datapacket.com - ping: 111
host: speedtest.milkywan.fr - ping: 150
Best server is http://speedtest.newark.linode.com/garbage.php?ckSize=10000, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  267M    0  267M    0     0  8966k      0 --:--:--  0:00:30 --:--:-- 7603k

4) WAN A Master, WAN B backup

root@OpenMPTCProuter:~# omr-test-speed
Select best test server...
host: scaleway.testdebit.info - ping: 152
host: bordeaux.testdebit.info - ping: 172
host: aix-marseille.testdebit.info - ping: 163
host: lyon.testdebit.info - ping: 176
host: lille.testdebit.info - ping: 146
host: paris.testdebit.info - ping: 150
host: appliwave.testdebit.info - ping: 172
host: speedtest.frankfurt.linode.com - ping: 153
host: speedtest.tokyo2.linode.com - ping: 242
host: speedtest.singapore.linode.com - ping: 358
host: speedtest.newark.linode.com - ping: 74
host: speedtest.atlanta.linode.com - ping: 89
host: speedtest.dallas.linode.com - ping: 116
host: speedtest.fremont.linode.com - ping: 154
host: speed.hetzner.de - ping: 158
host: ipv4.bouygues.testdebit.info - ping: 147
host: par.download.datapacket.com - ping: 147
host: nyc.download.datapacket.com - ping: 76
host: ams.download.datapacket.com - ping: 149
host: fra.download.datapacket.com - ping: 154
host: lon.download.datapacket.com - ping: 153
host: mad.download.datapacket.com - ping: 156
host: prg.download.datapacket.com - ping: 174
host: sto.download.datapacket.com - ping: 171
host: vie.download.datapacket.com - ping: 181
host: war.download.datapacket.com - ping: 170
host: atl.download.datapacket.com - ping: 98
host: chi.download.datapacket.com - ping: 105
host: lax.download.datapacket.com - ping: 145
host: mia.download.datapacket.com - ping: 102
host: nyc.download.datapacket.com - ping: 75
host: speedtest.milkywan.fr - ping: 150
Best server is http://speedtest.newark.linode.com/garbage.php?ckSize=10000, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  552M    0  552M    0     0  15.0M      0 --:--:--  0:00:36 --:--:-- 16.0M

Specifications

Syslog shows nothing out of the ordinary:

https://pastebin.com/1PwTDUPY

Why is WAN A alone faster than WAN A + WAN B aggregated? Shouldn't it at least be near the ~16 MB/s that WAN B gets alone? In this case it's only about 76 Mbps. If this were due to the VPS or hardware, the tests above shouldn't look like this. Wouldn't it make more sense to just run WAN A to a WireGuard server to get faster speed?

Ysurac commented 2 years ago

Run the test for at least 2 minutes. Are your WANs on 2 different providers? What is the latency to the VPS for each WAN? What is the result with the default configuration (Shadowsocks and chacha20 encryption)? Is everything green on the status page? I can see in the log that it's not very stable. The RPI4 shouldn't be set to overclock on demand; that doesn't really work well.
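
A minimal sketch of the per-WAN checks being asked for, assuming eth1/eth2 are the two WAN interfaces and 198.98.62.146 is the VPS (both taken from later in this thread); the download URL is only an example target:

# latency from each WAN to the VPS (stability matters as much as the average)
ping -I eth1 -c 30 198.98.62.146
ping -I eth2 -c 30 198.98.62.146

# sustained per-WAN download for about 2 minutes, roughly what omr-test-speed does
curl -o /dev/null --interface eth1 --max-time 120 http://mia.download.datapacket.com/10000mb.bin
curl -o /dev/null --interface eth2 --max-time 120 http://mia.download.datapacket.com/10000mb.bin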

hle5128 commented 2 years ago

Hi, thank you for your reply; I just bought you a cup of coffee. omr-iperf3 doesn't work in beta6, so I'm using ping.

To answer your questions: 1) Both WANs are on T-Mobile, but on different towers and different bands; testing the 2 WANs individually at the same time doesn't slow them down.

2)

root@OpenMPTCProuter:~# ping -I eth1 198.98.62.146
PING 198.98.62.146 (198.98.62.146) from 192.168.4.66 eth1: 56(84) bytes of data.
64 bytes from 198.98.62.146: icmp_seq=1 ttl=65 time=82.3 ms
64 bytes from 198.98.62.146: icmp_seq=2 ttl=65 time=87.5 ms
64 bytes from 198.98.62.146: icmp_seq=3 ttl=65 time=80.9 ms
64 bytes from 198.98.62.146: icmp_seq=4 ttl=65 time=77.5 ms
64 bytes from 198.98.62.146: icmp_seq=5 ttl=65 time=78.1 ms
64 bytes from 198.98.62.146: icmp_seq=6 ttl=65 time=79.4 ms
64 bytes from 198.98.62.146: icmp_seq=7 ttl=65 time=77.6 ms
64 bytes from 198.98.62.146: icmp_seq=8 ttl=65 time=76.8 ms
64 bytes from 198.98.62.146: icmp_seq=9 ttl=65 time=81.3 ms
^C
--- 198.98.62.146 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8012ms
rtt min/avg/max/mdev = 76.765/80.140/87.492/3.161 ms

root@OpenMPTCProuter:~# ping -I eth2 198.98.62.146
PING 198.98.62.146 (198.98.62.146) from 192.168.5.31 eth2: 56(84) bytes of data.
64 bytes from 198.98.62.146: icmp_seq=1 ttl=65 time=76.0 ms
64 bytes from 198.98.62.146: icmp_seq=2 ttl=65 time=126 ms
64 bytes from 198.98.62.146: icmp_seq=3 ttl=65 time=95.6 ms
64 bytes from 198.98.62.146: icmp_seq=4 ttl=65 time=88.5 ms
64 bytes from 198.98.62.146: icmp_seq=5 ttl=65 time=117 ms
64 bytes from 198.98.62.146: icmp_seq=6 ttl=65 time=131 ms
64 bytes from 198.98.62.146: icmp_seq=7 ttl=65 time=88.2 ms
64 bytes from 198.98.62.146: icmp_seq=8 ttl=65 time=98.6 ms
64 bytes from 198.98.62.146: icmp_seq=9 ttl=65 time=136 ms
^C
--- 198.98.62.146 ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8011ms
rtt min/avg/max/mdev = 76.003/106.337/135.595/20.339 ms
root@OpenMPTCProuter:~# 

3) I tried different setups before, but here is the result with Shadowsocks and chacha20, each test run for at least 2 minutes:

root@OpenMPTCProuter:~# omr-test-speed eth1

host: lax.download.datapacket.com - ping: 92
host: mia.download.datapacket.com - ping: 48
host: nyc.download.datapacket.com - ping: 68
host: speedtest.milkywan.fr - ping: 153
Best server is http://mia.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 29 9536M   29 2772M    0     0  21.3M      0  0:07:27  0:02:09  0:05:18 21.4M

root@OpenMPTCProuter:~# omr-test-speed eth2

host: war.download.datapacket.com - ping: 210
host: atl.download.datapacket.com - ping: 74
host: chi.download.datapacket.com - ping: 81
host: lax.download.datapacket.com - ping: 123
host: mia.download.datapacket.com - ping: 62
host: nyc.download.datapacket.com - ping: 115
host: speedtest.milkywan.fr - ping: 158
Best server is http://mia.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 25 9536M   25 2463M    0     0  16.9M      0  0:09:21  0:02:25  0:06:56 22.9M

root@OpenMPTCProuter:~# omr-test-speed    (note: the aggregated average is worse than either individual WAN)

host: atl.download.datapacket.com - ping: 113
host: chi.download.datapacket.com - ping: 111
host: lax.download.datapacket.com - ping: 148
host: mia.download.datapacket.com - ping: 112
host: nyc.download.datapacket.com - ping: 68
host: speedtest.milkywan.fr - ping: 153
Best server is http://nyc.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 16 9536M   16 1612M    0     0  12.6M      0  0:12:32  0:02:07  0:10:25 11.3M

4) All green on the status page (screenshot attached).


Let me know if you need to do anything.

Ysurac commented 2 years ago

I see eth1 and eth2; are you using USB adapters? That's often not a good idea on an RPI. What is the result of the Network->MPTCP "MPTCP support check" tab for each WAN?

hle5128 commented 2 years ago

I see eth1 and eth2, you are using some USB adapters? It's often not a good idea on RPI. What is the result of Network->MPTCP "MPTCP support check" tab for each wan ?

Yes, USB gigabit LAN adapters; the drivers are recognized by OpenWrt by default, and the adapters are Realtek and ASIX. I have tried a VLAN switch before as well. I'm using a 4A power adapter by the way, not 3A, plus extra thermal cooling (because of the overclocking); the CPU temperature stays under 50C under a full-load stress test. Regardless, the result is the same with or without overclocking or underclocking, USB or switch.

MPTCP check eth1

1: 198.98.62.146 TCP::SrcPort TCP::DstPort TCP::SeqNumber TCP::AckNumber TCP::DataOffset TCP::Flags TCP::WindowsSize TCP::CheckSum IP::DiffServicesCP IP::TotalLength IP::Identification IP::TTL IP::CheckSum IP::SourceIP IP::DestinationIP +TCPOptionMaxSegSize TCPOptionMPTCPCapable::Sender's Key 

Result:
< Ethernet (14 bytes) :: DestinationMAC = b4:b0:24:86:c8:e8 , SourceMAC = 80:cc:9c:bf:dd:49 , Type = 0x800 , >
< IP (20 bytes) :: Version = 4 , HeaderLength = 5 , DiffServicesCP = 10 , ExpCongestionNot = 0 , TotalLength = 56 , Identification = 0x0 , Flags = 2 , FragmentOffset = 0 , TTL = 65 , Protocol = 0x6 , CheckSum = 0x6fb9 , SourceIP = 198.98.62.146 , DestinationIP = 192.168.4.66 , >
< TCP (20 bytes) :: SrcPort = 65101 , DstPort = 43418 , SeqNumber = 91094725 , AckNumber = 1725035436 , DataOffset = 9 , Reserved = 0 , Flags = ( SYN ACK ) , WindowsSize = 43200 , CheckSum = 0x7321 , UrgPointer = 0 , >
< TCPOptionMaxSegSize (4 bytes) :: Kind = 2 , Length = 4 , MaxSegSize = 1400 , >
< TCPOptionMPTCPCapable (12 bytes) :: Kind = 30 , Length = 12 , Subtype = 0 , Version = 0 , Checksum = 1 (Checksum Enabled) , Flags = 0 , Crypto = 1 (HMAC-SHA1) , Sender's Key = Sender's Key = 14815253685594599902 , >

MPTCP check eth2

1: 192.168.5.1 IP::DiffServicesCP IP::CheckSum 
2: 198.98.62.146 TCP::SrcPort TCP::DstPort TCP::SeqNumber TCP::AckNumber TCP::DataOffset TCP::Flags TCP::WindowsSize TCP::CheckSum IP::DiffServicesCP IP::TotalLength IP::Identification IP::TTL IP::CheckSum IP::SourceIP IP::DestinationIP +TCPOptionMaxSegSize TCPOptionMPTCPCapable::Sender's Key 

Result:
< Ethernet (14 bytes) :: DestinationMAC = 7c:c2:c6:42:90:79 , SourceMAC = 80:cc:9c:bf:96:90 , Type = 0x800 , >
< IP (20 bytes) :: Version = 4 , HeaderLength = 5 , DiffServicesCP = 10 , ExpCongestionNot = 0 , TotalLength = 56 , Identification = 0x0 , Flags = 2 , FragmentOffset = 0 , TTL = 65 , Protocol = 0x6 , CheckSum = 0x6edc , SourceIP = 198.98.62.146 , DestinationIP = 192.168.5.31 , >
< TCP (20 bytes) :: SrcPort = 65101 , DstPort = 17592 , SeqNumber = 1696924198 , AckNumber = 494052008 , DataOffset = 9 , Reserved = 0 , Flags = ( SYN ACK ) , WindowsSize = 43200 , CheckSum = 0x99d4 , UrgPointer = 0 , >
< TCPOptionMaxSegSize (4 bytes) :: Kind = 2 , Length = 4 , MaxSegSize = 1400 , >
< TCPOptionMPTCPCapable (12 bytes) :: Kind = 30 , Length = 12 , Subtype = 0 , Version = 0 , Checksum = 1 (Checksum Enabled) , Flags = 0 , Crypto = 1 (HMAC-SHA1) , Sender's Key = Sender's Key = 9216172586763779890 , >

USB LAN information:

[   11.744956] r8152 2-2:1.0 eth1: v1.10.11
[   15.673664] r8152 2-2:1.0 eth1: carrier on

[   12.244586] ax88179_178a 2-1:1.0 eth2: register 'ax88179_178a' at usb-0000:01:00.0-1, ASIX AX88179 USB 3.0 Gigabit Ethernet, 7c:c2:c6:42:90:79
[   20.730115] ax88179_178a 2-1:1.0 eth2: ax88179 - Link status is: 1
[  313.684678] ax88179_178a 2-1:1.0 eth2: ax88179 - Link status is: 1
[  316.477107] ax88179_178a 2-1:1.0 eth2: ax88179 - Link status is: 1
[  316.490838] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

Ysurac commented 2 years ago

@hle5128 I see a 308ms latency on wan2 on the status page; is the latency stable or not?

hle5128 commented 2 years ago

I've had improvements with 2 symmetrical WANs by adding additional latency in one of the WAN interfaces section, which used around 20ms.

Can you elaborate? So you are adding 20ms on one WAN and thereby improving the overall aggregation speed? But my latency on each WAN is different.
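
For reference, a rough sketch of adding artificial latency to one WAN with tc/netem (OMR also exposes a latency setting in the WAN interface section, which is what the quoted comment used; the 20ms value and the eth2 interface here are only illustrative, and tc with the sch_netem module must be installed):

# add ~20ms of extra delay on egress of eth2
tc qdisc add dev eth2 root netem delay 20ms
# remove it again
tc qdisc del dev eth2 root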

hle5128 commented 2 years ago

@hle5128 I see a 308ms latency on wan2 from the status page, latency is stable or not?

WAN2 is connected to a different tower at a different angle, and that tower maxes out at 200Mbps vs 500Mbps for the other one, so WAN2 is a little slower than WAN1. But when aggregating a faster WAN with a slower one, the slower one shouldn't drag the faster one down and cause poor overall performance, right?

Ysurac commented 2 years ago

@hle5128 High latency often makes aggregation slower, but I was asking whether the latency is stable or not.

hle5128 commented 2 years ago

@hle5128 I see a 308ms latency on wan2 from status page, latency is stable or not ?

@Ysurac

here is the 30s ping on WAN2:

--- 198.98.62.146 ping statistics ---
32 packets transmitted, 27 received, 15.625% packet loss, time 31090ms
rtt min/avg/max/mdev = 107.750/169.363/200.242/20.825 ms
root@OpenMPTCProuter:~# 

--- 198.98.62.146 ping statistics ---
30 packets transmitted, 30 received, 0% packet loss, time 29010ms
rtt min/avg/max/mdev = 77.763/110.857/196.500/30.814 ms
root@OpenMPTCProuter:~# 

Network-Traditions commented 2 years ago

In the event it's helpful, we've been testing v0.59 with the following configuration:
OpenMPTCProuter: v0.59beta6-5.4
OpenMPTCProuter VPS version: 0.1026
Virtualization: vmware
Operating System: Debian GNU/Linux 10 (buster)
Kernel: Linux 5.4.100-mptcp
Architecture: x86-64
OpenMPTCProuter VPS provider: IONOS.com 2 vCore 2GB RAM "Type M VPS"
OpenMPTCProuter platform: x86_64 (HUNSN FNR-RS34G https://www.hunsn.com/item/network-security-firewall/pfsense-mini-pc)
Telit FN980 5G Modem (HW v1.0 https://www.telit.com/devices/fn980-and-fn980m-data-cards-support-5g/)
USB3.0 M.2 Key B Modem Adapter Enclosure (https://thewirelesshaven.com/shop/mini-pcie-m2-adapters/modem-enclosure/usb3-0-to-ngff-m-2-key-b-4g-5g-modem-adapter-enclosure-with-sim-card-slot-new-style/)

ISPs:
1) T-Mobile 5G Business Internet with static IP (currently a weak, LTE-only signal)
2) StarLink version 2 configured as an Ethernet bridge connected to I225V3 eth0 (200Mbps down / 40Mbps up max)
3) Cox Business Internet DOCSIS 3.1 modem, 1Gb Ethernet to I225V3 eth1 (100Mbps down / 20Mbps up)

When aggregating any combination of the ISP connections, our results are always less than the current download speed of the best performing connection at the time. Our objective is quality and latency, so we are not as focused on aggregated bandwidth. When testing the T-Mobile connection with a high quality signal near the tower, +500Mbps down and +50Mbps up were typical. While unloaded latency through our VPS to either speedtest.net or fast.com averages about +100ms for both T-Mobile and Starlink and about +50ms for Cox, using the settings of fast.com and the OpenMPTCProuter "Status" page to monitor loaded latencies, these tests indicate a loaded latency modulating between the unloaded latency and 1000ms for T-Mobile and 500ms for Starlink, and the two are most often unbalanced (they don't follow each other). When the Cox cable service is added as the master, it seems to dominate the connection and holds loaded latency to a stable +50ms, but the tested bandwidth still remains less than that of the best performing connection at the time.

Our VPS static IP address ping seems to indicate a point of presence in the center of the U.S. The T-Mobile and StarLink services typically originate from the west coast, while Cox sits between the two. We chose our VPS for its unlimited data, since all our ISP services are unlimited as well and we generate a significant amount of 24/7 traffic that exceeds most VPS vendors' budget offerings. We can understand and intuitively believe that this diversity between the services makes it difficult for the multipath TCP path manager, scheduler and congestion control to achieve highly efficient aggregation of the bandwidth.

Regarding the multipath TCP path manager, scheduler and congestion control options, our testing for our configuration, focused on the highly variable T-Mobile and StarLink services, has shown that the best results for bandwidth, latency and connection reliability are delivered by the "fullmesh" path manager, the "ECF" scheduler and "bbr2" congestion control. Increasing TCP SYN retries as high as 8 but no lower than 4 seems to be the sweet spot. We've had some results where increasing "Fullmesh subflows" to 2 generated increased bandwidth but seemed to increase latency as well. We will be relocating our T-Mobile antenna today in the hope of improving the quality of that connection and, consequently, that service's bandwidth and latency. As we believe the weakest link in the multipath chain significantly impacts overall performance, we're hoping we will see improvements in our OpenMPTCProuter performance as well.
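
As a rough illustration only: on the 5.4 multipath-tcp.org kernel that OMR 0.59 ships, the knobs above roughly correspond to the sysctls below (the sysctl names and values are our assumption for illustration; OMR normally sets them from its System->OpenMPTCProuter settings pages):

# path manager, scheduler and congestion control (out-of-tree MPTCP, kernel 5.4)
sysctl -w net.mptcp.mptcp_path_manager=fullmesh
sysctl -w net.mptcp.mptcp_scheduler=ecf
sysctl -w net.ipv4.tcp_congestion_control=bbr2
# SYN retries in the 4-8 range discussed above
sysctl -w net.mptcp.mptcp_syn_retries=4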

Hats off to Ysurac for his significant and continued efforts in developing this project! For us, the first benefit, a single public IP address for internet services with connection redundancy, is already worth the effort of developing the solution. Improved latency and bandwidth aggregation will be the icing on the cake.

hle5128 commented 2 years ago

@Network-Traditions That seems like a similar experience to what I found. I do have a 500Mbps cable connection to test, and when testing the cable alone, speed and latency are close to what I get without running through the VPS, similar to another WireGuard service. But if we add any 5G (T-Mobile Home Internet), which maxes out around 400Mbps at night and averages around 200Mbps during the day, the aggregation of both always gives worse speed and latency than the fastest single WAN alone.

Thank you for the details and the confirmation; I always thought there was something wrong with my hardware or setup.

Ysurac commented 2 years ago

Packet loss on a connection is a problem for speed. A slow connection is not a problem, but a bad connection is always a problem. There can also be a very different result when trying another VPS provider. For the VPS you also need to check that it has enough internet speed; it must be higher than the maximum aggregated speed.
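
One rough way to sanity-check the VPS side, assuming iperf3 is available on both ends (omr-iperf3 wraps a similar test when it works); <vps-ip> is a placeholder, and the bind addresses are the per-WAN source IPs from earlier in this thread:

# on the VPS
iperf3 -s
# on the router, one WAN at a time, for 2 minutes each (download direction)
iperf3 -c <vps-ip> -B 192.168.4.66 -R -t 120
iperf3 -c <vps-ip> -B 192.168.5.31 -R -t 120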

@Network-Traditions you should update the VPS.

hle5128 commented 2 years ago

@Ysurac

The VPS has double the average speed of WAN1 + WAN2:

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
cd568f|OK  |    39MiB/s|/tmp/testdeleteme/1Gb.dat

and I switched to Intel N4020 x86_64

wan1:
ping statistics ---
30 packets transmitted, 30 received, 0% packet loss, time 29046ms
rtt min/avg/max/mdev = 65.212/77.028/149.561/15.962 ms
root@OpenMPTCProuter:~# 

wan2:
ping statistics ---
30 packets transmitted, 30 received, 0% packet loss, time 29043ms
rtt min/avg/max/mdev = 64.084/74.042/124.284/14.112 ms
root@OpenMPTCProuter:~# 

omr-test-speed wan1:
Best server is http://atl.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 13 9536M   13 1324M    0     0  10.6M      0  0:14:51  0:02:03  0:12:48 8446k^C

omr-test-speed wan2:
Best server is http://mia.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 26 9536M   26 2553M    0     0  21.2M      0  0:07:28  0:02:00  0:05:28 21.6M^C

omr-test-speed:
Best server is http://nyc.download.datapacket.com/10000mb.bin, running test:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 30 9536M   30 2905M    0     0  24.0M      0  0:06:36  0:02:00  0:04:36 26.9M^C

ioogithub commented 2 years ago

@hle5128 Have you done further testing with the x86_64 platform? Has your experience improved since moving from the RPi? Did you ever figure out what the bottleneck was?

I have the same situation as you: an RPi with 2 USB dongles, and I always see the aggregate speed lower than the faster WAN (Starlink) alone. I have several benchmark results in the last two posts of this discussion, https://github.com/Ysurac/openmptcprouter/discussions/2353, which show this.

I am currently waiting on analysis to determine what type of hardware I should buy to try to eliminate the bottleneck, but your last posted result with the new hardware doesn't look that promising. I would appreciate hearing your results.

Network-Traditions commented 2 years ago

We've just built a new x86_64 platform with openmptcprouter v0.59.1-5.4 r0+16594-ce92de8c8c and will be posting our results in the next day or so. Our prior build was openmptcprouter v0.59beta6-5.4 r0+16591-b4ea8e1089, and I do believe we are seeing an improvement in USB 5G cellular connectivity when using a "Modem Manager" connection. We have seen increasing speeds as well; however, we are unsure whether that has been driven more by increased bandwidth from both Starlink and T-Mobile. We also believe both providers are refining their services, which has brought their latency closer together so they work better when bonded by MPTCP. We did try to deploy v0.59.1 with the 5.15 kernel; however, we still find that both Starlink and T-Mobile cannot establish an MPTCP connection on that kernel. Again, we hope to post a detailed report of all our testing very soon.
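
For anyone comparing the two kernels, a quick way to see which MPTCP implementation is active (a sketch based on the usual sysctl names: the OMR 5.4 builds use the out-of-tree multipath-tcp.org stack, while 5.15 uses the upstream in-kernel implementation):

# out-of-tree MPTCP (OMR 5.4 kernels)
sysctl net.mptcp.mptcp_enabled
# upstream MPTCP (5.15+ kernels)
sysctl net.mptcp.enabled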

ioogithub commented 2 years ago

Are you still seeing lower aggregate throughput on the bonded connection compared to the fastest single connection with your new setup?

Also, what x86_64 CPU are you using, and do you find it adequate for the bandwidth task?

Network-Traditions commented 2 years ago

For the most part, I believe "yes". That being said, this is an instinctive conclusion, and there could be duty cycles where the bandwidth exceeds the currently provided bandwidth of any single service. Because the 2 services I'm bonding, T-Mobile 5G and StarLink, are both unpredictable almost by design, it is quite difficult to develop a testing procedure that would deliver conclusive results. Our goal is to bond the two providers into a highly reliable single service with functional latency while achieving a respectable bandwidth (~100Mbps down, ~20Mbps up). Both T-Mobile and StarLink are actively improving/modifying their services in ways that dramatically impact their performance with OpenMPTCProuter. For the most part, those changes over time have had an extremely positive impact on the performance of the router in terms of reliability, throughput and latency within the requirements of the previously stated goals. That being said, both T-Mobile and StarLink occasionally have duty cycles of over-the-top performance, and OpenMPTCProuter passes that on admirably. The longer our 5G modem stays connected to T-Mobile (it restarts on its own once a day on average), the more it seems to deliver 5G "UC" ultra capacity connections, spiking momentary downloads as high as 500Mbps. The same is true for StarLink, as high as 300Mbps. Our pfSense monitor will register brief upload spikes nearly as high as well; however, those usually happen at the start of a transmission and cease immediately, delivering a much lower registered average of about 20Mbps upload with speedtest.net.

Regarding our hardware, for the v0.59beta6-5.4 version of our deployment we're using the "Intel® Celeron® Processor J4125" version of the HUNSN FNR-RS34G; see this post for more details. For our v0.59.1-5.4 version, we're using a Lenovo 5100C1U SFF with an AMD A4-5300 APU and one additional Intel NIC adapter. We have yet to determine the adequacy of the hardware we're deploying for the task. That's a bit more complicated, as our deployment simply uses OpenMPTCProuter to feed a virtual machine with pfSense that handles routing, additional firewall protections, VPN and other services. The pfSense hypervisor provides our core services for the location, which include a domain controller, pfSense, a Zimbra groupware server, NAS, VOIP, 1 Windows workstation and a web-services LEMP stack including Nextcloud, ScreenConnect, Invoice Ninja and a number of websites. Consequently, pfSense and overall hypervisor performance directly impact OpenMPTCProuter performance and will require a detailed review once we are satisfied with the connection quality of OpenMPTCProuter with T-Mobile and StarLink.

Network-Traditions commented 2 years ago

We've completed a clean build of v0.59.1-5.4 using a Lenovo 5100C1U SFF with an AMD A4-5300 APU and one additional Intel NIC adapter. This build is working as expected, and while the maximum download and upload achieved with speedtest against servers close to our VPS does not exceed the bandwidth of the best performing WAN, it does so across WANs. Bearing in mind we are only connected with 5G T-Mobile and Starlink, both very unpredictable services (even more so today, as the momentary maximum bandwidth both up and down has increased significantly), during any particular test 5G T-Mobile tends to have the greatest upload speed while StarLink tends to have the greatest download speed. The results of a given test reflect the same: our download speed will be very close to the maximum logged by StarLink, and the upload speed will be similarly close to the maximum logged by 5G T-Mobile. This is a promising outlook for the upload side of our use case, as we operate a number of server services where upload speed is very important. It would be interesting to determine how this dynamic plays out during real traffic flow as compared to independent download and upload tests with a tool like speedtest.net. We will continue testing with our real-life traffic flow of UHD streaming, VOIP and other connection-sensitive traffic to see how well this solution performs under various OMR configurations in an effort to achieve our goals with completely wireless internet services.

ioogithub commented 2 years ago

OpenMPTCProuter platform: x86_64 (HUNSN FNR-RS34G https://www.hunsn.com/item/network-security-firewall/pfsense-mini-pc)

How have you found the quality of this hardware? Did it meet your expectations? Do these little niche mini PCs seem adequate for this task? Were there any bottlenecks with the HUNSN, such as shared bus bandwidth or low quality components?

For our v0.59.1-5.4 version, were using Lenovo 5100C1U SFF with an AMD A4-5300 APU

Why did you move to a PC with an AMD processor for your next build? Did you find the Celeron J4125 to be inadequate for routing at your bandwidth requirement?

I am also curious about your Starlink connection. Do you find it to be unstable (very high speed spikes but also very low throughput)? Did you adjust the defaults to compensate? For example, did you activate SQM autorate on the Starlink line to compensate for bufferbloat?

Network-Traditions commented 2 years ago

How have you found the quality of this hardware? Did it meet your expectations? Do these little niche mini PCs seem adequate for this task? Were there any bottlenecks with the HUNSN, such as shared bus bandwidth or low quality components?

Yes, so far. One of our deployment scenarios is mobile internet with cable-modem performance even outside 5G coverage. Consequently, low power, shock resistant, inexpensive hardware is valuable for this goal. We also expect to push the limits of throughput and redundancy. On the redundancy front, we hope to be testing LACP bonding of 2 ports to the switch. We still have to establish, based on the specific hardware we're using, whether it truly will achieve 1) redundancy [we believe the answer here will be yes] and 2) throughput enhancement using both channels simultaneously [i.e. our switch ports are 1G, so will an LACP bond with 2 ports provide 2 simultaneous 1G connections? Not sure on this one at this time]. As far as fitness for purpose, quality, etc., that as always will depend upon the deployment as well as real world evaluation. A great source of information is STH; check out their latest video on similar devices.

Why did you move to a PC with an AMD processor for your next build? Did you find the Celeron J4125 to be inadequate for routing at your bandwidth requirement?

The development location is solely served by this connection and its internet services. Consequently, to test a new internet connection configuration we need to preserve the "production" setup, which is the HUNSN device. The Lenovo was deployed because it was available and more convenient, due to the USB cable length of the 5G modem, than a VM deployment on one of the hypervisors. So far, hardware performance has not been an issue restricting our testing. We've configured our pfSense to support 2 OMR connections in a failover configuration. When we wish to test the Lenovo configuration, we simply swap the Ethernet and USB connections between the devices; pfSense immediately fails over and we can hit the ground running while still keeping our systems online. We did discover a blacklisting problem with the public IP address of our second VPS as a result, since all our outbound emails were being dropped, so we now know we have to validate VPS addresses before deployment. Next summer we will be rewiring our configuration to allow for VM testing of OMR on 2 different hypervisors, which will make "bleeding edge" testing more reliable and convenient. Once the testing of the current v0.59.1-5.4 on the Lenovo 5100C1U SFF is validated, that configuration will be deployed on the HUNSN device for "production" operation validation moving forward.

I am also curious about your Starlink connection. Do you find it to be unstable (very high speed spikes but also very low throughput)?

Regarding Starlink, we have been quite pleased with its performance to date, bearing in mind it will be the sole internet source for our beyond-5G operations. Our summer 2023 conversion project is a mobile test platform; we anticipate in-motion as well as remote-location testing, so Starlink is the only hope for that scenario. Currently at our location in the Metro Phoenix area, Starlink has competed quite admirably with Cox business cable 100Mbps down / 20Mbps up service. Coupled with 5G T-Mobile and OMR, I believe it exceeds our prior Cox service in every metric except latency, but Cox can only be deployed at a static location where available. Starlink OMR performance with and without 5G T-Mobile at other locations remains to be tested. We're also anxious to see what impact 5G UC will have on our overall performance. At our current location, the signal is too weak to leverage the "Ultra Capacity" connections; it does kick in on occasion, and that's when we see huge spikes in throughput. Overall Starlink throughput remains above ~40-50Mbps and peaks around 300Mbps. Latency to the connected gateway is 30ms at a minimum and 120ms at a maximum. We have a mostly clear view of the sky, so we get virtually no obstruction disconnects and it is quite stable. That being said, it's more valuable as a symbiotic member of OMR; by itself, on a daily basis, it would have its aggravations while being workable. In the remote-location scenario, those aggravations are completely acceptable since no competing service exists.

Did you adjust the defaults to compensate? For example, did you activate SQM autorate on the Starlink line to compensate for bufferbloat?

This is a primary focus of our testing with the new deployment. With both services available to OMR having wildly varying latencies and throughputs, it has been no easy task. Day-of-week and time-of-day congestion are also contributing factors, so OMR and pfSense configuration will hopefully contribute to the overall stability of the deployment. We have been, and continue to be, constantly testing and monitoring various configurations to find the best performance for our current deployment. The verdict is still out on SQM autorate, and the same is somewhat true for testing tools. Regarding bufferbloat, we've been using Waveform's tool; it seems valuable for helping reduce the latency issues of a configuration, but we're not sure about its service conclusions. Since we stream high bandwidth UHD (4K 60fps) and use VOIP and numerous other business services, we have been using our daily operations as the actual confirmation of the testing guidance. Results to date suggest copious testing will be required to truly make informed configuration decisions, as there are a significant number of unknown moving variables that prohibit direct analysis. So far, though far from conclusive, it seems "fullmesh" is the best path manager [to be expected per OMR recommendations], and ECF and BLEST are the best schedulers. MPTCP SYN retries seems to be best at "2"; higher seems to stabilize download while aggravating upload latency. MPTCP checksum seems to provide stability while also helping bufferbloat latency [instinctive comment]. Along with lower MPTCP SYN retries, keeping IPv4 TCP SYN retries, retries1 and retries2 low also seems to create a quicker recovery when OMR starts to stumble. All these observations are quite preliminary and will be updated as we develop stronger conclusions.
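
For reference, a sketch of where the retry/checksum knobs mentioned above live on the 5.4 MPTCP kernel (the sysctl names and the IPv4 retry values are our assumption for illustration; OMR normally manages these through its settings pages):

sysctl -w net.mptcp.mptcp_syn_retries=2
sysctl -w net.mptcp.mptcp_checksum=1
# keep the plain IPv4 TCP retry counters low for quicker recovery
sysctl -w net.ipv4.tcp_syn_retries=3
sysctl -w net.ipv4.tcp_retries1=3
sysctl -w net.ipv4.tcp_retries2=8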

What are you testing, and what have your experiences been to date?

Network-Traditions commented 1 year ago

@ioogithub we posted more configuration feedback here: #2594

hle5128 commented 1 year ago

I gave up on this; you won't get combined speed at all regardless (I tried x86 as well). My WAN standing alone is faster than the 2 WANs combined, and the speed of the VPN is not the problem. Let me know what you find out.


Network-Traditions commented 1 year ago

@hle5128 Due to issue #2583, we were forced to switch to Shadowsocks as our default proxy instead of V2Ray on OMR. While not empirically confirmed, we felt V2Ray was faster and port forwarding was better. Switching to Shadowsocks has either changed, or triggered a review of, the findings in discussion #2594. In either case, we've begun to realize that OMR has been aggregating bandwidth beyond the instantaneous speed of the individual services, contrary to our initial anecdotal conclusions. That being said, when using a service "stand alone", that service often seems to provide better throughput than the aggregated performance. Our instinct is that as the performance diversity of the aggregated services increases, the aggregated throughput and latency degrade.

We have more distance and obstructions than desirable at our current testing location for the T-Mobile static-IP "Business Internet" side of our aggregation, and consequently suffer from dramatic variations in performance. With our location and typical signal strength, our system often cannot obtain a "UC" (Ultra Capacity) connection. That being said, sometimes it does, which contributes to its dramatic variation in performance. When tested as the "master" and sole aggregated service through OMR, typically without "UC", we see 20-50Mbps download with 10-20Mbps upload. When "UC" kicks in, we see up to +550Mbps download and +50Mbps upload. When aggregated with Starlink, we sometimes see "UC" levels of performance in the bonded throughput. We expect this will become more common when we eventually have the opportunity to test our setup with a strong "UC" signal.

We suspect latency is the bigger key to OMR aggregated performance. That's not only a common theme here on OpenMPTCProuter's GitHub, but also confirmed by our current tweaking of SQM QoS. We've been adjusting the settings to achieve consistent latency between StarLink and T-Mobile. As we've brought their performance together, as reported by the OMR "Status" page of the "System" menu, we've seen significant quality and user-experience gains along with moderate peak-throughput gains. Given the circumstances of our T-Mobile connection, compared with StarLink, T-Mobile excels at upload while Starlink excels at download. Consequently, our SQM QoS "autorate" settings in the "Network" menu reflect that reality and have kept latency deviation between the services in a similar loaded range, which we believe is responsible for our improved reliability and performance.

Finally, reviewing the image in your post shows two T-Mobile services being aggregated. There are two issues to consider here:

  1. Even though the hardware and configuration of each service are likely identical or quite similar, because of the highly variable quality and throughput of T-Mobile (and cellular services in general), service diversity could still be a significant issue for you.
  2. Because there are two T-Mobile WANs, they may be using some or all of the same resources of the T-Mobile system. It hadn't occurred to us before, until we saw mentions here on GitHub, that using more than one connection to the same cellular service could very likely degrade the throughput of the individual services. Therefore, it is quite possible that when testing a single connection, it would often, if not always, perform far better than two that are close enough to impact each other, regardless of the endpoint connection. If I had to test this theory, I would set up one connection as the "master" aggregated with OMR and connect the other directly to a device that could run the same synthetic speed and quality tests. Then I would test each independently with the other completely powered off. Then I would swap the devices between OMR and the independent device to eliminate hardware differences. If there is any decrease in individual performance while operating simultaneously, that would strongly implicate this as an issue.

I hope our feedback has been helpful to you as well as this community. Take care!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days