Zoxc / crusader

A network throughput and latency tester.
Apache License 2.0

Comparing Crusader plots to Flent #14

Open richb-hanover opened 10 months ago

richb-hanover commented 10 months ago

[Not really a crusader bug report]

In https://github.com/Zoxc/crusader/issues/6#issuecomment-1886643966, @dtaht wrote:

And see how weird the up+down test is? What's the link?

Dave: Could you explain what you're seeing in this plot? What do you see there? What did you expect?

Also: I just created the second plot. Any surprises there? Many thanks.

From #6: [plot image]

From richb-hanover: [test plot 2024 01 11 07-58-58]

dtaht commented 10 months ago

It is so cool to be looking at a new plotting mechanism. I do not quite know what I am looking at. The second plot should not be oscillating like that in the 2nd and 3rd phases, and the first has a puzzling latency change in the first phase. Perhaps some of the behavior could be explained by the tool, and some by the underlying link.

What is the AQM on the link? Is ECN on or off? What is the underlying tech?

For simplicity, a single flow would be easier to look at, rather than 16.

A staggered start test, also of 2 flows.

What does a 16-flow rrul test look like?

richb-hanover commented 10 months ago

OK. I'll have to replicate, then run a Flent test...

richb-hanover commented 10 months ago

Using the same test configuration as https://github.com/Zoxc/crusader/issues/9#issuecomment-1890951312 (MacBook on Wi-Fi to Ubuntu on Ethernet), I ran both Flent and Crusader. I got these plots:

Flent 2.1.1

[Flent plot image]

Crusader 0.0.10

[Crusader plot image]

Crusader settings (default, I think)

[Crusader settings screenshot]



dtaht commented 10 months ago

Boy, are those two different. Crusader with 4 flows might be directly comparable. But I strongly suspect we have a way to go to make crusader drive the test(s), and it might have some internal bottlenecks, like using green rather than real threads (or in Rust parlance, async vs. threads)...
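(For readers unfamiliar with that distinction, here is a rough Rust sketch of the two models: one OS thread per flow versus many flows multiplexed as tasks on an async runtime. It is purely illustrative and not crusader's actual code; it assumes the tokio crate with its rt-multi-thread, net, and macros features, and the addresses and traffic-driving logic are placeholders.)

use std::net::TcpStream as BlockingTcpStream;
use std::thread;

use tokio::net::TcpStream;

// One dedicated OS thread per connection: blocking I/O on one flow cannot
// delay the scheduling of the others.
fn threaded_flows(addrs: &[&str]) {
    let handles: Vec<_> = addrs
        .iter()
        .map(|addr| {
            let addr = addr.to_string();
            thread::spawn(move || {
                let _stream = BlockingTcpStream::connect(&addr).expect("connect");
                // ...drive traffic with blocking reads/writes...
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

// All connections as lightweight tasks sharing one async runtime: if the
// runtime's worker threads are busy, per-flow timing can pick up extra jitter.
async fn async_flows(addrs: &[&str]) {
    let mut tasks = Vec::new();
    for addr in addrs {
        let addr = addr.to_string();
        tasks.push(tokio::spawn(async move {
            let _stream = TcpStream::connect(addr).await.expect("connect");
            // ...drive traffic with non-blocking reads/writes...
        }));
    }
    for task in tasks {
        task.await.unwrap();
    }
}

#[tokio::main]
async fn main() {
    // Placeholder endpoints; any reachable TCP server will do.
    let addrs = ["192.168.50.80:35481", "192.168.50.80:35481"];
    threaded_flows(&addrs);
    async_flows(&addrs).await;
}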

Anyway rrul_be (not the rrul test) vs 4 crusader flows should be roughly the same in the third segment of the crusader test. Thanks for calibrating!!!!

dtaht commented 8 months ago

I am going to try and ramp up on crusader-related testing in the coming months. I still do not understand these results. Your crusader test in the third panel is observing mere 100 ms peaks while the rrul test observes 200+ ms peaks. 4 flows compared to 4 flows might be revealing.

richb-hanover commented 8 months ago

Another data point: I ran both Flent and Crusader between a Mac mini and an Odroid C4. They were connected via Ethernet through the LAN port of a (venerable) WNDR3800 running stock OpenWrt and no VLANs.

The plots look quite similar: both show high speed data with relatively little increase in latency during the test. I also attach the corresponding data files below.

rrul_-_RRUL_to_Odroid-2

plot 2024 03 06 10-59-21

rrul-2024-03-06T105425.795316.RRUL_to_Odroid-2.flent.gz

data 2024.03.06 10-59-26.crr.zip

dtaht commented 8 months ago

That is quite promising. I am puzzled by the spikes at t+15.5 seconds. The way crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability. It does not capture TCP RTT stats natively, and I wish I knew enough about Rust to make that syscall and plot that.
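(For reference, a rough Linux-only sketch of what that syscall could look like in Rust: reading the kernel's smoothed RTT for a connection via getsockopt(TCP_INFO) using the libc crate. This is an illustration of the idea, not crusader code; the address is a placeholder and the tcp_info binding is Linux-specific.)

use std::io;
use std::mem;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

/// Returns the kernel's smoothed RTT estimate for `stream`, in microseconds.
fn tcp_smoothed_rtt_us(stream: &TcpStream) -> io::Result<u32> {
    // libc::tcp_info mirrors struct tcp_info from linux/tcp.h.
    let mut info: libc::tcp_info = unsafe { mem::zeroed() };
    let mut len = mem::size_of::<libc::tcp_info>() as libc::socklen_t;
    let ret = unsafe {
        libc::getsockopt(
            stream.as_raw_fd(),
            libc::IPPROTO_TCP,
            libc::TCP_INFO,
            &mut info as *mut _ as *mut libc::c_void,
            &mut len,
        )
    };
    if ret != 0 {
        return Err(io::Error::last_os_error());
    }
    // tcpi_rtt is the smoothed RTT in microseconds; tcpi_rttvar is its variance.
    Ok(info.tcpi_rtt)
}

fn main() -> io::Result<()> {
    // Placeholder address; point this at any reachable TCP server.
    let stream = TcpStream::connect("192.168.50.80:35481")?;
    println!("kernel smoothed RTT: {} us", tcp_smoothed_rtt_us(&stream)?);
    Ok(())
}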

mcuee commented 7 months ago

Server: Debian 12 VM under Proxmox PVE 8.0 (Intel N100 Mini PC), connected to Asus RT-AX86U router LAN port

Client: Mac Mini M1 2020 running latest macOS 14.4.1 and up-to-date Homebrew, wireless connection to Asus RT-AX86U router

Flent:

(py310venv_universal) mcuee@mcuees-Mac-mini python % flent rrul -p all_scaled -l 60 -H 192.168.50.80 -t flent_macos -o macos_wireless_asus.png
Starting Flent 2.1.1 using Python 3.10.9.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2024-03-29T193653.612449.flent_macos.flent.gz
Initialised matplotlib v3.8.3 on numpy v1.26.4.
WARNING: Unable to build our own tight layout: 'Figure' object has no attribute '_cachedRenderer'

macos_wireless_asus

Crusader:

(py310venv_universal) mcuee@mcuees-Mac-mini crusader-aarch64-apple-darwin % ./crusader test 192.168.50.80
Connected to server 192.168.50.80:35481
Latency to server 3.52 ms
Testing download...
Testing upload...
Testing both download and upload...
Writing data...
Saved raw data as data 2024.03.29 19-43-07.crr
Saved plot as plot 2024.03.29 19-43-07.png

plot 2024 03 29 19-43-07

iperf3 result for reference:

(py310venv_universal) mcuee@mcuees-Mac-mini python % iperf3 -c 192.168.50.80 -R
Connecting to host 192.168.50.80, port 5201
Reverse mode, remote host 192.168.50.80 is sending
[  5] local 192.168.50.29 port 49984 connected to 192.168.50.80 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  68.0 MBytes   568 Mbits/sec                  
[  5]   1.00-2.00   sec  87.0 MBytes   733 Mbits/sec                  
[  5]   2.00-3.00   sec  84.5 MBytes   706 Mbits/sec                  
[  5]   3.00-4.00   sec  88.0 MBytes   738 Mbits/sec                  
[  5]   4.00-5.01   sec  86.6 MBytes   726 Mbits/sec                  
[  5]   5.01-6.00   sec  87.9 MBytes   737 Mbits/sec                  
[  5]   6.00-7.00   sec  86.4 MBytes   726 Mbits/sec                  
[  5]   7.00-8.00   sec  84.9 MBytes   712 Mbits/sec                  
[  5]   8.00-9.00   sec  88.8 MBytes   745 Mbits/sec                  
[  5]   9.00-10.00  sec  81.2 MBytes   681 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.01  sec   846 MBytes   709 Mbits/sec  114             sender
[  5]   0.00-10.00  sec   843 MBytes   707 Mbits/sec                  receiver

iperf Done.
(py310venv_universal) mcuee@mcuees-Mac-mini python % iperf3 -c 192.168.50.80   
Connecting to host 192.168.50.80, port 5201
[  5] local 192.168.50.29 port 49986 connected to 192.168.50.80 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  55.9 MBytes   468 Mbits/sec                  
[  5]   1.00-2.00   sec  51.8 MBytes   433 Mbits/sec                  
[  5]   2.00-3.00   sec  57.1 MBytes   480 Mbits/sec                  
[  5]   3.00-4.00   sec  70.5 MBytes   593 Mbits/sec                  
[  5]   4.00-5.00   sec  72.4 MBytes   606 Mbits/sec                  
[  5]   5.00-6.00   sec  73.8 MBytes   616 Mbits/sec                  
[  5]   6.00-7.00   sec  78.4 MBytes   660 Mbits/sec                  
[  5]   7.00-8.00   sec  75.2 MBytes   629 Mbits/sec                  
[  5]   8.00-9.00   sec  76.8 MBytes   643 Mbits/sec                  
[  5]   9.00-10.00  sec  56.0 MBytes   471 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   668 MBytes   560 Mbits/sec                  sender
[  5]   0.00-10.02  sec   665 MBytes   557 Mbits/sec                  receiver

iperf Done.
mcuee commented 7 months ago

@dtaht

BTW, just wondering if you can help fix singpore.starlink.taht.net, since I am located in Singapore. Thanks. https://blog.cerowrt.org/post/flent_fleet/

mcuee@debian12vmn100new:~/build$ ping -4 -c 4 singpore.starlink.taht.net
ping: singpore.starlink.taht.net: Name or service not known
mcuee@debian12vmn100new:~/build$ ping -6 -c 4 singpore.starlink.taht.net
ping: singpore.starlink.taht.net: Name or service not known
mcuee commented 7 months ago

@richb-hanover

It seems to me that the server netperf.bufferbloat.net (also called netperf-east.bufferbloat.net) has been down for quite a while. https://flent.org/intro.html#quick-start https://blog.cerowrt.org/post/flent_fleet/

Just wondering if it is possible to revive the server, or perhaps update the flent.org website.

mcuee commented 7 months ago

@dtaht Just wondering if it is possible to host a crusader test server alongside flent as well. Thanks.

mcuee commented 7 months ago

Using my own Crusader server over the internet to countercheck Waveform.com test results.

This may not be a good example, but Crusader works much better here than Waveform.com: the Waveform speed result is on the low side, so I have doubts about the validity of its result.

Test server: Ubuntu 22.04 LXC container on an Intel N100 Mini PC (quad 2.5G ports) running Proxmox PVE 8.0. The Mini PC is connected to an Asus RT-AX86U router 2.5G LAN port. 1Gbps fibre internet.

Test client: Acer Windows 11 laptop and Ugreen USB 3 to 2.5G adapter, connected to OpenWRT virtual router 2.5G LAN port.

BTW, I have not been able to test flent using my own server over the internet yet. My two home networks actually share the same upstream GPON ONT, so when testing over the internet I can only test upload or download, not both at once.

1) Without SQM it is already good: plot 2024 03 30 20-19-11, plot 2024 03 30 20-20-24

Waveform.com bufferbloat test result: A https://www.waveform.com/tools/bufferbloat?test-id=a1968217-f78a-456c-acae-217bd38ed00e

2) With SQM (Cake, 1Gbps download limit, 200Mbps upload limit): plot 2024 03 30 20-17-38, plot 2024 03 30 20-18-13

Waveform.com bufferbloat test result: A+ https://www.waveform.com/tools/bufferbloat?test-id=da64d035-23be-4e6a-a54e-39d0d9b28d47

OpenWRT 23.05 SQM settings:
Queueing discipline: cake
Queue setup script: piece_of_cake.qos
Screenshot 2024-03-30 195433
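(For reference, those GUI settings correspond roughly to an /etc/config/sqm stanza like the sketch below. The section name and interface are placeholders and will differ per setup; sqm-scripts expresses the download/upload limits in kbit/s.)

# sketch of /etc/config/sqm; 'wan' and 'eth1' are placeholders
config queue 'wan'
        option enabled '1'
        option interface 'eth1'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        # limits in kbit/s: 1 Gbps down, 200 Mbps up
        option download '1000000'
        option upload '200000'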

mcuee commented 7 months ago

Crusader vs Flent (internal OpenWRT WAN-side server and LAN-side client). You can see that crusader seems able to keep up with the virtual network adapter (10Gbps), whereas flent cannot cope.

Server: Ubuntu 22.04 LXC container (192.168.50.15) on Intel N100 mini PC running Proxmox PVE 8.0
Client: Ubuntu 22.04 VM (192.168.48.9) on Intel N100 mini PC running Proxmox PVE 8.0
OpenWRT 23.05 virtual router on Intel N100 mini PC running Proxmox PVE 8.0
OpenWRT 23.05 virtual router WAN -- 192.168.50.134
OpenWRT 23.05 virtual router LAN -- 192.168.48.1
No SQM/QoS settings enabled.

mcuee@ubuntu2204vmbr0:~/build/crusader$ ./crusader test 192.168.50.15
Connected to server 192.168.50.15:35481
Latency to server 0.42 ms
Testing download...
Testing upload...
Testing both download and upload...
Writing data...
Saved raw data as data 2024.03.30 21-03-01.crr

mcuee@ubuntu2204vmbr0:~/build/crusader$ flent rrul -p all_scaled -l 60 -H 192.168.50.15 -t openwrt_lan_client_wan_server_flent -o openwrt_flent_wan_lan.png 
Starting Flent 2.0.1 using Python 3.10.12.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2024-03-30T210556.265357.openwrt_lan_client_wan_server_flent.flent.gz
Initialised matplotlib v3.5.1 on numpy v1.21.5.

openwrt_flent_wan_lan plot 2024 03 30 21-03-01

richb-hanover commented 7 months ago

It seems to me that the server netperf.bufferbloat.net (also called netperf-east.bufferbloat.net) has been down for quite a while.

Yes. I have been stymied by heavy abuse of the server. In addition to legitimate researchers or occasional users, I see people running a speed test every five minutes, 24x7.

I created a bunch of scripts to review the netperf server logs and use iptables to shut off people who abuse the server. Even with those scripts running, I have been unable to keep the traffic sent/received below the 4TB/month cap at my VPS.

I'm feeling pretty discouraged... I am going to ask about this on the Bloat mailing list to see if anyone has ideas. Thanks.

mcuee commented 7 months ago

I am going to ask about this on the Bloat mailing list to see if anyone has ideas. Thanks.

Reference discussion here: https://lists.bufferbloat.net/pipermail/bloat/2024-March/017987.html

mcuee commented 7 months ago

@Zoxc

I tend to think I have found a good use for crusader here. Still, just wondering if you have some ideas on how to better test the effectiveness of cake-autorate. Thanks.

mcuee commented 7 months ago

In the end cake-autorate was not suitable for my use case, but crusader still proved to be a good tool during the testing.

dtaht commented 3 months ago

I note that I am very behind on GitHub...

Zoxc commented 3 months ago

The way crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability.

That could probably be tested by running a separate crusader client and server instance that only measures latency, and seeing whether it reproduces that.

mcuee commented 1 month ago

The way crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability.

That could probably be tested by running a separate crusader client and server instance that only measures latency, and seeing whether it reproduces that.

Looks like this has been implemented, at least for the GUI version.