richb-hanover opened this issue 10 months ago
It is so cool to be looking at a new plotting mechanism. I do not quite know what I am looking at. The second plot should not be oscillating like that in the 2nd and 3rd phases, and the first has a puzzling latency change in the first phase. Perhaps some of the behavior can be explained by the tool, and some by the underlying link.
What is the AQM on the link? Is ECN on or off? What is the underlying technology?
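For reference, those questions can usually be answered from the command line; a sketch, where eth0 stands in for the actual Ethernet interface and the sysctl names are the standard Linux and macOS knobs:
tc qdisc show dev eth0          # which qdisc/AQM is active on the Ethernet link
sysctl net.ipv4.tcp_ecn         # 0 = off, 1 = on, 2 = on only when the peer requests it
sysctl net.inet.tcp.ecn_initiate_out net.inet.tcp.ecn_negotiate_in   # macOS side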
For simplicity, a single flow would be easier to look at than 16.
A staggered-start test of 2 flows would also be useful.
And what does a 16-flow rrul test look like?
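Something like the following Flent runs might cover those cases (a sketch, assuming the stock Flent test names and that rrul_be_nflows takes the upload_streams/download_streams test parameters; substitute the real server address for <server>):
flent tcp_1up -l 60 -H <server> -t one_flow
flent tcp_2up_delay -l 60 -H <server> -t two_flows_staggered
flent rrul_be_nflows --test-parameter upload_streams=16 --test-parameter download_streams=16 -l 60 -H <server> -t 16_flows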
OK. I'll have to replicate, then run a Flent test...
Using the same test configuration as https://github.com/Zoxc/crusader/issues/9#issuecomment-1890951312 (MacBook on Wi-Fi to Ubuntu on Ethernet), I ran both Flent and Crusader and got these plots:
Flent 2.1.1
Crusader 0.0.10
Crusader settings (default, I think)
Boy, are those two different. Crusader with 4 flows might be directly comparable. But I strongly suspect we have a way to go to make Crusader drive the test(s), and it might have some internal bottlenecks, like using green threads rather than real threads (in Rust parlance, async vs. threads)...
Anyway, rrul_be (not the rrul test) vs. 4 Crusader flows should look roughly the same as the third segment of the Crusader test. Thanks for calibrating!
I am going to try to ramp up on Crusader-related testing in the coming months. I still do not understand these results. Your Crusader test in the third panel observes mere 100 ms peaks, while the rrul test observes 200+ ms peaks. Comparing 4 flows against 4 flows might be revealing.
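For reference, rrul_be drives 4 TCP flows in each direction with best-effort marking only, so the suggested comparison run would look something like this (a sketch; replace <server> with the host under test):
flent rrul_be -p all_scaled -l 60 -H <server> -t rrul_be_4flows -o rrul_be_4flows.png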
Another data point: I ran both Flent and Crusader between a Mac mini and an Odroid C4. They were connected via Ethernet through the LAN port of a (venerable) WNDR3800 running stock OpenWrt and no VLANs.
The plots look quite similar: both show high throughput with relatively little increase in latency during the test. I also attach the corresponding data files below.
That is quite promising. I am puzzled by the spikes at t+15.5 seconds. The way Crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability. It does not capture TCP RTT stats natively, and I wish I knew enough about Rust to make that syscall and plot the result.
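As a stopgap (this is not something Crusader does itself), the kernel's own TCP RTT estimate can be sampled from outside the process on the Linux side while a test runs; a minimal sketch, assuming the server address used in the transcripts below (192.168.50.80):
# print the kernel's smoothed rtt/rttvar for connections to the test server once a second
while true; do date +%T; ss -ti dst 192.168.50.80; sleep 1; done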
Server: Debian 12 VM under Proxmox PVE 8.0 (Intel N100 Mini PC), connected to Asus RT-AX86U router LAN port
Client: Mac Mini M1 2020 running latest macOS 14.4.1 and up-to-date Homebrew, wireless connection to Asus RT-AX86U router
Flent:
(py310venv_universal) mcuee@mcuees-Mac-mini python % flent rrul -p all_scaled -l 60 -H 192.168.50.80 -t flent_macos -o macos_wireless_asus.png
Starting Flent 2.1.1 using Python 3.10.9.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2024-03-29T193653.612449.flent_macos.flent.gz
Initialised matplotlib v3.8.3 on numpy v1.26.4.
WARNING: Unable to build our own tight layout: 'Figure' object has no attribute '_cachedRenderer'
Crusader:
(py310venv_universal) mcuee@mcuees-Mac-mini crusader-aarch64-apple-darwin % ./crusader test 192.168.50.80
Connected to server 192.168.50.80:35481
Latency to server 3.52 ms
Testing download...
Testing upload...
Testing both download and upload...
Writing data...
Saved raw data as data 2024.03.29 19-43-07.crr
Saved plot as plot 2024.03.29 19-43-07.png
iperf3 result for reference:
(py310venv_universal) mcuee@mcuees-Mac-mini python % iperf3 -c 192.168.50.80 -R
Connecting to host 192.168.50.80, port 5201
Reverse mode, remote host 192.168.50.80 is sending
[ 5] local 192.168.50.29 port 49984 connected to 192.168.50.80 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 68.0 MBytes 568 Mbits/sec
[ 5] 1.00-2.00 sec 87.0 MBytes 733 Mbits/sec
[ 5] 2.00-3.00 sec 84.5 MBytes 706 Mbits/sec
[ 5] 3.00-4.00 sec 88.0 MBytes 738 Mbits/sec
[ 5] 4.00-5.01 sec 86.6 MBytes 726 Mbits/sec
[ 5] 5.01-6.00 sec 87.9 MBytes 737 Mbits/sec
[ 5] 6.00-7.00 sec 86.4 MBytes 726 Mbits/sec
[ 5] 7.00-8.00 sec 84.9 MBytes 712 Mbits/sec
[ 5] 8.00-9.00 sec 88.8 MBytes 745 Mbits/sec
[ 5] 9.00-10.00 sec 81.2 MBytes 681 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 846 MBytes 709 Mbits/sec 114 sender
[ 5] 0.00-10.00 sec 843 MBytes 707 Mbits/sec receiver
iperf Done.
(py310venv_universal) mcuee@mcuees-Mac-mini python % iperf3 -c 192.168.50.80
Connecting to host 192.168.50.80, port 5201
[ 5] local 192.168.50.29 port 49986 connected to 192.168.50.80 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 55.9 MBytes 468 Mbits/sec
[ 5] 1.00-2.00 sec 51.8 MBytes 433 Mbits/sec
[ 5] 2.00-3.00 sec 57.1 MBytes 480 Mbits/sec
[ 5] 3.00-4.00 sec 70.5 MBytes 593 Mbits/sec
[ 5] 4.00-5.00 sec 72.4 MBytes 606 Mbits/sec
[ 5] 5.00-6.00 sec 73.8 MBytes 616 Mbits/sec
[ 5] 6.00-7.00 sec 78.4 MBytes 660 Mbits/sec
[ 5] 7.00-8.00 sec 75.2 MBytes 629 Mbits/sec
[ 5] 8.00-9.00 sec 76.8 MBytes 643 Mbits/sec
[ 5] 9.00-10.00 sec 56.0 MBytes 471 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 668 MBytes 560 Mbits/sec sender
[ 5] 0.00-10.02 sec 665 MBytes 557 Mbits/sec receiver
iperf Done.
@dtaht
BTW, just wondering if you can help fix singpore.starlink.taht.net. Thanks. I am located in Singapore.
https://blog.cerowrt.org/post/flent_fleet/
mcuee@debian12vmn100new:~/build$ ping -4 -c 4 singpore.starlink.taht.net
ping: singpore.starlink.taht.net: Name or service not known
mcuee@debian12vmn100new:~/build$ ping -6 -c 4 singpore.starlink.taht.net
ping: singpore.starlink.taht.net: Name or service not known
@richb-hanover
It seems to me that the server netperf.bufferbloat.net (also called netperf-east.bufferbloat.net) has been down for quite a while.
https://flent.org/intro.html#quick-start
https://blog.cerowrt.org/post/flent_fleet/
Just wondering if it is possible to revive the server, or perhaps update the flent.org website.
@dtaht Just wondering if it is possible to host a test Crusader server alongside the Flent servers as well. Thanks.
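For what it's worth, self-hosting one looks simple (a sketch, assuming the crusader serve subcommand behaves like the builds used above and uses the default port 35481 seen in these transcripts, which needs both TCP and UDP open):
./crusader serve
ufw allow 35481    # or the equivalent firewall rule for TCP and UDP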
Using my own Crusader server over the internet to cross-check Waveform.com test results.
This may not be a good example, but Crusader works much better than Waveform.com here, since the Waveform speed result is on the low side and I have doubts about its validity.
Test server: Ubuntu 22.04 LXC container on an Intel N100 mini PC (quad 2.5G ports) running Proxmox PVE 8.0. The mini PC is connected to the Asus RT-AX86U router's 2.5G LAN port. 1 Gbps fibre Internet.
Test client: Acer Windows 11 laptop with a Ugreen USB 3 to 2.5G adapter, connected to the OpenWrt virtual router's 2.5G LAN port.
BTW, I have not been able to test Flent against my own server over the internet yet. My two home networks actually share the same upstream GPON ONT, so when testing over the internet I can only exercise upload or download, not both at once.
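Under that constraint, the single-direction Flent tests may be the way to go (a sketch, assuming the stock test names; replace <server> accordingly):
flent tcp_ndown --test-parameter download_streams=4 -l 60 -H <server> -t download_only
flent tcp_nup --test-parameter upload_streams=4 -l 60 -H <server> -t upload_only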
1) Without SQM it is already good.
Waveform.com bufferbloat test result: A https://www.waveform.com/tools/bufferbloat?test-id=a1968217-f78a-456c-acae-217bd38ed00e
2) With SQM (Cake, 1Gbps download limit, 200Mbps upload limit)
Waveform.com bufferbloat test result: A+ https://www.waveform.com/tools/bufferbloat?test-id=da64d035-23be-4e6a-a54e-39d0d9b28d47
OpenWrt 23.05 SQM settings: queueing discipline cake, queue setup script piece_of_cake.qos.
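For reference, those settings correspond roughly to this /etc/config/sqm stanza (a sketch; 'eth1' stands in for the actual WAN interface, and the rates are in kbit/s):
# 1 Gbps download / 200 Mbps upload, cake with piece_of_cake.qos
config queue 'wan'
        option enabled '1'
        option interface 'eth1'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option download '1000000'
        option upload '200000'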
Crusader vs. Flent (internal test: server on the OpenWrt WAN side, client on the LAN side). You can see that Crusader seems to be able to keep up with the virtual network adapter (10 Gbps), whereas Flent cannot cope.
Server: Ubuntu 22.04 LXC container (192.168.50.15) on Intel N100 mini PC running Proxmox PVE 8.0
Client: Ubuntu 22.04 VM (192.168.48.9) on Intel N100 mini PC running Proxmox PVE 8.0
OpenWrt 23.05 virtual router on Intel N100 mini PC running Proxmox PVE 8.0
OpenWrt 23.05 virtual router WAN: 192.168.50.134
OpenWrt 23.05 virtual router LAN: 192.168.48.1
No SQM/QoS settings enabled.
mcuee@ubuntu2204vmbr0:~/build/crusader$ ./crusader test 192.168.50.15
Connected to server 192.168.50.15:35481
Latency to server 0.42 ms
Testing download...
Testing upload...
Testing both download and upload...
Writing data...
Saved raw data as data 2024.03.30 21-03-01.crr
mcuee@ubuntu2204vmbr0:~/build/crusader$ flent rrul -p all_scaled -l 60 -H 192.168.50.15 -t openwrt_lan_client_wan_server_flent -o openwrt_flent_wan_lan.png
Starting Flent 2.0.1 using Python 3.10.12.
Starting rrul test. Expected run time: 70 seconds.
Data file written to ./rrul-2024-03-30T210556.265357.openwrt_lan_client_wan_server_flent.flent.gz
Initialised matplotlib v3.5.1 on numpy v1.21.5.
It seems to me that the server netperf.bufferbloat.net (also called netperf-east.bufferbloat.net) has been down for quite a while.
Yes. I have been stymied by heavy abuse of the server. In addition to legitimate researchers or occasional users, I see people running a speed test every five minutes, 24x7.
I created a bunch of scripts to review the netperf server logs and use iptables to shut off people who abuse the server. Even with those scripts running, I have been unable to keep the traffic sent/received below the 4TB/month cap at my VPS.
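(For anyone curious, one possible shape of such a rule, not the actual scripts: rate-limit new connections to netperf's default control port, 12865, per source IP.)
iptables -A INPUT -p tcp --dport 12865 -m conntrack --ctstate NEW \
    -m hashlimit --hashlimit-name netperf-abuse --hashlimit-mode srcip \
    --hashlimit-above 30/hour -j DROP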
I'm feeling pretty discouraged... I am going to ask about this on the Bloat mailing list to see if anyone has ideas. Thanks.
I am going to ask about this on the Bloat mailing list to see if anyone has ideas. Thanks.
Reference discussion here: https://lists.bufferbloat.net/pipermail/bloat/2024-March/017987.html
@Zoxc
I tend to think I have found a good use for Crusader here. Still, just wondering if you have some ideas on how to better test the effectiveness of cake-autorate. Thanks.
In the end, cake-autorate turned out not to be suitable for my use case, but Crusader still proved to be a good tool during the testing.
I note that I am very behind on GitHub...
The way Crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability.
That could probably be tested by running a separate Crusader client and server instance that only measures latency, to see if it reproduces the spikes.
The way Crusader works is by multiplexing a few connections through the Rust async subsystem (I think), which might be leading to that sort of variability.
That could probably be tested by running a separate Crusader client and server instance that only measures latency, to see if it reproduces the spikes.
Looks like this has been implemented, at least for the GUI version.
[Not really a Crusader bug report]
In https://github.com/Zoxc/crusader/issues/6#issuecomment-1886643966, @dtaht wrote:
Dave: Could you explain what you're seeing in this plot? What do you see there? What did you expect?
Also: I just created the second plot. Any surprises there? Many thanks.
From #6
From richb-hanover's test