coova / coova-chilli

CoovaChilli is an open-source software access controller for captive portal hotspots.

Poor upload speed on ESXi #579

Closed alex-wifigem closed 4 months ago

alex-wifigem commented 4 months ago

I have installed CoovaChilli 1.6 on an Ubuntu Server 20.04 LTS guest running in an ESXi virtual machine. The VM is configured with two network adapters and the system as a whole works: client devices on the secondary network adapter receive an IP address, can log in, and can access the Internet. The problem is that upload speed is close to zero. The same configuration works perfectly on VMware Workstation, on Hyper-V, and natively on Ubuntu.

The speed test was performed between the machine where CoovaChilli is installed and a Windows laptop connected directly to tun0 with an Ethernet cable. No other devices are connected, no firewalls are installed, iptables rules allow all traffic, and no traffic shaping is in place. Speed was tested with iperf3. Here is the download test (from the Ubuntu machine running CoovaChilli to the Windows machine):

# iperf3 -s -B 10.1.0.1
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.1.0.2, port 62432
[  5] local 10.1.0.1 port 5201 connected to 10.1.0.2 port 62433
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  55.5 MBytes   466 Mbits/sec   20    165 KBytes       
[  5]   1.00-2.00   sec  58.3 MBytes   489 Mbits/sec   14    118 KBytes       
[  5]   2.00-3.00   sec  57.5 MBytes   482 Mbits/sec    7    133 KBytes       
[  5]   3.00-4.00   sec  57.5 MBytes   482 Mbits/sec    7    148 KBytes       
[  5]   4.00-5.00   sec  57.8 MBytes   485 Mbits/sec    9    114 KBytes       
[  5]   5.00-6.00   sec  57.6 MBytes   483 Mbits/sec    7    135 KBytes       
[  5]   6.00-7.00   sec  58.8 MBytes   493 Mbits/sec    8    111 KBytes       
[  5]   7.00-8.00   sec  57.6 MBytes   484 Mbits/sec    7    138 KBytes       
[  5]   8.00-9.00   sec  57.8 MBytes   485 Mbits/sec    8    107 KBytes       
[  5]   9.00-10.00  sec  57.6 MBytes   483 Mbits/sec    7    107 KBytes       
[  5]  10.00-10.04  sec  2.27 MBytes   463 Mbits/sec    0    135 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.04  sec   578 MBytes   483 Mbits/sec   94             sender

And here is the output from the Upload test (the server is receiving):

-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.1.0.2, port 62465
[  5] local 10.1.0.1 port 5201 connected to 10.1.0.2 port 62466
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.43 KBytes  11.7 Kbits/sec                  
[  5]   1.00-2.00   sec  1.43 KBytes  11.7 Kbits/sec                  
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   3.00-4.00   sec  1.43 KBytes  11.7 Kbits/sec                  
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec                  
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.06  sec  4.28 KBytes  3.48 Kbits/sec                  receiver

Any idea whether I'm doing something wrong or if there is another factor causing the issue? Thank you

sevan commented 4 months ago

I am not sure, but I'm wondering which emulated network card you are using for the guest on ESXi and on Workstation. Are they the same?

alex-wifigem commented 4 months ago

On the physical NIC setup, the driver is "tg3", but I'm not sure the problem is in the ESXi setup (I wish it were). In fact, I ran another test bypassing CoovaChilli:

# iperf3 -s -B 10.1.0.1
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.1.0.2, port 59968
[  5] local 10.1.0.1 port 5201 connected to 10.1.0.2 port 59969
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   108 MBytes   906 Mbits/sec                  
[  5]   1.00-2.00   sec   112 MBytes   940 Mbits/sec                  
[  5]   2.00-3.00   sec   113 MBytes   944 Mbits/sec                  
[  5]   3.00-4.00   sec   113 MBytes   944 Mbits/sec                  
[  5]   4.00-5.00   sec   113 MBytes   944 Mbits/sec                  
[  5]   5.00-6.00   sec   112 MBytes   936 Mbits/sec                  
[  5]   6.00-7.00   sec   112 MBytes   942 Mbits/sec                  
[  5]   7.00-8.00   sec   112 MBytes   937 Mbits/sec                  
[  5]   8.00-9.00   sec   112 MBytes   944 Mbits/sec                  
[  5]   9.00-10.00  sec   113 MBytes   945 Mbits/sec                  
[  5]  10.00-10.04  sec  4.71 MBytes   944 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.04  sec  1.10 GBytes   938 Mbits/sec                  receiver

sevan commented 4 months ago

Just to double-check: the emulated network card attached to the guest is a virtual Broadcom NIC using the tg3 driver? (Check the VM settings and compare them with the VM settings on Workstation; the tun interface would bind to the emulated virtual NIC.)

alex-wifigem commented 4 months ago

On the ESXi machine, the card attached to the guest is a Broadcom (MAC address 00:10:18:89:d3:3d) using the tg3 driver. VMware Workstation is installed on another computer with different adapters: there, I have a Marvell Yukon as the second physical adapter, bridged to network VMnet2, which is connected to VMware network adapter 2, which is bound to tun0. But the problem is not in the difference between the two hypervisors, because I have many other installations with VMware, on different hardware, and none of them has any throughput issue. Instead, all the ESXi installations have the same issue, regardless of the hardware.

sevan commented 4 months ago

My point was not that the issue is in a difference in hardware, but in the emulated network card used by the guest. I don't think I can help further, so I'll be quiet and hopefully someone else can step in.

alex-wifigem commented 4 months ago

In the Edit Settings of the VM, the Adapter Type is E1000. On boot, this is the relevant content of dmesg:

[   63.636132] e1000: ens37 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[   63.720751] device ens37 entered promiscuous mode
[   63.723680] e1000: ens37 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

According to VMware: "E1000: Emulated version of the Intel 82545EM Gigabit Ethernet NIC, with drivers available in most newer guest operating systems, including Windows XP and later and Linux versions 2.4.19 and later."

nzamps commented 4 months ago

Sounds like you need to disable GSO, GRO, and TSO:

ethtool -K ens37 gso off gro off tso off

where ens37 is your WAN.

- Brian

alex-wifigem commented 4 months ago

Hi Brian, thank you, but your advice did not improve performance. If I bypass tun0, throughput is almost 1 Gbit/s; that's why, in my opinion, the problem is not in the physical adapter or the type of emulation.

nzamps commented 4 months ago

You should experiment with setting those values on the other interfaces in use as well, since that is the exact symptom of having GRO/TSO enabled.
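As an illustration of that suggestion, here is a small dry-run helper (hypothetical, not from the thread; ens37 and tun0 are the interface names used above) that prints the ethtool command for each interface in the forwarding path, so the settings can be reviewed before being applied:

```shell
# Hypothetical dry-run helper: print the ethtool command that would
# disable GSO/GRO/TSO on each interface in the forwarding path.
# Remove the 'echo' to actually apply the settings (requires root).
disable_offloads() {
  for ifc in "$@"; do
    echo "ethtool -K $ifc gso off gro off tso off"
  done
}

disable_offloads ens37 tun0
```

With the echo removed, the same loop applies the settings to every interface passed as an argument.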

alex-wifigem commented 4 months ago

No luck after setting the same on tun0 too. The other interface is not involved in the test. This is the output of ethtool -k ens37:

Features for ens37:
rx-checksumming: off
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
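As a side note, a quick way to pull just the offload features that typically matter for tun/tap throughput out of an `ethtool -k` listing is a grep filter; the sketch below (the file path and filter are illustrative, and the sample lines are copied from the ens37 listing above) runs the filter against saved output:

```shell
# Filter the offload features most relevant to tun/tap throughput
# from saved `ethtool -k` output. The sample lines are copied from
# the ens37 listing in this thread.
cat > /tmp/ens37-features.txt <<'EOF'
tcp-segmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-checksumming: off
EOF

grep -E '^(tcp|generic|large)-(segmentation|receive)-offload' /tmp/ens37-features.txt
```

This prints only the TSO, GSO, GRO, and LRO lines, which makes it easy to confirm at a glance that all four are off.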

alex-wifigem commented 4 months ago

It seems that something has changed. I had to SSH to ESXi and run ethtool there on the physical adapter:

[root@esxi:~] ethtool -K vmnic1 rx off
[root@esxi:~] ethtool -K vmnic1 lro off
no offload settings changed
[root@esxi:~] ethtool -K vmnic1 gro off
no offload settings changed
[root@esxi:~] ethtool -K vmnic1 tso off
Cannot set device tcp segmentation offload settings: Function not implemented

Then, in the VM, with Coova-chilli not running:

# ethtool -K ens37 gro off 
# ethtool -K ens37 tso off  
# ethtool -K ens37 lro off
# ethtool -K ens37 rx off 

Now, ethtool -k ens37 gives the same output as my previous post, with this only difference: generic-segmentation-offload: on Now I have 250 Mbit/s upload and 500 Mbit/s download. I will repeat this test after a reboot, and then on a clean machine. @nzamps, @sevan, thank you for your help.
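One caveat for the reboot test: ethtool settings do not persist across reboots by themselves. A common way to make them persistent on Ubuntu 20.04 (an assumption, not something confirmed in the thread; the unit name and interface are illustrative) is a systemd oneshot unit:

```ini
# /etc/systemd/system/disable-offload-ens37.service (illustrative name)
[Unit]
Description=Disable NIC offloads on ens37 for CoovaChilli
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/sbin/ethtool -K ens37 gso off gro off tso off lro off rx off

[Install]
WantedBy=multi-user.target
```

Enabling it with systemctl enable disable-offload-ens37.service reapplies the same flags used in this thread on every boot.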