Microchip-Ethernet / EVB-KSZ9477

Repository for using Microchip EVB-KSZ9477 board. Product Supported: KSZ9477, KSZ9567, KSZ9897, KSZ9896, KSZ8567, KSZ8565, KSZ9893, KSZ9563, KSZ8563, LAN9646, Phys(KSZ9031/9131, LAN8770
KSZ DSA driver transmission speed issue #86

Open darshankiran opened 2 years ago

darshankiran commented 2 years ago

We are using the KSZ9477 DSA driver and are facing an issue while transmitting data: the bitrate does not go above 2 Mbits/sec. This issue is not seen while receiving/downloading.

We tried loading the spi_ksz9877 driver instead and got proper bandwidth/bitrate. The issue is seen only with the DSA driver.

Here are the iperf logs:

iperf3 -c 10.0.0.187
Connecting to host 10.0.0.187, port 5201
[ 5] local 10.0.0.25 port 51030 connected to 10.0.0.187 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 38.5 MBytes 323 Mbits/sec 0 1.53 MBytes
[ 5] 1.00-2.00 sec 33.8 MBytes 283 Mbits/sec 0 1.86 MBytes
[ 5] 2.00-3.00 sec 37.5 MBytes 315 Mbits/sec 0 2.32 MBytes
[ 5] 3.00-4.00 sec 33.8 MBytes 283 Mbits/sec 0 2.45 MBytes
[ 5] 4.00-5.00 sec 32.5 MBytes 273 Mbits/sec 0 2.57 MBytes
[ 5] 5.00-6.00 sec 35.0 MBytes 294 Mbits/sec 0 2.72 MBytes
[ 5] 6.00-7.00 sec 53.8 MBytes 451 Mbits/sec 0 3.02 MBytes
[ 5] 7.00-8.00 sec 35.0 MBytes 294 Mbits/sec 0 3.02 MBytes
[ 5] 8.00-9.00 sec 33.8 MBytes 283 Mbits/sec 0 3.02 MBytes
[ 5] 9.00-10.00 sec 36.2 MBytes 304 Mbits/sec 0 3.02 MBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 370 MBytes 310 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 369 MBytes 309 Mbits/sec receiver

iperf Done.

iperf3 -c 10.0.0.187 -R
Connecting to host 10.0.0.187, port 5201
Reverse mode, remote host 10.0.0.187 is sending
[ 5] local 10.0.0.25 port 51034 connected to 10.0.0.187 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 277 KBytes 2.27 Mbits/sec
[ 5] 1.00-2.00 sec 264 KBytes 2.17 Mbits/sec
[ 5] 2.00-3.00 sec 243 KBytes 1.99 Mbits/sec
[ 5] 3.00-4.00 sec 212 KBytes 1.74 Mbits/sec
[ 5] 4.00-5.00 sec 249 KBytes 2.04 Mbits/sec
[ 5] 5.00-6.00 sec 246 KBytes 2.02 Mbits/sec
[ 5] 6.00-7.00 sec 320 KBytes 2.62 Mbits/sec
[ 5] 7.00-8.00 sec 291 KBytes 2.39 Mbits/sec
[ 5] 8.00-9.00 sec 273 KBytes 2.24 Mbits/sec
[ 5] 9.00-10.00 sec 160 KBytes 1.31 Mbits/sec


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 2.57 MBytes 2.16 Mbits/sec 348 sender
[ 5] 0.00-10.00 sec 2.48 MBytes 2.08 Mbits/sec receiver

iperf Done.

Is this issue related to the driver?

Thanks, Darshan

ewhac commented 2 years ago

You are not alone; we've noticed this, too.

triha2work commented 2 years ago

The DSA driver requires no MAC driver change. The compromise is that MAC acceleration features cannot be used, because a tag needs to be added at the end of the frame. That means no hardware checksum generation and no scatter/gather transmit operation. By itself this slows down transmit, but not by much. However, older kernels have a weird problem where basic TCP transmission drastically degrades transmit performance. Newer kernels like 5.10 seem to correct this. It is a software (kernel) problem: if the MAC driver advertises that it supports hardware acceleration but does everything in software before sending the frame, performance increases to the expected level. The SAMA5D3 used in the KSZ9477 evaluation board has another issue: a new fix in the mainline kernel drastically reduces transmit performance because software needs to calculate the CRC for every frame sent. This causes TCP throughput to drop below 90 Mbps, while typical throughput is 150 Mbps. This can be worked around by not using the fix, as it is not needed for normal operation. That said, the numbers shown here are much lower, so I do not know what is going on.

Bartel-C8 commented 2 years ago

We found that iperf3 is very resource intensive (CPU/flash) and that this poses an additional bottleneck. We tried nuttcp and the results were better.

Flash access is not that fast though; maybe that is a problem as well in later kernels? (But obviously SAMA5D3-related...)

triha2work commented 2 years ago

iperf, iperf3, and nuttcp give the same result in my setup. Is the SAMA5D3 or the KSZ9477 evaluation board being used in these tests?

sgidel commented 1 year ago

I also ran into this problem and have spent probably 70 hours trying to figure it out. I finally found the cause: TCP segmentation offload (TSO). I use the in-tree DSA driver and not the driver in this repo, but the same should apply to it as well. What seems to be happening is that packets that make use of TSO get dropped, while small standard packets make it through. This is because the tail tag gets added to the end of the full TSO skb, so when the network hardware transmits the normal frame-sized chunks, the tail tag is missing from all but the last chunk. This also results in tons of checksum errors on the receiving device. It looks like this bug has been fixed in newer kernel versions by disabling scatter-gather. I am using 5.10 though. See https://elixir.bootlin.com/linux/v6.2-rc4/source/net/dsa/slave.c#L2352 My fix for 5.10 was to add slave->features &= ~NETIF_F_GSO_SOFTWARE; to slave_dsa_setup_tagger; however, the implementation in 6.2 seems cleaner and probably works better for other offloads, including RX.
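
For anyone trying to picture the failure described above, here is a toy Python model (not kernel code) of what happens when a tail tag is appended to one large buffer before it is split into frame-sized chunks. The one-byte `TAIL_TAG`, the `MSS` value, and all function names are made up for illustration; the real KSZ tail tag layout and the kernel's segmentation path are more involved.

```python
# Toy model of tail tagging combined with segmentation offload.
# TAIL_TAG and MSS are hypothetical stand-ins, not real KSZ9477 values.
TAIL_TAG = b"\xAA"
MSS = 1000


def segment(payload: bytes, mss: int) -> list:
    """Split a buffer into chunks of at most `mss` bytes."""
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


def tag_then_segment(payload: bytes, mss: int = MSS) -> list:
    """Tag the whole large buffer first, then let the 'hardware' split it."""
    return segment(payload + TAIL_TAG, mss)


def segment_then_tag(payload: bytes, mss: int = MSS) -> list:
    """Split first (segmentation in software), then tag every frame."""
    return [chunk + TAIL_TAG for chunk in segment(payload, mss)]


def count_tagged(frames: list) -> int:
    """Count frames that end with the tail tag."""
    return sum(1 for frame in frames if frame.endswith(TAIL_TAG))


# All-zero payload, so no data byte can be mistaken for the tag.
payload = bytes(3500)

broken = tag_then_segment(payload)
fixed = segment_then_tag(payload)

print(f"tag then segment: {count_tagged(broken)} of {len(broken)} frames tagged")
print(f"segment then tag: {count_tagged(fixed)} of {len(fixed)} frames tagged")
```

In this model only the last chunk carries the tag when tagging happens first, which matches the symptom that all but the last chunk are missing the tail tag. Tagging after segmentation is roughly what clearing the offload feature bits forces to happen.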