Mieze / IntelLucy

MacOS open source driver for the Intel X500 family

X520-DA1 timeout issues with v1.0.4 #4

Closed RF3 closed 2 weeks ago

RF3 commented 1 month ago

I'm running macOS Ventura 13.6.8 with OC 1.0.0 and an X520-DA1 (device ID 0x10fb, subsystem ID 0x000a), which is connected to a ZyXEL XGS1210-12 switch with a short passive direct-attach cable. With IntelLucy v1.0.4, I often get timeouts lasting several seconds during large file transfers to the local Linux server (Debian, 2.5G USB Ethernet adapter with Realtek RTL8156B chipset), but it never happens when I'm copying files from the server. I tried jumbo frames with a size of 9000 bytes as well as the standard 1500 bytes, but that didn't change anything. I've now been using Apple's own IXGBE driver for over a week and the issue is gone.

Mieze commented 1 month ago

Are you experiencing connection drops (link down event) when the transmission stalls?

What kind of hardware are you using? Are you using flow control? Usually I recommend disabling it because it might cause performance issues and transmitter stalls, but if you are transmitting large amounts of data to a station which is significantly slower, you might flood the other endpoint with packets, causing DoS-like effects.

Also keep in mind that jumbo frames have to be configured properly (same MTU on both endpoints). Otherwise you'll get performance issues and connection failures.
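One way to check this from a shell, as a rough sketch: the largest ICMP payload that fits into one unfragmented IPv4 packet is the MTU minus 28 bytes (20-byte IPv4 header without options plus 8-byte ICMP header). The function names and the `<server>` placeholder are made up for illustration; `ping -D` sets the don't-fragment bit on macOS.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits into a single IPv4 packet of the
# given MTU: MTU minus 20 bytes IPv4 header (no options) minus 8 bytes ICMP.
icmp_payload() {
    echo $(( $1 - 28 ))
}

# Hypothetical helper for macOS: send 5 pings with the don't-fragment bit set.
# If these fail while a plain ping works, the peer or a hop in between
# is probably using a smaller MTU.
check_path_mtu() {
    host=$1
    mtu=${2:-9000}
    ping -D -c 5 -s "$(icmp_payload "$mtu")" "$host"
}
```

For example, `check_path_mtu <server> 9000` sends 8972-byte payloads. On the Linux side the counterpart would be `ping -M do -c 5 -s 8972 <client>`. The configured MTU itself is shown by `ifconfig en0` on macOS and `ip link show` on Linux.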

RF3 commented 1 month ago

Thank you for your fast response. You're doing a great job, BTW.

The target system is a Mini PC with an additional USB Ethernet adapter for 2.5G. Flow control was disabled, as recommended. Most of the time, the timeouts happened after less than 100 MB of data, so at a very early stage of the transfer. I tried jumbo frame settings on both sides as well as the standard packet size.

I just noticed that the Apple driver uses full duplex and nothing else. I hadn't thought about the target getting flooded with packets. Makes sense. I will give your driver another try with flow control enabled and see what happens.

For now, I don't have a server with 10G support, therefore stable 2.5G transfers are all that I need. When I ran some iperf3 benchmarks with another computer that also supported 10G, full 10G speed (net data throughput 9.3G with jumbo frames) didn't result in timeouts with your driver.

RF3 commented 1 month ago

Unfortunately, enabling flow control didn't fix the issue. Large file transfers with Finder still terminate with a -51 error. The first two files (3.71 GB and 4.76 GB) after the restart with IntelLucy active were copied without a timeout, but the following 8 tries with the same file always failed. It was a bit strange to see that with each try, the file transfer stopped at a later point. The first failed transfer stopped at 1.2 GB, the next one at 1.6 GB, then 2.5 GB and so on.

One thing that I noticed is that the timeouts happened at a later stage, with at least one GB successfully being copied. But I also installed Ventura 13.6.9, so I cannot say if that's causing the difference from previous tests, when the transfer stopped at a very early stage, sometimes after ~60 MB. BTW, copying a large file with

dd if=<source file> bs=1M | pv > <target file>

in a shell also comes to a halt of several seconds (something between 10 and 30), but then it continues whereas Finder always terminates the transfer.

I think I also found a bug. When I change the network settings to "Full duplex, flow control" and confirm with OK, the settings still show "Full duplex" without flow control. But when I check with "ifconfig en0", it correctly states "10Gbase-Twinax <full-duplex,flow-control>" and when I remove flow control, it changes to "10Gbase-Twinax <full-duplex>".
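For reference, the ifconfig check can be scripted instead of relying on the settings dialog. A minimal sketch, where the function name is made up for illustration and the only assumption is that the media option is spelled "flow-control" as in the output quoted above:

```shell
#!/bin/sh
# Reads `ifconfig <interface>` output on stdin and succeeds (exit status 0)
# if any line mentions the flow-control media option.
has_flow_control() {
    grep -q 'flow-control'
}
```

Used as `ifconfig en0 | has_flow_control && echo "flow control enabled"`.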

Mieze commented 1 month ago

I ran some tests copying a 13GB file to a 2011 Mac mini in my network and couldn't reproduce the issue. Here is a brief description of the test configuration:

Client (macOS 14.5 / 10G with IntelLucy MTU 9000) - Mikrotik CRS309 (10G switch) - Netgear R7800 Router (1G switch) - Mac mini (1G MTU 1500) with 500GB SSD as the target medium for the copy operation.

The copy operation finished without failure or stall. That's why I can rule out a bug in IntelLucy. More likely the reason can be found on your server. Possible reasons for this behaviour are:

Oh, one more thing. AppleEthernetIXGBE most likely doesn't exhibit this problem because the transmit speed is much lower: https://www.insanelymac.com/forum/topic/359009-intellucy-for-the-intel-x500-family/?do=findComment&comment=2818606

The Network preferences panel is buggy and doesn't always show connection parameters properly, or at least fails to update this information in time.

I'd recommend starting by checking what's happening on the server using 'top' while the transfer is in progress. Examining the log files and checking the SAMBA configuration could also help to find the reason. The raw network performance and stability can be tested most effectively with 'iperf3' as shown in the speed test in the IM thread.
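A minimal sketch of such a raw test, assuming iperf3 is installed on both machines and the server side has been started with `iperf3 -s`. The function name and the 60-second default are arbitrary choices for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: push data from this machine to an iperf3 server
# for a fixed duration, i.e. in the direction where the stalls were reported.
upload_test() {
    server=$1
    seconds=${2:-60}
    iperf3 -c "$server" -t "$seconds"
}
```

Run it on the Mac as `upload_test <server>`. A stall should show up as one or more report intervals with a bitrate at or near zero. Adding `-R` to the iperf3 call reverses the direction, so the server sends and the client receives.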