OpenIntelWireless / itlwm

Intel Wi-Fi Drivers for macOS
https://OpenIntelWireless.github.io/itlwm/
GNU General Public License v2.0
6.74k stars 643 forks source link

WiFi disconnects on sustained upload or download #934

Closed mikebeaton closed 1 week ago

mikebeaton commented 8 months ago

Have You Read Our Docs Yes

Are You Reporting A Bug Yes

Environment

Description For general browsing WiFi works, but on sustained traffic up or down, WiFi fairly quickly hangs (actually disconnects) and transfer hangs. (Note the same network is fine on other machines, and the same WiFi card and network is fine on Linux on the same machine.)

Bug Report Archive Log_2023-11-09_07-54-24.log

Showing the disconnect which occurs on sustained throughput (will sometimes go to reconnecting to WiFi (as shown); will sometimes eventually go to exclamation mark permanent error signal on WiFi icon (not shown here)):

foo

Example of download hanging (seems to hang at the same point on multiple attempts?):

Screenshot 2023-11-09 at 07 54 00

Error message on cancel, may have useful info?:

Screenshot 2023-11-09 at 07 54 13

Kext Download Source Downloaded from alpha releases from this site.

zxystd commented 7 months ago

Have you tried V2.2.0? Or the Preview05 version https://github.com/OpenIntelWireless/itlwm/issues/883#issuecomment-1670749680 ?

mikebeaton commented 7 months ago

Thank you for the suggestions.

I have tested these and yes v2.2.0 works (tested on Ventura) and Preview05 works (tested on Sonoma), so it is just v2.3.0-DEBUG-alpha-ff1138b which shows the problem (on all tested macOS).

I have realised that the issue is easily replicable and seems to be related to something specific which git is doing.

That specific command git clone git@github.com:acidanthera/OcBuild.git (at least on the current state of that repo) pauses at 26%, 34% and 98%, even on macOS with a working or wired connection, and even on Linux. But it passes those pauses and finishes when working normally, however hangs and/or disconnects as described on the problem release. (I don't mean that just this repo has issues, rather that it is easy to show the issue, e.g. with this repo.)

The 'sustained upload' with which I had problems was actually a large git commit, too. (I don't know if whatever git is doing is related to the 'sideband' error in one of the images above?)

zxystd commented 7 months ago

Thank you for the report!

[   94.049483]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 
[   94.095533]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f016: VHT | ANT: AB BW: 80Mhz MCS: 6 NSS: 2 SGI LDPC 
[   94.103896]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 
[   94.282187]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f016: VHT | ANT: AB BW: 80Mhz MCS: 6 NSS: 2 SGI LDPC 
[   94.287920]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 
[   94.387648]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f016: VHT | ANT: AB BW: 80Mhz MCS: 6 NSS: 2 SGI LDPC 
[   94.563766]: itlwm: __iwl_mvm_rs_tx_status new rate: 0x4c00f017: VHT | ANT: AB BW: 80Mhz MCS: 7 NSS: 2 SGI LDPC 

[  102.998809]: itlwm: itlwm: Start UMAC Error Log Dump:
[  102.998811]: itlwm: itlwm: Status: 0x39, count: 7
[  102.998814]: itlwm: itlwm: 0x2010071F | ADVANCED_SYSASSERT

From the log, the driver firmware crash when doing large traffic, the most suspicious difference is 2637e9014b7af098974c8d6db69230f75fb20a90 which implement new rate scaling control algorithm(ported from linux iwlwifi driver), and enable STBC/LDPC/BEFER features, you can revert this commit and can confirm that. The crash happend after configured new rate, not during configuring new rate, and the error code 0x2010071F is only readable by Intel engineers, so I can‘t accurately locate the rate problem.

[  106.741138]: itlwm: LEGACY=FF0 SISO=7FD0 MIMO2=7FD0 VHT=1 LDPC=1 STBC=0 BFER=1
[  106.741141]: itlwm: MAX RATE: LEGACY=11 SISO=14 MIMO2=14

BFER=1 means Beamforming feature on the router side is ON, I haven't test if the driver is really satisified with that, can you try disable it? And can you try if the driver can work on 11n mode? currently it is on 11ac.

mikebeaton commented 7 months ago

Here's some initial new data, v2.3.0-DEBUG-alpha-ff1138b on Sonoma with 802.11n (obtained via -novht boot arg) gives the same problem. Here is a log:

Log_2023-11-13_17-54-11.log

I don't think I have control over beamforming in my router (it has a somewhat limited end-user interface ... the service is fast, at least ... ;-) ).

Also, I hope I haven't misunderstood, but in the pasted log in this stack exchange question https://askubuntu.com/q/1474713 there is 0x2010071F | ADVANCED_SYSASSERT following what seems to be a completely different error, in case this is relevant?

mikebeaton commented 7 months ago

To test beamforming, I used a different network. These all have BFER=0, the error still occurs.

Different WiFi (bridged through to the same original Wifi):

Log_2023-11-13_18-27-01.log

Phone hotspot (direct to mobile network, not to another WiFi):

1/2: Log_2023-11-13_18-44-55.log

2/2: Log_2023-11-13_18-49-13.log

Note that in 1/2 above, I observed the network disconnect and re-connect during the attempt to git clone, presumably indicating the some part of the error had already happened, but I guess I did not leave it running long enough before pressing CTRL+C then generating the log, and the log file does not contain 0x2010071F | ADVANCED_SYSASSERT (even though generated after seeing the WiFi disconnect then re-connect). (EDIT: Possibly this is a red herring, and the disconnect and reconnect is unrelated to the main issue here? Not sure. Log 2/2 definitely shows the issue.)

mikebeaton commented 7 months ago

Finally, I can confirm that reverting the merge commit 2637e90 as suggested does indeed resolve the issue (tested working on Monterey and Sonoma).

mikebeaton commented 7 months ago

If it adds anything, I wonder if there may be some kind of bug in how my home router handles ... something about the traffic which git is generating. From a simple internet speed test, my work network is very similar in speed to my new home network, but my work PC can clone git repos far faster. Of course, if this is correct and there is something faulty about the network traffic, it is still not crashing and causing disconnects except in this one version of your (very useful!) driver - so I guess this would still make it some kind of bug in your driver, being shown up by the (possibly?) faulty network traffic in this case. Let me know if I can run any more tests to help with this.

mikebeaton commented 7 months ago

One more thought, and I'll leave it in your hands, but I'm still struggling here to checkout submodules (very slow, git fairly often aborts the process) - I'm just wondering if maybe what is happening here is that there's a real error in the traffic, but that the new code in the itlwm driver is escalating it too far? (i.e. it's being treated like a really bad error which makes the Wi-Fi disconnect, but really it's only a 'normally' bad error)

LubosKolouch commented 5 months ago

Yes same bug here. Browsing somehow works, but as long as you start anything slightly network-intensive (rsync, scp, ...) the transfer hangs very shortly.

LubosKolouch commented 5 months ago

The problem persists with AirportItlwm-Sonoma-v2.3.0-DEBUG-alpha-3e1624d.zip

LubosKolouch commented 3 months ago

The same problem in 14.4

zxystd commented 3 months ago

The same problem in 14.4

What's your wifi card model? and please upload the running log after drop and reconnect.

LubosKolouch commented 3 months ago

Intel Corporation Wireless 8265 / 8275 (rev 78)

How to get the log?

Also, the Wifi connection seems to be connected and working all the time, just the impacted transfers stall.

When it is stalled, I can for example go to browser and browse pages no problem, even play a movie over internet.

Same situation with AirportItlwm and itlwm, AirportItlwm-Sonoma14.4- v2.3.0-DEBUG-alpha-42dc7bc.zip

On Sat, 2024-03-09 at 05:53 -0800, 钟先耀 wrote:

The same problem in 14.4 What's your wifi card model? and please upload the running log after drop and reconnect. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

LubosKolouch commented 3 months ago

…ok, I cannot really.

During the stalled transfer:

$ ping www.google.com PING www.google.com (142.251.36.132): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 Request timeout for icmp_seq 4 Request timeout for icmp_seq 5 Request timeout for icmp_seq 6 64 bytes from 142.251.36.132: icmp_seq=0 ttl=115 time=8015.897 ms 64 bytes from 142.251.36.132: icmp_seq=1 ttl=115 time=7011.728 ms 64 bytes from 142.251.36.132: icmp_seq=2 ttl=115 time=6009.137 ms 64 bytes from 142.251.36.132: icmp_seq=3 ttl=115 time=5014.392 ms 64 bytes from 142.251.36.132: icmp_seq=4 ttl=115 time=4009.650 ms 64 bytes from 142.251.36.132: icmp_seq=5 ttl=115 time=3007.820 ms 64 bytes from 142.251.36.132: icmp_seq=6 ttl=115 time=2004.144 ms 64 bytes from 142.251.36.132: icmp_seq=7 ttl=115 time=1003.956 ms 64 bytes from 142.251.36.132: icmp_seq=8 ttl=115 time=3624.621 ms 64 bytes from 142.251.36.132: icmp_seq=9 ttl=115 time=2622.468 ms 64 bytes from 142.251.36.132: icmp_seq=10 ttl=115 time=1619.776 ms 64 bytes from 142.251.36.132: icmp_seq=11 ttl=115 time=626.808 ms 64 bytes from 142.251.36.132: icmp_seq=12 ttl=115 time=477.406 ms 64 bytes from 142.251.36.132: icmp_seq=13 ttl=115 time=385.568 ms 64 bytes from 142.251.36.132: icmp_seq=14 ttl=115 time=7346.008 ms 64 bytes from 142.251.36.132: icmp_seq=15 ttl=115 time=6341.893 ms 64 bytes from 142.251.36.132: icmp_seq=16 ttl=115 time=5339.970 ms 64 bytes from 142.251.36.132: icmp_seq=17 ttl=115 time=4337.356 ms 64 bytes from 142.251.36.132: icmp_seq=18 ttl=115 time=3334.400 ms 64 bytes from 142.251.36.132: icmp_seq=19 ttl=115 time=2330.019 ms 64 bytes from 142.251.36.132: icmp_seq=20 ttl=115 time=1326.930 ms 64 bytes from 142.251.36.132: icmp_seq=21 ttl=115 time=329.333 ms 64 bytes from 142.251.36.132: icmp_seq=22 ttl=115 time=1773.373 ms

On 9. 3. 2024, at 19:48, Lubos Kolouch @.***> wrote:

Intel Corporation Wireless 8265 / 8275 (rev 78)

How to get the log?

Also, the Wifi connection seems to be connected and working all the time, just the impacted transfers stall.

When it is stalled, I can for example go to browser and browse pages no problem, even play a movie over internet.

Same situation with AirportItlwm and itlwm, AirportItlwm-Sonoma14.4- v2.3.0-DEBUG-alpha-42dc7bc.zip

On Sat, 2024-03-09 at 05:53 -0800, 钟先耀 wrote:

The same problem in 14.4 What's your wifi card model? and please upload the running log after drop and reconnect. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

LubosKolouch commented 3 months ago

… when I kill the transfer:

$ ping www.google.com PING www.google.com (142.251.36.132): 56 data bytes 64 bytes from 142.251.36.132: icmp_seq=0 ttl=115 time=8.383 ms 64 bytes from 142.251.36.132: icmp_seq=1 ttl=115 time=7.790 ms 64 bytes from 142.251.36.132: icmp_seq=2 ttl=115 time=10.877 ms 64 bytes from 142.251.36.132: icmp_seq=3 ttl=115 time=8.629 ms 64 bytes from 142.251.36.132: icmp_seq=4 ttl=115 time=41.523 ms 64 bytes from 142.251.36.132: icmp_seq=5 ttl=115 time=9.450 ms 64 bytes from 142.251.36.132: icmp_seq=6 ttl=115 time=8.463 ms 64 bytes from 142.251.36.132: icmp_seq=7 ttl=115 time=7.175 ms 64 bytes from 142.251.36.132: icmp_seq=8 ttl=115 time=7.982 ms 64 bytes from 142.251.36.132: icmp_seq=9 ttl=115 time=8.705 ms

On 9. 3. 2024, at 20:10, kolcon @.***> wrote:

…ok, I cannot really.

During the stalled transfer:

$ ping www.google.com PING www.google.com (142.251.36.132): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 Request timeout for icmp_seq 3 Request timeout for icmp_seq 4 Request timeout for icmp_seq 5 Request timeout for icmp_seq 6 64 bytes from 142.251.36.132: icmp_seq=0 ttl=115 time=8015.897 ms 64 bytes from 142.251.36.132: icmp_seq=1 ttl=115 time=7011.728 ms 64 bytes from 142.251.36.132: icmp_seq=2 ttl=115 time=6009.137 ms 64 bytes from 142.251.36.132: icmp_seq=3 ttl=115 time=5014.392 ms 64 bytes from 142.251.36.132: icmp_seq=4 ttl=115 time=4009.650 ms 64 bytes from 142.251.36.132: icmp_seq=5 ttl=115 time=3007.820 ms 64 bytes from 142.251.36.132: icmp_seq=6 ttl=115 time=2004.144 ms 64 bytes from 142.251.36.132: icmp_seq=7 ttl=115 time=1003.956 ms 64 bytes from 142.251.36.132: icmp_seq=8 ttl=115 time=3624.621 ms 64 bytes from 142.251.36.132: icmp_seq=9 ttl=115 time=2622.468 ms 64 bytes from 142.251.36.132: icmp_seq=10 ttl=115 time=1619.776 ms 64 bytes from 142.251.36.132: icmp_seq=11 ttl=115 time=626.808 ms 64 bytes from 142.251.36.132: icmp_seq=12 ttl=115 time=477.406 ms 64 bytes from 142.251.36.132: icmp_seq=13 ttl=115 time=385.568 ms 64 bytes from 142.251.36.132: icmp_seq=14 ttl=115 time=7346.008 ms 64 bytes from 142.251.36.132: icmp_seq=15 ttl=115 time=6341.893 ms 64 bytes from 142.251.36.132: icmp_seq=16 ttl=115 time=5339.970 ms 64 bytes from 142.251.36.132: icmp_seq=17 ttl=115 time=4337.356 ms 64 bytes from 142.251.36.132: icmp_seq=18 ttl=115 time=3334.400 ms 64 bytes from 142.251.36.132: icmp_seq=19 ttl=115 time=2330.019 ms 64 bytes from 142.251.36.132: icmp_seq=20 ttl=115 time=1326.930 ms 64 bytes from 142.251.36.132: icmp_seq=21 ttl=115 time=329.333 ms 64 bytes from 142.251.36.132: icmp_seq=22 ttl=115 time=1773.373 ms

On 9. 3. 2024, at 19:48, Lubos Kolouch @.***> wrote:

Intel Corporation Wireless 8265 / 8275 (rev 78)

How to get the log?

Also, the Wifi connection seems to be connected and working all the time, just the impacted transfers stall.

When it is stalled, I can for example go to browser and browse pages no problem, even play a movie over internet.

Same situation with AirportItlwm and itlwm, AirportItlwm-Sonoma14.4- v2.3.0-DEBUG-alpha-42dc7bc.zip

On Sat, 2024-03-09 at 05:53 -0800, 钟先耀 wrote:

The same problem in 14.4 What's your wifi card model? and please upload the running log after drop and reconnect. — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

zxystd commented 3 months ago

How to get the log?

https://openintelwireless.github.io/itlwm/Troubleshooting.html#runtime-logs

collect the log after the issue happened. @LubosKolouch

LubosKolouch commented 3 months ago

Thanks, attached both for AirportItlwm and itlwm, same behavior.

On 10. 3. 2024, at 3:43, 钟先耀 @.***> wrote:

How to get the log?

https://openintelwireless.github.io/itlwm/Troubleshooting.html#runtime-logs

collect the log after the issue happened. @LubosKolouch https://github.com/LubosKolouch — Reply to this email directly, view it on GitHub https://github.com/OpenIntelWireless/itlwm/issues/934#issuecomment-1987048200, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA7UARYL6J73VNMVDBEIJLLYXPCF7AVCNFSM6AAAAAA7ED4FWGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSOBXGA2DQMRQGA. You are receiving this because you were mentioned.

zxystd commented 3 months ago

@LubosKolouch I don't see the files...

LubosKolouch commented 3 months ago

bugreport_582_airportitlwm.zip bugreport_44413_itlwm.zip

LubosKolouch commented 3 months ago

looks like they did not go through as email attachments, reuploaded them...

LubosKolouch commented 3 months ago

Problem is still present in AirportItlwm-Sonoma14.4-v2.3.0-DEBUG-alpha-6dc344d.zip

zxystd commented 3 months ago

@LubosKolouch Recent commits doesn't do any help on this issue, because I don't locate reason of why the firmware crash randomly on yours case.

I have added more debug info on this kext, please trigger the disconnection again and then collect the log, thanks. Sonoma14.4.zip

LubosKolouch commented 3 months ago

attached... bugreport_38362.zip

zxystd commented 3 months ago

@LubosKolouch I don't see disconnection from the log, please collect the log after auto reconnect.

LubosKolouch commented 3 months ago

@zxystd is this log better? If not, I will collect more... bugreport_26665.zip

LubosKolouch commented 3 months ago

bugreport_62226.zip

zxystd commented 3 months ago

@LubosKolouch No disconnection or firmware crash there in the two logs, the driver is working fine all the 232 seconds. the only problem is that it is always doing roam scan to find a better WiFi, you are too far away from the router? scanning can affect the performance definitely.

If you reproduce the disconnect and reconnect issue, please collect the log and upload here.

LubosKolouch commented 3 months ago

@zxystd I hear what you say, I try with one more log. The pings are falling out, jumping to 4000ms and the transfers stops after transferring 255kB.

Some more comments:

1) other machines on Linux (next to this one) are working smoothly, no problem with wifi/transfers 2) if I reboot THIS machine to Linux, also wifi working fine 3) it was working fine with previous Airportilwm releases up to approximately November. For example with the Prelease05 I never had any issues.

bugreport_23996.zip

LubosKolouch commented 3 months ago
  1. When I use wired connection, the transfers work ok => it is not issue of my scp, rsync or some general network stack
zxystd commented 3 months ago
  1. it was working fine with previous Airportilwm releases up to approximately November. For example with the Prelease05 I never had any issues.

@LubosKolouch I believe what you said, so we should find out why it is stalled. here I add more log and reset the aggregation trigger condition to preview05, please check if it would help? Sonoma14.4.zip

LubosKolouch commented 3 months ago

@zxystd yes, this version works nicely and the transfer was very fast: scp unison.log ip:/home/lubos/aaa.log unison.log 100% 70MB 32.0MB/s 00:0

log attached bugreport_2985.zip

LubosKolouch commented 3 months ago

@zxystd I can continue testing another versions to bisect the issue if you want

zxystd commented 3 months ago

@LubosKolouch you mean the last version is better and would not stall if it keeps connected? I already see firmware crash and disconnect twice times in the log, will send you another version this night.

LubosKolouch commented 3 months ago

@zxystd Yes the last version works smoothly, the transfer does not stall and is very fast.

The only problem I have seen was at the beginning when it took a longer time to connect to the Wifi and it may be that it disconnected 2 times.

But since it connected, it seemed OK.

zxystd commented 3 months ago

Sonoma14.4.zip @LubosKolouch Could you please help me to try this one?

LubosKolouch commented 3 months ago

@zxystd of course, any time

Initial connection to wifi was slow and I think it crashed again 1-2 times, before it connected.

Afterwards the transfer was no problem, not stalling and fast: 100% 70MB 25.5MB/s 00:02

Debug log attached, awaiting next tests. bugreport_6298.zip

zxystd commented 3 months ago

@LubosKolouch No firmware crash in the log, please try more times to confirm? @mikebeaton Hi, can you also help to test this version?

LubosKolouch commented 3 months ago

@zxystd I tried switching between 2 wifi networks several times. It was not good, to one of them (BB) I was able to connect only once.

When it connected, transfer works ok, not stalled.

Log attached. bugreport_62920.zip

zxystd commented 3 months ago

@LubosKolouch BB and BB5 are very easy to connect with no firmware crash anymore, you can try to swicth between two different 5G band networks to verify that. seems like there is no traffic(you can see it is connected but no ip address) when you connect to the 2.4G one BB, don't know why.

LubosKolouch commented 3 months ago

@zxystd sounds good, thank you. As far as I am concerned, the issue can be closed.

Pls let me know when I can assist you in future testings.

zxystd commented 3 months ago

@LubosKolouch Thank you for the testing. I make another version that try to fix the 2.4G band stability, in your case it is hard to connect. Sonoma14.4.zip

LubosKolouch commented 3 months ago

@zxystd thank you, I'd say it behaves better now. I tried to reconnect a few times, each reconnect takes 10 seconds. Log attached. bugreport_56385.zip

LubosKolouch commented 3 months ago

@zxystd I had the stalling problem again and pings stopping for several seconds, but it recovered after another few seconds. Log attached in case you see something wrong there. bugreport_3296.zip

zxystd commented 3 months ago
[  472.403800]: itlwm: ieee80211_recv_disassoc
[  472.403815]: itlwm: Disassoc received, reason 34
[  472.403821]: itlwm: ieee80211_recv_disassoc 2949 nstate=3

The router disassociate initiatively at that time, reason code is 34 which means that the router wants to switch to another good channel since current channel is poor. I think it is not a problem if it is not occur every time. http://www.emcu.eu/wifi-802-11-association-disassociation-reason-code/ image

mikebeaton commented 3 months ago

GitHub pull and push issues appear to be resolved with the above version of the driver. It behaves similarly to an M1 Mac doing the same operations on the same network. Thank you. :-)