brektrou / rtl8821CU

Realtek RTL8811CU/RTL8821CU USB Wi-Fi adapter driver for Linux
GNU General Public License v2.0

Repeated site-survey errors causing latency/interruption issues #63

Closed by offsides 4 years ago

offsides commented 4 years ago

The driver keeps logging the same error in dmesg, and at the same time I am seeing a lot of latency/slowdown/interruption problems with networking, most obviously in SSH sessions that freeze for 5-10 seconds before responding again. The error first appears shortly after I connect, then recurs sporadically, always in pairs about 2 seconds apart. Here's what I'm seeing (initial connection lines followed by the repeated error):

[  262.947364] IPv6: ADDRCONF(NETDEV_CHANGE): wlp0s20f0u6: link becomes ready
[  262.948239] RTW: set group key camid:5, addr:48:5d:36:d4:c1:be, kid:2, type:AES
[  274.565772] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  276.621743] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  288.902584] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  290.954432] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  303.237807] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  305.290415] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  331.913676] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  333.968971] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  346.254700] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  348.302388] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey

I am using Fedora 31 (soon to be F32) and NetworkManager under KDE. Any ideas/suggestions on how to fix this would be greatly appreciated.
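The "pairs 2 seconds apart" pattern can be checked mechanically instead of by eye. Below is a minimal sketch, assuming the dmesg output has been saved as text in the format shown above; the names `rfk_timestamps` and `consecutive_gaps` are made up for illustration and are not part of the driver or any tool.

```python
import re

# Matches lines such as:
# [  274.565772] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
RFK_ERROR = re.compile(r"\[\s*(\d+\.\d+)\]\s+RTW: ERROR \[RFK-CHK\]")


def rfk_timestamps(dmesg_text):
    """Return the kernel timestamps (seconds) of every RFK-CHK error line."""
    return [float(m.group(1)) for m in RFK_ERROR.finditer(dmesg_text)]


def consecutive_gaps(stamps):
    """Return the gap in seconds between each error and the one before it."""
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    # A few lines copied from the log in this report.
    sample = """\
[  262.948239] RTW: set group key camid:5, addr:48:5d:36:d4:c1:be, kid:2, type:AES
[  274.565772] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  276.621743] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  288.902584] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[  290.954432] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
"""
    print(consecutive_gaps(rfk_timestamps(sample)))
```

Run against the full log above, the gaps alternate between roughly 2 seconds inside a pair and 12 seconds or more between pairs. That regularity looks more like a periodic trigger than random noise.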

offsides commented 4 years ago

Further info: I turned the log level up from 3 to 4, and I get this:

[ 5200.403842] RTW: rtw_wx_set_scan(wlp0s20f0u6)
[ 5200.404311] RTW: [HW_VAR_CHECK_TXBUF] Empty in 0 ms
[ 5200.404391] RTW: wlp0s20f0u6 sleep m0=0x00000003, ori reg_0x4d4=0x00000000
[ 5201.545155] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[ 5203.599415] RTW: OnAction_back
[ 5203.599428] RTW: OnAction_back, action=2
[ 5203.599432] RTW: OnAction_back(): DELBA: 0(0)
[ 5203.602636] RTW: ERROR [RFK-CHK] RF-K not allowed due to ifaces under site-survey
[ 5204.349251] RTW: wlp0s20f0u6 wakeup m0=0x00000003, ori reg_0x4d4=0x00000003
[ 5204.349699] RTW: survey done event(b) band:0 for wlp0s20f0u6
[ 5204.349725] RTW: rtw_indicate_scan_done(wlp0s20f0u6)
[ 5204.351276] RTW: OnAction_back
[ 5204.351277] RTW: OnAction_back, action=0
[ 5204.351675] RTW: issue_addba_rsp_wait_ack(wlp0s20f0u6) ra=48:5d:36:d4:c1:be status:=0 tid=0 size:64, acked, 1/3 in 1 ms

Additionally, I was running a ping to another machine on my network (wired to the router) and saw this massive increase in ping times when the above hit:

64 bytes from 10.52.73.10: icmp_seq=17 ttl=64 time=5.87 ms
64 bytes from 10.52.73.10: icmp_seq=18 ttl=64 time=1.47 ms
64 bytes from 10.52.73.10: icmp_seq=19 ttl=64 time=2.06 ms
64 bytes from 10.52.73.10: icmp_seq=20 ttl=64 time=1.55 ms
64 bytes from 10.52.73.10: icmp_seq=21 ttl=64 time=2.55 ms
64 bytes from 10.52.73.10: icmp_seq=22 ttl=64 time=3774 ms
64 bytes from 10.52.73.10: icmp_seq=23 ttl=64 time=2743 ms
64 bytes from 10.52.73.10: icmp_seq=24 ttl=64 time=1719 ms
64 bytes from 10.52.73.10: icmp_seq=25 ttl=64 time=695 ms
64 bytes from 10.52.73.10: icmp_seq=26 ttl=64 time=1.35 ms
64 bytes from 10.52.73.10: icmp_seq=27 ttl=64 time=3.18 ms

So it's definitely related. Is there some sort of periodic scan that it's doing that it shouldn't be? And if so, is that something coming from inside the driver, or is it an externally triggered event? If the latter, then how do you trigger it, so I can try to figure out what's doing that...
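To line the stalls up with the dmesg timestamps, it helps to pull the slow replies out of the ping output automatically. This is a minimal sketch, assuming the standard Linux `ping` line format shown above; the function name `latency_spikes` and the 100 ms threshold are arbitrary choices for illustration.

```python
import re

# Matches the sequence number and round-trip time in lines such as:
# 64 bytes from 10.52.73.10: icmp_seq=22 ttl=64 time=3774 ms
PING_REPLY = re.compile(r"icmp_seq=(\d+)\s.*?time=([\d.]+)\s*ms")


def latency_spikes(ping_text, threshold_ms=100.0):
    """Return (icmp_seq, rtt_ms) for every reply slower than threshold_ms."""
    spikes = []
    for line in ping_text.splitlines():
        m = PING_REPLY.search(line)
        if m and float(m.group(2)) > threshold_ms:
            spikes.append((int(m.group(1)), float(m.group(2))))
    return spikes


if __name__ == "__main__":
    # Lines copied from the ping output in this report.
    sample = """\
64 bytes from 10.52.73.10: icmp_seq=21 ttl=64 time=2.55 ms
64 bytes from 10.52.73.10: icmp_seq=22 ttl=64 time=3774 ms
64 bytes from 10.52.73.10: icmp_seq=23 ttl=64 time=2743 ms
64 bytes from 10.52.73.10: icmp_seq=24 ttl=64 time=1719 ms
64 bytes from 10.52.73.10: icmp_seq=25 ttl=64 time=695 ms
64 bytes from 10.52.73.10: icmp_seq=26 ttl=64 time=1.35 ms
"""
    for seq, rtt in latency_spikes(sample):
        print(seq, rtt)
```

In the output above, the four slow replies (icmp_seq 22 to 25) drop by about 1024 ms each. If ping was sending at its default one-second interval, that suggests the replies all arrived at nearly the same moment after a stall of roughly 4 seconds, rather than each packet being slow independently.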

offsides commented 4 years ago

OK, I've narrowed it down to the periodic scan event sent by NetworkManager. The question becomes, why does a scan event that runs a survey a) cause the network to essentially drop for ~4 seconds and b) generate the indicated errors? I have a temporary workaround (kill -STOP ), but that's not what should be needed to make it work properly. Any and all assistance would be appreciated.

offsides commented 4 years ago

And now it's not a problem anymore. I finally figured out how to disable the internal wifi NIC, and now that it's gone NetworkManager isn't trying to constantly scan. The moral of the story is, don't have multiple wireless NICs on a laptop...
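For anyone wanting to check whether they are in the same multi-NIC situation, here is a rough sketch that lists the wireless interfaces the kernel currently exposes. It assumes a Linux sysfs layout in which a wireless netdev has a `wireless` or `phy80211` entry under `/sys/class/net/<iface>/`; whether both entries exist depends on the driver and kernel configuration. The function name `wireless_interfaces` is made up for illustration. It only reports interfaces and does not disable anything (this thread does not say how the internal NIC was disabled).

```python
import os


def wireless_interfaces(sysfs_net="/sys/class/net"):
    """Return the sorted names of interfaces that look like Wi-Fi devices.

    Assumption: a wireless interface exposes a "wireless" directory and/or
    a "phy80211" link in its sysfs directory.
    """
    if not os.path.isdir(sysfs_net):
        return []
    found = []
    for name in sorted(os.listdir(sysfs_net)):
        base = os.path.join(sysfs_net, name)
        if (os.path.exists(os.path.join(base, "wireless"))
                or os.path.exists(os.path.join(base, "phy80211"))):
            found.append(name)
    return found


if __name__ == "__main__":
    nics = wireless_interfaces()
    print("wireless interfaces:", nics)
    if len(nics) > 1:
        print("more than one wireless NIC is present")
```

If more than one interface shows up, the situation described in this thread (an unused internal NIC alongside the USB adapter) is worth ruling out.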