aparcar / openwrt

Staging tree of Paul Spooren

FS#791 - Broken STA causing 2.4G wireless failure on ath10k #788

Closed. aparcar closed this issue 7 years ago.

aparcar commented 7 years ago

imac:

Device: Archer C7 AC1750 Version: 17.01.1

We have a Samsung multifunction printer connected to 2.4G. For some reason, it sometimes goes to sleep in a way that it can't be woken from without a power reset. While this is happening, we see the following activity in our LEDE logs:

Wed May 17 13:02:21 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: associated (aid 3)
Wed May 17 13:02:25 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: authenticated
Wed May 17 13:02:25 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: associated (aid 3)
Wed May 17 13:02:29 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: authenticated
Wed May 17 13:02:29 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: associated (aid 3)
Wed May 17 13:02:34 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: authenticated
Wed May 17 13:02:34 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: associated (aid 3)
Wed May 17 13:02:38 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: authenticated
Wed May 17 13:02:38 2017 daemon.info hostapd: wlan1: STA 30:cd:a7:a2:a7:81 IEEE 802.11: associated (aid 3)
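To measure how fast a client like this is cycling, the excerpt can be fed to a small script that records the gap between successive "authenticated" events per station. This is a sketch assuming the logread line layout shown above; the function name `auth_intervals` is hypothetical and the script is meant to run on a workstation against an exported log, since Python is not part of a default LEDE image.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches logread lines of the form shown above, e.g.
# "Wed May 17 13:02:25 2017 daemon.info hostapd: wlan1: STA <mac> IEEE 802.11: authenticated"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) \S+ hostapd: "
    r"(?P<iface>\S+): STA (?P<mac>[0-9a-f:]{17}) IEEE 802\.11: (?P<event>.+)$"
)


def auth_intervals(lines):
    """Return {mac: [seconds between successive 'authenticated' events]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["event"] != "authenticated":
            continue  # ignore "associated" and unrelated lines
        ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
        mac = m["mac"]
        if mac in last_seen:
            gaps[mac].append((ts - last_seen[mac]).total_seconds())
        last_seen[mac] = ts
    return dict(gaps)
```

For the nine lines quoted above this reports gaps of 4 to 5 seconds for 30:cd:a7:a2:a7:81, i.e. the printer re-authenticates continuously rather than staying associated.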

Although no errors appear in the logs, the 2.4G WiFi eventually stops working (all 2.4G clients are disconnected). Running a quick 'wifi' the next morning restores the system.

We have a number of C7s running LEDE with no issues like this, but none of them serve a misbehaving client like this printer.

We upgraded the printer's firmware in the hope that this problem goes away (fingers crossed), but it seemed like a worthwhile observation to log here in case it helps improve the robustness of the AP.

[   11.696716] PCI: Enabling device 0000:01:00.0 (0000 -> 0002)
[   11.702619] ath10k_pci 0000:01:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
[   11.924065] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/pre-cal-pci-0000:01:00.0.bin failed with error -2
[   11.934954] ath10k_pci 0000:01:00.0: Falling back to user helper
[   12.069249] firmware ath10k!pre-cal-pci-0000:01:00.0.bin: firmware_loading_store: map pages failed
[   12.358525] ath10k_pci 0000:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub 0000:0000
[   12.367925] ath10k_pci 0000:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
[   12.380955] ath10k_pci 0000:01:00.0: firmware ver 10.2.4-1.0-00016 api 5 features no-p2p,raw-mode,mfp crc32 0c5668f8
[   12.391740] ath10k_pci 0000:01:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/board-2.bin failed with error -2
[   12.402341] ath10k_pci 0000:01:00.0: Falling back to user helper
[   12.485625] firmware ath10k!QCA988X!hw2.0!board-2.bin: firmware_loading_store: map pages failed
[   12.507338] ath10k_pci 0000:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
[   13.675416] ath10k_pci 0000:01:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal file max-sta 128 raw 0 hwcrypto 1
[   13.786590] ath: EEPROM regdomain: 0x0
[   13.786606] ath: EEPROM indicates default country code should be used
[   13.786614] ath: doing EEPROM country->regdmn map search
[   13.786631] ath: country maps to regdmn code: 0x3a
[   13.786640] ath: Country alpha2 being used: US
[   13.786648] ath: Regpair used: 0x3a
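The "error -2" values in the firmware-loading lines above are negative kernel errno codes, as are statuses such as "reset status -5" later in this thread. A small helper can translate them; this is a sketch (the name `describe_kernel_error` is hypothetical, not part of any OpenWrt tooling) that relies on the host's errno table. That is fine for low numbers such as 2 and 5, which are the same on every Linux architecture, while some higher codes are renumbered on MIPS.

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a negative kernel return value (e.g. -2) into its errno name."""
    n = abs(code)
    name = errno.errorcode.get(n, "UNKNOWN")  # e.g. 2 -> "ENOENT"
    return f"{code} = -{name} ({os.strerror(n)})"
```

`describe_kernel_error(-2)` yields `-ENOENT`, meaning the requested firmware file was not found, and `describe_kernel_error(-5)` yields `-EIO`, a generic I/O error.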

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/qca955x_wmac'
	option channel '11'
	option htmode 'HT40-'
	option txpower '24'
	option ht_coex '0'
	option noscan '1'
	option country 'US'

config wifi-iface
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'xxx'
	option key 'xxx'
	option encryption 'psk2+ccmp'

The client was an M2885FW printer running firmware V3.00.01.06, since upgraded to V3.00.01.16.

aparcar commented 7 years ago

imac:

We had no issues with our ath10k/Archer C7 LEDE 17.01.1 2.4G WiFi after reconfiguring the printer, until yesterday. It looks like the printer's WiFi issues were not the root cause of the 2.4G network dropping. Our uptime confirms that we last restarted on May 17th, when we initially reported this issue. It seems to have happened again, after a much longer time period, and this time with nothing in the logs that would indicate a problem.

It appears that removing all the printer WiFi activity slowed down the recurrence of this issue.

Attached are links to the logs and dmesg from the device. Below is the uptime information, which can be used to convert dmesg timestamps (seconds since boot) to wall-clock time and correlate dmesg with syslog.

Syslog (Splunk Reformatted): https://drive.google.com/file/d/0B0LMaCILQ19jVjVPbWFtbWRQR0E/view?usp=sharing

Dmesg: https://drive.google.com/file/d/0B0LMaCILQ19jYS1XX0FGcktjNjg/view?usp=sharing

root@ArcherC7:~# cut -d' ' -f1 </proc/uptime
2938073.43
root@ArcherC7:~# date
Tue Jun 20 10:19:37 EDT 2017
root@ArcherC7:~# uptime
 10:19:41 up 34 days, 8 min, load average: 0.11, 0.07, 0.01
root@ArcherC7:~#

In dmesg we see the following on May 27 and June 10, with no direct correlation to the 2.4G WiFi dying yesterday:

[2025533.559387] ath: phy1: Unable to reset channel, reset status -5
[2107857.820477] ath: phy1: Unable to reset channel, reset status -5
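A dmesg timestamp is seconds since boot, so the boot time can be estimated from the `date` and /proc/uptime readings in the session above and then added to each dmesg stamp. A minimal sketch, assuming naive local (EDT) datetimes; the two readings were taken a few seconds apart, and the printk clock can drift relative to wall-clock time, so treat the results as approximate. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def boot_time(now: datetime, uptime_seconds: float) -> datetime:
    """Estimate boot time from a `date` reading and the first field of /proc/uptime."""
    return now - timedelta(seconds=uptime_seconds)


def dmesg_to_wallclock(boot: datetime, dmesg_seconds: float) -> datetime:
    """Convert a dmesg timestamp (seconds since boot) to approximate local time."""
    return boot + timedelta(seconds=dmesg_seconds)


# Readings from the session above (local time, EDT).
NOW = datetime(2017, 6, 20, 10, 19, 37)
UPTIME = 2938073.43

boot = boot_time(NOW, UPTIME)
for stamp in (2025533.559387, 2107857.820477):
    print(f"[{stamp:.6f}] ~ {dmesg_to_wallclock(boot, stamp):%Y-%m-%d %H:%M:%S}")
```

The same arithmetic can be applied to any dmesg line before searching the syslog around that time.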

In syslog we simply see the 2.4G clients disassociate, all at the same time:

6/19/17 4:42:28.000 PM Jun 19 16:42:28 104.244.196.194 Jun 19 16:42:28 hostapd: wlan1: STA 70:62:b8:93:98:b8 IEEE 802.11: deauthenticated due to local deauth request
6/19/17 4:42:28.000 PM Jun 19 16:42:28 104.244.196.194 Jun 19 16:42:28 hostapd: wlan1: STA c0:bd:d1:3a:cb:1b IEEE 802.11: deauthenticated due to local deauth request
6/19/17 4:42:28.000 PM Jun 19 16:42:28 104.244.196.194 Jun 19 16:42:28 hostapd: wlan1: STA 8c:70:5a:db:8e:4c IEEE 802.11: deauthenticated due to local deauth request
6/19/17 4:42:28.000 PM Jun 19 16:42:28 104.244.196.194 Jun 19 16:42:28 hostapd: wlan1: STA 00:9a:cd:28:6d:bd IEEE 802.11: deauthenticated due to local deauth request

All of these deauthentications came 10 minutes after each client's last "WPA: group key handshake" entry.
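To check the 10-minute pattern for every client in the full syslog, a script can pair each station's last group key handshake with its deauthentication. This is a sketch assuming the Splunk-reformatted layout quoted above and a hostapd message that begins with "WPA: group key handshake" (those lines are not quoted in this thread, so that prefix is an assumption). The year is passed in by hand because the syslog stamps omit it; `rekey_to_deauth` is a hypothetical name.

```python
import re
from datetime import datetime

# Takes the "Mon DD HH:MM:SS" stamp that directly precedes "hostapd:" in lines such as
# "... Jun 19 16:42:28 hostapd: wlan1: STA <mac> IEEE 802.11: deauthenticated ..."
EVENT_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d) hostapd: \S+: "
    r"STA (?P<mac>[0-9a-f:]{17}) (?P<msg>.+)$"
)


def rekey_to_deauth(lines, year=2017):
    """Seconds from each STA's last group key handshake to its deauthentication."""
    last_rekey = {}
    result = {}
    for line in lines:
        m = EVENT_RE.search(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        mac, msg = m["mac"], m["msg"]
        if msg.startswith("WPA: group key handshake"):
            last_rekey[mac] = ts  # remember only the most recent rekey per station
        elif "deauthenticated" in msg and mac in last_rekey:
            result[mac] = (ts - last_rekey[mac]).total_seconds()
    return result
```

A value of 600.0 for every station would confirm that the radio dropped exactly one rekey interval after the last successful handshake.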

Since the SSID can no longer be seen, it looks like the 2.4G radio simply turned off.

The full log files and dmesg are linked above for closer inspection.

The iwinfo output is below (SSIDs changed):

wlan0     ESSID: "ArcherC7-5G"
          Access Point: 60:E3:27:2F:16:30
          Mode: Master  Channel: 149 (5.745 GHz)
          Tx-Power: 30 dBm  Link Quality: 56/70
          Signal: -54 dBm  Noise: -108 dBm
          Bit Rate: 6.0 MBit/s
          Encryption: WPA2 PSK (CCMP)
          Type: nl80211  HW Mode(s): 802.11nac
          Hardware: 168C:003C 0000:0000 [Qualcomm Atheros QCA9880]
          TX power offset: none
          Frequency offset: none
          Supports VAPs: yes  PHY name: phy0

wlan1     ESSID: "ArcherC7"
          Access Point: 60:E3:27:2F:16:31
          Mode: Master  Channel: 11 (2.462 GHz)
          Tx-Power: 24 dBm  Link Quality: unknown/70
          Signal: unknown  Noise: -91 dBm
          Bit Rate: unknown
          Encryption: WPA2 PSK (CCMP)
          Type: nl80211  HW Mode(s): 802.11bgn
          Hardware: unknown [Generic MAC80211]
          TX power offset: unknown
          Frequency offset: unknown
          Supports VAPs: yes  PHY name: phy1

Our wifi-device config is below; with htmode 'HT40-' and noscan '1' it forces a 40 MHz wide channel:

config wifi-device 'radio1'
	option type 'mac80211'
	option hwmode '11g'
	option path 'platform/qca955x_wmac'
	option channel '11'
	option htmode 'HT40-'
	option txpower '24'
	option ht_coex '0'
	option noscan '1'
	option country 'US'

Everything is up to date package-wise, so we are not sure how to debug this further.

root@ArcherC7:~# opkg update
Downloading http://downloads.lede-project.org/releases/17.01.1/targets/ar71xx/generic/packages/Packages.gz
Updated list of available packages in /var/opkg-lists/reboot_core
Downloading http://downloads.lede-project.org/releases/17.01.1/targets/ar71xx/generic/packages/Packages.sig
Signature check passed.
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/base/Packages.gz
Updated list of available packages in /var/opkg-lists/reboot_base
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/base/Packages.sig
Signature check passed.
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/luci/Packages.gz
Updated list of available packages in /var/opkg-lists/reboot_luci
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/luci/Packages.sig
Signature check passed.
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/packages/Packages.gz
Updated list of available packages in /var/opkg-lists/reboot_packages
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/packages/Packages.sig
Signature check passed.
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/routing/Packages.gz
Updated list of available packages in /var/opkg-lists/reboot_routing
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/routing/Packages.sig
Signature check passed.
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/telephony/Packages.gz
Updated list of available packages in /var/opkg-lists/reboot_telephony
Downloading http://downloads.lede-project.org/releases/17.01.1/packages/mips_24kc/telephony/Packages.sig
Signature check passed.
root@ArcherC7:~# opkg list-upgradable
luci-lib-ip - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-theme-bootstrap - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-app-firewall - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
dropbear - 2017.75-1 - 2017.75-2
luci-proto-ppp - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-mod-admin-full - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-base - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-proto-ipv6 - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-lib-nixio - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci-lib-jsonc - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
luci - git-17.161.67166-8eadde5-1 - git-17.170.53336-8955523-1
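`opkg list-upgradable` prints one "name - installed - available" triple per line, so it is easy to check whether any pending upgrade touches the wireless stack. A minimal sketch; the set of package-name prefixes below is an assumption about which LEDE packages matter here, not an authoritative list, and the function names are hypothetical.

```python
# Package-name prefixes ASSUMED to be relevant to the wireless stack on LEDE 17.01.
WIRELESS_PREFIXES = (
    "hostapd", "wpad", "kmod-ath", "kmod-mac80211", "kmod-cfg80211", "ath10k-firmware",
)


def parse_upgradable(text):
    """Parse `opkg list-upgradable` output into (name, installed, available) tuples."""
    pkgs = []
    for line in text.splitlines():
        parts = line.split(" - ")
        if len(parts) == 3:  # skip prompts and blank lines
            pkgs.append(tuple(p.strip() for p in parts))
    return pkgs


def wireless_upgrades(text):
    """Keep only upgrades whose package name starts with an assumed wireless prefix."""
    return [p for p in parse_upgradable(text) if p[0].startswith(WIRELESS_PREFIXES)]
```

Fed the list above, `wireless_upgrades` returns an empty list: every pending upgrade is a LuCI package or dropbear.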

aparcar commented 7 years ago

None:

The 2.4 GHz band of the Archer C7 uses ath9k, not ath10k. The ath10k_pci log also does not look like a problem; as far as I know, it is normal output.
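One way to confirm which driver backs each radio is to follow the `driver` symlink under `/sys/class/ieee80211/<phy>/device/`. A minimal sketch, assuming the standard sysfs layout; the `sysfs_root` parameter exists only so the function can be pointed at a copy of that tree. Python is not in a default LEDE image, so on the router itself `ls -l /sys/class/ieee80211/*/device/driver` shows the same links. On this hardware one would expect phy0 to resolve to `ath10k_pci` and phy1, the `platform/qca955x_wmac` radio, to an ath9k driver.

```python
import os


def phy_drivers(sysfs_root="/sys/class/ieee80211"):
    """Map each wiphy (phy0, phy1, ...) to the name of the driver bound to its device."""
    drivers = {}
    for phy in sorted(os.listdir(sysfs_root)):
        link = os.path.join(sysfs_root, phy, "device", "driver")
        if os.path.islink(link):
            # The link points at e.g. .../bus/pci/drivers/ath10k_pci; keep the last component.
            drivers[phy] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__" and os.path.isdir("/sys/class/ieee80211"):
    print(phy_drivers())
```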