ValveSoftware / SteamOS

SteamOS community tracker

[Steam Deck LCD] Wireless issues still present on SteamOS 3.5 (Linux 6.1 LTS and later), seems Archer C7 OpenWRT related #1119

Open RodoMa92 opened 1 year ago

RodoMa92 commented 1 year ago

Your system information

Please describe your issue in as much detail as possible:

While using the device in desktop mode I managed to catch the (for me) extremely rare random disconnect from my wifi, shown as a yellow triangle. I've pulled debug dumps from the rtw88 driver (before and after) and the kernel log (only after reconnecting).

Reading the kernel log seems to indicate that the wireless firmware still has issues exiting lpm even on main (with some coex error associated with it). Can you please get a fixed firmware from Realtek? I would like to keep having decent battery life keeping wireless powersave on (I still had random disconnects even without it; I tested this shortly after launch, so powersave does not seem to be the only issue at play here).

Steps for reproducing this issue:

  1. Update to SteamOS main
  2. Wait a random time and see the wireless connection failing with a yellow triangle

Hardware logs attached to this post: phy_info_work.txt, phy_info_broken.txt, kernel_log.txt, coex_info_work.txt, coex_info_broken.txt
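For anyone who wants to collect the same kind of before/after dumps, here is a minimal sketch in Python. It assumes the rtw88 debug files live under `/sys/kernel/debug/ieee80211/phy0/rtw88/` (the phy number may differ on your unit, and reading debugfs normally needs root). The `snapshot` helper and its `label` argument are made up for illustration; the output names just mirror the attachments above.

```python
from pathlib import Path

# Assumption: rtw88 exposes its debug files under the wiphy's debugfs directory.
# Check the phy number on your own device before relying on this path.
DEFAULT_DEBUG_DIR = Path("/sys/kernel/debug/ieee80211/phy0/rtw88")
FILES = ("phy_info", "coex_info")


def snapshot(label, debug_dir=DEFAULT_DEBUG_DIR, out_dir=Path("."), files=FILES):
    """Copy each debug file to <name>_<label>.txt and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in files:
        source = Path(debug_dir) / name
        try:
            text = source.read_text(errors="replace")
        except OSError as exc:
            # Keep a note in the output instead of aborting the whole snapshot.
            text = f"could not read {source}: {exc}\n"
        target = out_dir / f"{name}_{label}.txt"
        target.write_text(text)
        written.append(target)
    return written

# Typical use (as root): snapshot("work") while the link is fine,
# then snapshot("broken") right after a stall, before reconnecting.
```

Taking the "broken" snapshot before the connection is reset matters, since reconnecting may clear whatever state the firmware was stuck in.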

craftyguy commented 1 year ago

> I would like to keep having decent battery life keeping wireless powersave on

Are you implying that you don't experience this if you disable powersave for wifi?

RodoMa92 commented 1 year ago

It's extremely rare unfortunately, so I can't really be sure. I still had connection issues even with it off a year ago, but I can't be sure if it's the same issue or not. Switching to desktop mode resets the connection, so doing further testing from a terminal is not really feasible.

The only implication is that I would rather avoid having to disable wireless powersave as a workaround regardless of the outcome.

RodoMa92 commented 1 year ago

I've also opened a bugzilla issue on the main kernel here and it seems that the Realtek engineers are trying to reproduce the issue. To me, from the logs, it looks like a firmware lockup, especially with all those h2c communication failures.

If you want to check the status with them, feel free to contact them.

Marco.

jmeloranta commented 1 year ago

I am seeing a bunch of these on 6.1.39:

[44518.610833] firmware failed to ack driver for leaving Deep Power mode
[44518.610883] WARNING: CPU: 4 PID: 92164 at drivers/net/wireless/realtek/rtw88/ps.c:107 rtw_power_mode_change+0xe6/0x120 [rtw88_core]
[44518.610918] Modules linked in: hidp uhid tls uinput snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ccm algif_aead cbc des_generic libdes ecb md4 nf_tables ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nfnetlink ip6table_filter ip6_tables iptable_filter cmac algif_hash algif_skcipher af_alg bnep intel_rapl_msr intel_rapl_common edac_mce_amd snd_soc_acp5x_mach snd_acp5x_pcm_dma snd_acp5x_i2s snd_sof_amd_rembrandt kvm_amd snd_sof_amd_renoir snd_sof_amd_acp kvm snd_sof_pci amdgpu rtw88_8822ce irqbypass snd_sof_xtensa_dsp rtw88_8822c crct10dif_pclmul crc32_pclmul rtw88_pci snd_sof polyval_clmulni snd_hda_codec_hdmi polyval_generic btusb rtw88_core gf128mul snd_hda_intel snd_sof_utils joydev snd_soc_cs35l41_spi ghash_clmulni_intel snd_soc_cs35l41
[44518.610997] snd_intel_dspcfg btrtl snd_soc_wm_adsp mac80211 snd_intel_sdw_acpi btbcm snd_pci_ps cs_dsp snd_soc_nau8821 snd_rpl_pci_acp6x sha512_ssse3 amdgpu_xcp_drv btintel snd_soc_cs35l41_lib snd_acp_pci aesni_intel drm_buddy r8153_ecm gpu_sched atkbd btmtk crypto_simd snd_hda_codec libarc4 hid_multitouch leds_steamdeck extcon_steamdeck steamdeck_hwmon cdc_ether cryptd drm_ttm_helper snd_pci_acp6x libps2 snd_soc_core mousedev wdat_wdt snd_hda_core bluetooth rapl usbnet ttm pcspkr cfg80211 vivaldi_fmap snd_compress r8152 snd_pci_acp5x tpm_crb ac97_bus snd_hwdep snd_pcm_dmaengine snd_rn_pci_acp3x ecdh_generic mii drm_display_helper opt3001 snd_acp_config snd_pcm tpm_tis sp5100_tco snd_soc_acpi i2c_piix4 cdc_acm snd_pci_acp3x rfkill mmc_block ccp dwc3_pci snd_timer cec steamdeck video snd ltrf216a wmi i2c_hid_acpi spi_amd tpm_tis_core industrialio 8250_dw soundcore i2c_hid acpi_cpufreq hid_steam mac_hid pkcs8_key_parser crypto_user fuse dm_mod loop bpf_preload tpm ip_tables x_tables
[44518.611101] overlay ext4 crc16 mbcache jbd2 usbhid vfat fat btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq sdhci_pci serio_raw cqhci nvme sdhci crc32c_intel nvme_core i8042 xhci_pci mmc_core nvme_common xhci_pci_renesas serio
[44518.611134] CPU: 4 PID: 92164 Comm: kworker/u32:1 Tainted: G W 6.1.39-valve1-1-neptune-61 #1 3431f60c98a3551f94b140e039c4da19b7d1eff6
[44518.611140] Hardware name: Valve Jupiter/Jupiter, BIOS F7A0116 05/12/2023
[44518.611143] Workqueue: phy0 rtw_watch_dog_work [rtw88_core]
[44518.611166] RIP: 0010:rtw_power_mode_change+0xe6/0x120 [rtw88_core]
[44518.611190] Code: 60 4b 52 e7 44 30 e0 78 23 45 84 ed 48 c7 c0 3d a8 0e c1 48 c7 c6 46 a8 0e c1 48 c7 c7 08 8a 0e c1 48 0f 45 f0 e8 ca a7 5a e6 <0f> 0b 5b 5d 41 5c 41 5d e9 8d 4d 52 e7 48 8b 8b b0 4b 00 00 8b 83
[44518.611192] RSP: 0018:ffffa9f748f9fdf0 EFLAGS: 00010286
[44518.611196] RAX: 0000000000000000 RBX: ffff928e18fb2080 RCX: 0000000000000027
[44518.611198] RDX: ffff92912ed20728 RSI: 0000000000000001 RDI: ffff92912ed20720
[44518.611200] RBP: 0000287d49691755 R08: 0000000000000000 R09: ffffa9f748f9fc60
[44518.611202] R10: 0000000000000003 R11: ffff92913ef7ffe8 R12: 00000000c10a3080
[44518.611204] R13: 0000000000000000 R14: 0000000000000000 R15: ffff928e18fb6a20
[44518.611206] FS: 0000000000000000(0000) GS:ffff92912ed00000(0000) knlGS:0000000000000000
[44518.611209] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44518.611211] CR2: 0000248300a4b000 CR3: 0000000413c10000 CR4: 0000000000350ee0
[44518.611213] Call Trace:
[44518.611217] <TASK>
[44518.611219] ? rtw_power_mode_change+0xe6/0x120 [rtw88_core 66a4930be2bfb813b754a10af9beeab405920fe8]
[44518.611242] ? __warn+0x7d/0xd0
[44518.611250] ? rtw_power_mode_change+0xe6/0x120 [rtw88_core 66a4930be2bfb813b754a10af9beeab405920fe8]
[44518.611274] ? report_bug+0xe6/0x150
[44518.611281] ? handle_bug+0x3a/0x70
[44518.611286] ? exc_invalid_op+0x17/0x70
[44518.611290] ? asm_exc_invalid_op+0x1a/0x20
[44518.611299] ? rtw_power_mode_change+0xe6/0x120 [rtw88_core 66a4930be2bfb813b754a10af9beeab405920fe8]
[44518.611323] rtw_pci_deep_ps+0xaa/0xd0 [rtw88_pci b9eeb840c23894dd4539810b35222683bed33abe]
[44518.611334] rtw_leave_lps+0x1d/0x1a0 [rtw88_core 66a4930be2bfb813b754a10af9beeab405920fe8]
[44518.611357] rtw_watch_dog_work+0x1d4/0x250 [rtw88_core 66a4930be2bfb813b754a10af9beeab405920fe8]
[44518.611381] process_one_work+0x1c7/0x3a0
[44518.611389] worker_thread+0x51/0x390
[44518.611394] ? process_one_work+0x3a0/0x3a0
[44518.611398] kthread+0xde/0x110
[44518.611402] ? kthread_complete_and_exit+0x20/0x20
[44518.611407] ret_from_fork+0x22/0x30
[44518.611416] </TASK>
[44518.611417] ---[ end trace 0000000000000000 ]---
[44518.615144] rtw_8822ce 0000:03:00.0: failed to send h2c command
[44518.718399] rtw_8822ce 0000:03:00.0: firmware failed to leave lps state
[44518.721505] rtw_8822ce 0000:03:00.0: failed to send h2c command
[44518.724609] rtw_8822ce 0000:03:00.0: failed to send h2c command
[44518.727702] rtw_8822ce 0000:03:00.0: failed to send h2c command
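To make reports like the one above easier to compare between units, a small Python sketch that counts the rtw88 failure messages in a saved `dmesg` dump could look like this. The message fragments are copied from the log in this comment; the labels (`deep_ps_no_ack` and so on) and the function names are invented for the example, and other firmware versions may print different messages.

```python
import re
from collections import Counter

# Message fragments taken from the log above; the short labels are made up here.
SIGNATURES = {
    "deep_ps_no_ack": "firmware failed to ack driver for leaving Deep Power mode",
    "lps_stuck": "firmware failed to leave lps state",
    "h2c_failed": "failed to send h2c command",
}

# Kernel timestamps look like "[44518.610833]"; module tags such as
# "[rtw88_core]" do not match because they are not seconds.microseconds.
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def iter_entries(log_text):
    """Yield (timestamp, message) pairs, whether or not entries sit on separate lines."""
    marks = list(TIMESTAMP.finditer(log_text))
    for current, following in zip(marks, marks[1:] + [None]):
        end = following.start() if following else len(log_text)
        yield float(current.group(1)), log_text[current.end():end].strip()


def summarize(log_text):
    """Return (count per label, first timestamp per label) for the known messages."""
    counts = Counter()
    first_seen = {}
    for stamp, message in iter_entries(log_text):
        for label, fragment in SIGNATURES.items():
            if fragment in message:
                counts[label] += 1
                first_seen.setdefault(label, stamp)
    return counts, first_seen


sample = (
    "[44518.610833] firmware failed to ack driver for leaving Deep Power mode "
    "[44518.615144] rtw_8822ce 0000:03:00.0: failed to send h2c command "
    "[44518.718399] rtw_8822ce 0000:03:00.0: firmware failed to leave lps state "
    "[44518.721505] rtw_8822ce 0000:03:00.0: failed to send h2c command"
)
counts, first_seen = summarize(sample)
```

Splitting on timestamps rather than on newlines is deliberate, so the same function works on logs that were pasted as one long run of text.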

RodoMa92 commented 1 year ago

Yeah, that actually looks like something power management related and a bit different from mine. Still, it seems to be another firmware bug. It's probably more useful to post this on the kernel bugzilla tracker and add the rtw88 driver maintainer to the cc of the opened bug to get a more direct response.

RodoMa92 commented 11 months ago

Unfortunately they couldn't reproduce the issue, and since I've switched to a different distribution that uses Linux 6.4 I no longer get any useful kernel logs, although I still see the same random hang on rare occasions. If anyone can help them debug it on the stock SteamOS kernel, be my guest.

As an upside, newer kernels seem to have reduced how often it happens for me, but since this issue is very likely a firmware problem and there is no documentation on the architecture or anything else, I really can't debug it further on my system.

I hope that Valve will consider using a hardware manufacturer that cares a little bit more about Linux compatibility for their next hardware, or just change the module in a subsequent revision of the Deck. I've always had decent experiences with Intel wireless hardware, and the few issues have always been fixed in a decent amount of time, but that might just be a small sample size for software issues.

RodoMa92 commented 11 months ago

If anyone at Valve knows any additional debugging steps, LMK. They posted a patch to compile into the kernel to get additional debugging output from the lockup on the SteamOS kernel, but compiling kernels on SteamOS is a lot of pain considering the locked-down nature of the underlying operating system. Besides, I really can't test it again, since SteamOS can only be reinstalled to internal storage and I'm not using it anymore because of unrelated issues.

I would love to know if these wireless issues are still on Valve's tracking radar and what is still being tried as of today to fix them, but after a year I'm not holding my breath.

One issue I have is that there is no way to see the internal status of which bugs are tracked and which are being worked on. Being able to report them here is often not enough to know whether someone is working on them and when to retest specific issues, especially if they are hard to fix like the aforementioned wireless issues.

RodoMa92 commented 8 months ago

I'm updating this report since even with the latest and greatest Linux 6.6 I still have wireless stalls from time to time. Sadly, when these happen I can't pull any relevant logs, since there seem to be no errors to report. The wireless adapter still reacts like it's connected (I can see the max bandwidth changing) but I do not have any traffic in or out of the device.

I had to leave SteamOS because when you switched to 6.1 LTS on the main channel the adapter became basically unusable on my device. I had something like 4 stalls a day, which makes the device basically unusable for playing online.
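Since nothing shows up in the logs, one way to at least timestamp such a stall is to watch the interface counters. The following Python sketch is only a heuristic under stated assumptions: it reads the standard `/sys/class/net/<iface>/` files, the interface name `wlan0` is a guess, and the rule "link is up but `rx_bytes` did not move" will also match a link that is simply idle, so it is best paired with a ping running in the background. The function names are made up for this example.

```python
import time
from pathlib import Path


def read_sample(iface, sys_net=Path("/sys/class/net")):
    """Read link state and byte counters for one interface from sysfs."""
    base = Path(sys_net) / iface
    return {
        "operstate": (base / "operstate").read_text().strip(),
        "rx_bytes": int((base / "statistics" / "rx_bytes").read_text()),
        "tx_bytes": int((base / "statistics" / "tx_bytes").read_text()),
    }


def looks_stalled(samples):
    """Heuristic: link reported up in every sample, yet no bytes were received."""
    if len(samples) < 2:
        return False
    link_up = all(s["operstate"] == "up" for s in samples)
    rx_frozen = samples[-1]["rx_bytes"] == samples[0]["rx_bytes"]
    return link_up and rx_frozen


def watch(iface="wlan0", window=5, interval=2.0):
    """Print a timestamp whenever the last `window` samples look like a stall."""
    samples = []
    while True:
        samples = (samples + [read_sample(iface)])[-window:]
        if len(samples) == window and looks_stalled(samples):
            print(time.strftime("%F %T"), "possible stall on", iface)
        time.sleep(interval)
```

Calling `watch()` from a terminal and noting the printed times would give something to correlate against the access point's own logs.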

I have a mesh network composed of two access points composed of:

They are connected together by ethernet; they just act as independent APs with the same network name and the same password.

I'll point other people to this report so they can collect additional details like these and help you get to the bottom of this. Realtek has kind of helped on unrelated issues, but the core problem is still present even on my current OS; it just happens far more rarely (but is still kind of infuriating nonetheless).

There were additional details and some back and forth about my specific issue with Realtek here, in case you want to know more.

cryptoluks commented 8 months ago

Hey @RodoMa92,

I have similar issues as you with the Deck with a similar AP setup of multiple Archer C7 (v5 and older ones) with the latest OpenWrt.

It looks like you have put some effort into trying to pinpoint the issue, awesome. I will try to pull some debug logs from my unit when Wi-Fi crashes again.