NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

iwlwifi error causing `systemd-journald` to log heavily on NixOS #92737

Closed ShamrockLee closed 4 years ago

ShamrockLee commented 4 years ago

Describe the bug

The systemd-journald on my USB-stick-based NixOS (Nightingale) suddenly began to log heavily while I was using it. The internet connection and the sudo command then become unavailable, and shutdown cannot complete normally.

Message in the journal: https://gist.github.com/ShamrockLee/afc8bf7b29fc7b7d681f8d16c4ca2cd9#file-journal_b_-1_20200709005330-txt-L3200-L3253

Extra info:

To Reproduce

Steps to reproduce the behavior:

  1. Update to the latest version of nixos-unstable
  2. Connect to the wireless network and surf the net with your browser for 10 to 30 minutes.

Actual behavior

  1. The fan starts screaming.
  2. `systemd-journald` uses 100% of one CPU core.
  3. The network connection becomes unavailable.
  4. sudo does not respond.
  5. Shutdown cannot finish.

Screenshots Screenshot_20200708_150727

Additional context

Jul 09 00:45:17 nixos-usb-202005 kernel: ------------[ cut here ]------------
Jul 09 00:45:17 nixos-usb-202005 kernel: WARNING: CPU: 1 PID: 5140 at drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c:1379 iwl_mvm_abort_channel_switch+0xde/0xf0 [iwlmvm]
Jul 09 00:45:17 nixos-usb-202005 kernel: Modules linked in: ctr ccm fuse rfcomm af_packet cmac algif_hash algif_skcipher af_alg bnep 8021q nls_iso8859_1 nls_cp437 vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc uas btusb btrtl btbcm btintel bluetooth ecdh_generic ecc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio joydev mousedev hid_multitouch nouveau hid_generic iwlmvm snd_soc_skl intel_rapl_msr i915 intel_rapl_common i2c_designware_platform i2c_designware_core snd_soc_sst_ipc mac80211 snd_soc_sst_dsp mei_hdcp snd_hda_ext_core iTCO_wdt x86_pkg_temp_thermal snd_soc_acpi_intel_match watchdog snd_soc_acpi libarc4 intel_powerclamp mxm_wmi coretemp acer_wmi snd_soc_core intel_wmi_thunderbolt sparse_keymap wmi_bmof ttm cec snd_compress drm_kms_helper crct10dif_pclmul ac97_bus crc32_pclmul iwlwifi snd_pcm_dmaengine ghash_clmulni_intel drm snd_hda_intel intel_gtt deflate aesni_intel crypto_simd snd_intel_nhlt agpgart efi_pstore cryptd r8169
Jul 09 00:45:17 nixos-usb-202005 kernel:  snd_hda_codec cfg80211 i2c_algo_bit glue_helper evdev intel_lpss_pci fb_sys_fops intel_cstate syscopyarea pstore intel_lpss realtek snd_hda_core intel_uncore mei_me sysfillrect input_leds idma64 rfkill sysimgblt led_class rtsx_pci_ms virt_dma mei libphy intel_rapl_perf memstick intel_pch_thermal intel_xhci_usb_role_switch snd_hwdep roles i2c_hid mac_hid tpm_crb i2c_i801 efivars serio_raw hid tpm_tis i2c_core tpm_tis_core video thermal tpm backlight wmi intel_pmc_core rng_core acpi_pad ac button battery ip6table_nat iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore msr loop tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 nf_defrag_ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2
Jul 09 00:45:17 nixos-usb-202005 kernel:  sr_mod cdrom sd_mod usb_storage ahci xhci_pci libahci xhci_hcd libata rtsx_pci_sdmmc mmc_core usbcore atkbd scsi_mod libps2 rtsx_pci crc32c_intel usb_common i8042 rtc_cmos serio dm_mod
Jul 09 00:45:17 nixos-usb-202005 kernel: CPU: 1 PID: 5140 Comm: kworker/1:4 Tainted: G        W         5.4.49 #1-NixOS
Jul 09 00:45:17 nixos-usb-202005 kernel: Hardware name: Acer Aspire E5-576G/Ironman_SK, BIOS V1.32 10/24/2017
Jul 09 00:45:17 nixos-usb-202005 kernel: Workqueue: events iwl_mvm_channel_switch_disconnect_wk [iwlmvm]
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: Firmware not running - cannot dump error
Jul 09 00:45:17 nixos-usb-202005 kernel: RIP: 0010:iwl_mvm_abort_channel_switch+0xde/0xf0 [iwlmvm]
Jul 09 00:45:17 nixos-usb-202005 kernel: Code: ef e8 16 fe ff ff 85 c0 75 20 48 8b 44 24 10 65 48 33 04 25 28 00 00 00 75 14 48 83 c4 18 5d 41 5c 41 5d 41 5e c3 0f 0b eb c9 <0f> 0b eb dc e8 99 f1 65 cb 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
Jul 09 00:45:17 nixos-usb-202005 kernel: RSP: 0018:ffffb58d00a77e30 EFLAGS: 00010282
Jul 09 00:45:17 nixos-usb-202005 kernel: RAX: 00000000fffffffb RBX: ffff96929cd777f8 RCX: ffff96928f7d2328
Jul 09 00:45:17 nixos-usb-202005 kernel: RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
Jul 09 00:45:17 nixos-usb-202005 kernel: RBP: ffff969292999e58 R08: ffffb58d00a77db4 R09: ffff969292999e28
Jul 09 00:45:17 nixos-usb-202005 kernel: R10: 000000000000002c R11: ffffb58d00a77a05 R12: ffff96929cd773f0
Jul 09 00:45:17 nixos-usb-202005 kernel: R13: ffff9692929987a0 R14: ffff969292999e28 R15: 0ffff9692a6a6d90
Jul 09 00:45:17 nixos-usb-202005 kernel: FS:  0000000000000000(0000) GS:ffff9692a6a40000(0000) knlGS:0000000000000000
Jul 09 00:45:17 nixos-usb-202005 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 09 00:45:17 nixos-usb-202005 kernel: CR2: 00007f9431faa000 CR3: 000000018700a001 CR4: 00000000003606e0
Jul 09 00:45:17 nixos-usb-202005 kernel: Call Trace:
Jul 09 00:45:17 nixos-usb-202005 kernel:  iwl_mvm_channel_switch_disconnect_wk+0x20/0x30 [iwlmvm]
Jul 09 00:45:17 nixos-usb-202005 kernel:  process_one_work+0x1ea/0x3a0
Jul 09 00:45:17 nixos-usb-202005 kernel:  worker_thread+0x4d/0x3f0
Jul 09 00:45:17 nixos-usb-202005 kernel:  kthread+0xfb/0x130
Jul 09 00:45:17 nixos-usb-202005 kernel:  ? process_one_work+0x3a0/0x3a0
Jul 09 00:45:17 nixos-usb-202005 kernel:  ? kthread_park+0x90/0x90
Jul 09 00:45:17 nixos-usb-202005 kernel:  ret_from_fork+0x35/0x40
Jul 09 00:45:17 nixos-usb-202005 kernel: ---[ end trace 6835f771a7ce5bdd ]---
Jul 09 00:45:17 nixos-usb-202005 kernel: wlp3s0: driver channel switch failed, disconnecting
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: Microcode SW error detected.  Restarting 0x2000000.
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: Status: 0x00000050, count: 6
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 29.163394017.0
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00003410 | ADVANCED_SYSASSERT          
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x000002F0 | trm_hw_status0
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00043D6C | branchlink2
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x0004AFA2 | interruptlink1
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | interruptlink2
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | data1
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | data2
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0xDEADBEEF | data3
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | beacon time
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00001AD2 | tsf low
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | tsf hi
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp1
Jul 09 00:45:17 nixos-usb-202005 kernel: iwlwifi 0000:03:00.0: 0x00001AD3 | time gp2
Jul 09 00:45:17 nixos-usb-202005 systemd-journald[506]: Missed 248 kernel messages
Jul 09 00:45:17 nixos-usb-202005 kernel:  snd_hda_codec cfg80211 i2c_algo_bit glue_helper evdev intel_lpss_pci fb_sys_fops intel_cstate syscopyarea pstore intel_lpss realtek snd_hda_core intel_uncore mei_me sysfillrect input_leds idma64 rfkill sysimgblt led_class rtsx_pci_ms virt_dma mei libphy intel_rapl_perf memstick intel_pch_thermal intel_xhci_usb_role_switch snd_hwdep roles i2c_hid mac_hid tpm_crb i2c_i801 efivars serio_raw hid tpm_tis i2c_core tpm_tis_core video thermal tpm backlight wmi intel_pmc_core rng_core acpi_pad ac button battery ip6table_nat iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore msr loop tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 nf_defrag_ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2
Jul 09 00:45:17 nixos-usb-202005 systemd-journald[506]: Missed 101 kernel messages
Jul 09 00:45:17 nixos-usb-202005 kernel:  snd_hda_codec cfg80211 i2c_algo_bit glue_helper evdev intel_lpss_pci fb_sys_fops intel_cstate syscopyarea pstore intel_lpss realtek snd_hda_core intel_uncore mei_me sysfillrect input_leds idma64 rfkill sysimgblt led_class rtsx_pci_ms virt_dma mei libphy intel_rapl_perf memstick intel_pch_thermal intel_xhci_usb_role_switch snd_hwdep roles i2c_hid mac_hid tpm_crb i2c_i801 efivars serio_raw hid tpm_tis i2c_core tpm_tis_core video thermal tpm backlight wmi intel_pmc_core rng_core acpi_pad ac button battery ip6table_nat iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore msr loop tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 nf_defrag_ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2
Jul 09 00:45:17 nixos-usb-202005 systemd-journald[506]: Missed 33 kernel messages

Notify maintainers

Metadata

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:

flokli commented 4 years ago

I'm not sure this is a NixOS-specific issue; it looks more like a kernel bug/driver problem causing a log flood, combined with your journal being on a slow medium. Also, this is missing information about which card and firmware are being used.

Especially as you were able to reproduce similar issues on Debian Testing, I'd propose trying an even more recent kernel and, if you can still reproduce it, filing a bug against the driver.
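
As for the slow-medium part: one possible mitigation, offered here only as an assumption and not something tested in this thread, is to keep the journal in RAM so that a log flood does not hit the USB stick. A sketch using the NixOS option `services.journald.extraConfig` with the journald.conf settings `Storage=` and `RuntimeMaxUse=`; the 64M cap is an arbitrary illustrative value:

```nix
# /etc/nixos/configuration.nix (excerpt; keep your existing options)
{ config, pkgs, ... }:

{
  services.journald.extraConfig = ''
    # Keep the journal under /run (RAM) instead of writing it to disk.
    Storage=volatile
    # Illustrative cap on how much RAM the journal may use (assumed value).
    RuntimeMaxUse=64M
  '';
}
```

This would not address the driver warning itself, and logs kept this way are lost on reboot.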

ShamrockLee commented 4 years ago

Sorry for forgetting to put the metadata.

Result of nix-info -m:

Result of lspci:

$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #1 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.2 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #11 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point LPC Controller/eSPI Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce MX130] (rev a2)
03:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] (rev 10)
04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader (rev 01)
04:00.1 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 12)

How do I get a more recent kernel if I have already upgraded to the latest version? I have thought that the drivers are shipped with the kernel in Linux. Is that right?

ShamrockLee commented 4 years ago

I have just found a solution (or workaround):

Specify the Linux kernel to use with `boot.kernelPackages = pkgs.linuxPackages_4_19;` in `/etc/nixos/configuration.nix`.

I don't know whether there are any potential side effects, but it works for me.
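
A minimal sketch of how that setting might sit in the file, assuming the usual module skeleton of a NixOS `configuration.nix`; only the `boot.kernelPackages` line comes from the workaround above, and the rest of an existing configuration stays as it is:

```nix
# /etc/nixos/configuration.nix (excerpt; keep your existing options)
{ config, pkgs, ... }:

{
  # Workaround: use the 4.19 kernel package set instead of the 5.4.49
  # kernel seen in the journal above.
  boot.kernelPackages = pkgs.linuxPackages_4_19;
}
```

The change takes effect after `nixos-rebuild switch` (or `nixos-rebuild boot`) followed by a reboot.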

flokli commented 4 years ago

I have thought that the drivers are shipped with the kernel in Linux. Is that right?

Yes - there's also firmware, in a separate package, but usually multiple versions are shipped, and the kernel driver chooses an appropriate version.

This is really a kernel regression. Please check if it has been fixed somewhere between 5.4.49 and the latest upstream kernel (5.7.8, or even master), and if not, get in touch with kernel people.
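
For trying a newer kernel on NixOS, nixpkgs provides a `linuxPackages_latest` attribute. A sketch, under the assumption that it resolves to a kernel newer than 5.4.49 in the channel in use (the exact version depends on the nixpkgs revision):

```nix
# /etc/nixos/configuration.nix (excerpt; keep your existing options)
{ config, pkgs, ... }:

{
  # Use the newest stable kernel packaged in the current nixpkgs revision,
  # to check whether the iwlwifi regression is already fixed upstream.
  boot.kernelPackages = pkgs.linuxPackages_latest;
}
```

After rebuilding and rebooting, `uname -r` shows which kernel version is actually running.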

You have now explicitly selected an older kernel, which might work for some time, until it reaches EOL and is removed.

I'll close this issue, as it's not a NixOS issue, but a kernel driver regression.