ClangBuiltLinux / linux

Linux kernel source tree

Kernel BUG() on Linux 6.8 with Clang 17+ and LTO, rt5640 audio fails #2017

Open anh0516 opened 4 months ago

anh0516 commented 4 months ago

On a Dell Venue 8 Pro 5830, Linux 6.8 built with Clang and LTO triggers a BUG() and causes the rt5640 audio driver to fail. This happens with the older driver and also with the newer SOF driver, forced with snd-intel-dspcfg.dsp_driver=3 on the kernel command line. It was fine on 6.7, but there were changes to the rt5640 driver in 6.8 that seem to have broken things. This happens with both Clang 17 and 18, using builds from Arch Linux and from my Gentoo box, which eliminates quite a few variables.

BUG: unable to handle page fault for address: 00000000ffffffff
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 196 Comm: (udev-worker) Tainted: G         C         6.8.6-llvm #2
Hardware name: Dell Inc. Venue 8 Pro 5830/09RP78, BIOS A16 02/27/2018
RIP: 0010:strcmp+0x10/0x30
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f b6 14 0e 88 14 08 48 ff c1 84 d2 75 f2 c3 cc 31 c0 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> b6 0c 07 0f b6 14 06 38 d1 75 0a 48 ff c0 84 c9 75 ed 31 c0 c3
RSP: 0018:ffffc900004bf778 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880202e4428 RCX: ffff88800495a6a8
RDX: ffff88800495a6a8 RSI: ffffffffa09d0388 RDI: 00000000ffffffff
RBP: ffffc900004bf8c8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8880202e4400 R11: ffffffffa08cd03f R12: ffff88800378f600
R13: ffff88800495a400 R14: ffffffffa08e30a8 R15: ffff88800495a410
FS:  00007f979c6bd540(0000) GS:ffff888076900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000003e2d000 CR4: 00000000001006f0
Call Trace:
 <TASK>
 ? __die_body+0x6d/0x120
 ? page_fault_oops+0x37d/0x470
 ? kernelmode_fixup_or_oops+0xf4/0x130
 ? do_user_addr_fault+0x19d/0x6e0
 ? exc_page_fault+0x47/0x70
 ? asm_exc_page_fault+0x22/0x30
 ? snd_byt_rt5640_mc_probe+0x3f/0xcb0 [snd_soc_sst_bytcr_rt5640]
 ? strcmp+0x10/0x30
 snd_byt_rt5640_mc_probe+0x70/0xcb0 [snd_soc_sst_bytcr_rt5640]
 ? idr_alloc_cyclic+0x16a/0x200
 ? kernfs_link_sibling+0xa8/0x190
 ? kernfs_add_one+0x2c0/0x360
 ? 0xffffffffa08cd000
 platform_probe+0x48/0xb0
 really_probe+0x1ce/0x400
 __driver_probe_device+0x146/0x250
 driver_probe_device+0x1e/0x230
 __driver_attach+0x13f/0x2f0
 ? driver_attach+0x20/0x20
 bus_for_each_dev+0x151/0x1e0
 bus_add_driver+0x1e5/0x2f0
 driver_register+0x71/0x170
 ? 0xffffffffa0832000
 do_one_initcall+0xe0/0x350
 ? idr_alloc_cyclic+0x16a/0x200
 ? kernfs_link_sibling+0xa8/0x190
 ? idr_alloc_cyclic+0x16a/0x200
 ? kernfs_link_sibling+0xa8/0x190
 ? kernfs_add_one+0x2c0/0x360
 ? __kernfs_create_file+0xa9/0xe0
 ? sysfs_create_bin_file+0xc4/0x100
 ? kobject_create_and_add+0x72/0xd0
 ? add_notes_attrs+0x190/0x200
 ? free_unref_page+0xa7/0x130
 ? load_module+0x18c3/0x1b20
 do_init_module+0x65/0x480
 __se_sys_finit_module+0x30f/0x480
 do_syscall_64+0x65/0x130
 ? exc_page_fault+0x47/0x70
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f979d28d15d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 ab 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007fff65c87008 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000560b8e7d3530 RCX: 00007f979d28d15d
RDX: 0000000000000000 RSI: 00007f979d3ac2f0 RDI: 0000000000000018
RBP: 00007f979d3ac2f0 R08: 0000000000000000 R09: 0000560b8e872710
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000020000
R13: 0000560b8e86dd70 R14: 0000560b8e867b80 R15: 0000560b8e872950
 </TASK>
Modules linked in: snd_soc_sst_bytcr_rt5640(+) intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp coretemp gpio_keys crct10dif_pclmul crc32_pclmul hid_multitouch hid_sensor_hub ath6kl_sdio(+) polyval_generic gf128mul ath6kl_core ghash_clmulni_intel aesni_intel crypto_simd snd_sof_acpi_intel_byt snd_sof_xtensa_dsp cryptd cfg80211 snd_sof_acpi snd_sof_intel_atom rfkill intel_cstate snd_sof snd_sof_utils intel_bytcrc_pwrsrc snd_intel_sst_acpi i915 snd_soc_acpi_intel_match snd_intel_sst_core snd_soc_sst_atom_hifi2_platform snd_soc_rt5640 snd_intel_dspcfg snd_soc_acpi snd_soc_rl6231 int3401_thermal i2c_algo_bit processor_thermal_device snd_soc_core drm_display_helper processor_thermal_power_floor processor_thermal_wt_hint snd_compress processor_thermal_wt_req drm_buddy ov5693 soc_button_array int3406_thermal dptf_power processor_thermal_rfim video atomisp_mt9m114(C) intel_gtt wmi snd_pcm processor_thermal_mbox v4l2_fwnode processor_thermal_rapl ttm backlight intel_rapl_common snd_timer v4l2_async
 atomisp_gmin_platform(C) vfat int3400_thermal int3403_thermal fat cec acpi_thermal_rel int340x_thermal_zone intel_int0002_vgpio snd videodev i2c_hid_acpi i2c_hid v4l2_cci mc soundcore pwm_lpss_platform pwm_lpss 8250_dw crypto_user mmc_block xhci_pci xhci_pci_renesas xhci_hcd usbcore usb_common sdhci_acpi sdhci mmc_core spi_pxa2xx_platform ext4 crc32c_generic crc32c_intel mbcache crc16 jbd2
CR2: 00000000ffffffff
---[ end trace 0000000000000000 ]---
RIP: 0010:strcmp+0x10/0x30
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f b6 14 0e 88 14 08 48 ff c1 84 d2 75 f2 c3 cc 31 c0 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> b6 0c 07 0f b6 14 06 38 d1 75 0a 48 ff c0 84 c9 75 ed 31 c0 c3
RSP: 0018:ffffc900004bf778 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880202e4428 RCX: ffff88800495a6a8
RDX: ffff88800495a6a8 RSI: ffffffffa09d0388 RDI: 00000000ffffffff
RBP: ffffc900004bf8c8 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8880202e4400 R11: ffffffffa08cd03f R12: ffff88800378f600
R13: ffff88800495a400 R14: ffffffffa08e30a8 R15: ffff88800495a410
FS:  00007f979c6bd540(0000) GS:ffff888076900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000003e2d000 CR4: 00000000001006f0

Additionally, the system hangs during the final stages of reboot or poweroff. I don't know whether that is related to this or a separate issue.

Here is my kernel configuration: config-6.8.6-lto.txt

Since this is a fairly uncommon piece of hardware, I am happy to test patches.

nathanchance commented 4 months ago

If it does not happen on 6.7 but does with 6.8, are you able to bisect to find the change that introduced this? The address it is faulting at seems rather suspect (00000000ffffffff); what is your CONFIG_INIT_STACK_* value?

nickdesaulniers commented 4 months ago

Instruction pointer is in strcmp, called from snd_byt_rt5640_mc_probe. Perhaps there's a bug somewhere near there?

FWIW:

./scripts/decodecode < /tmp/x
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f b6 14 0e 88 14 08 48 ff c1 84 d2 75 f2 c3 cc 31 c0 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> b6 0c 07 0f b6 14 06 38 d1 75 0a 48 ff c0 84 c9 75 ed 31 c0 c3
All code
========
   0:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
   7:   00 00 00 
   a:   0f b6 14 0e             movzbl (%rsi,%rcx,1),%edx
   e:   88 14 08                mov    %dl,(%rax,%rcx,1)
  11:   48 ff c1                inc    %rcx
  14:   84 d2                   test   %dl,%dl
  16:   75 f2                   jne    0xa
  18:   c3                      ret
  19:   cc                      int3
  1a:   31 c0                   xor    %eax,%eax
  1c:   66 66 66 66 66 2e 0f    data16 data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
  23:   1f 84 00 00 00 00 00 
  2a:*  0f b6 0c 07             movzbl (%rdi,%rax,1),%ecx       <-- trapping instruction
  2e:   0f b6 14 06             movzbl (%rsi,%rax,1),%edx
  32:   38 d1                   cmp    %dl,%cl
  34:   75 0a                   jne    0x40
  36:   48 ff c0                inc    %rax
  39:   84 c9                   test   %cl,%cl
  3b:   75 ed                   jne    0x2a
  3d:   31 c0                   xor    %eax,%eax
  3f:   c3                      ret

Code starting with the faulting instruction
===========================================
   0:   0f b6 0c 07             movzbl (%rdi,%rax,1),%ecx
   4:   0f b6 14 06             movzbl (%rsi,%rax,1),%edx
   8:   38 d1                   cmp    %dl,%cl
   a:   75 0a                   jne    0x16
   c:   48 ff c0                inc    %rax
   f:   84 c9                   test   %cl,%cl
  11:   75 ed                   jne    0x0
  13:   31 c0                   xor    %eax,%eax
  15:   c3                      ret

strcmp assumes the parameters are not NULL. Are we sure that byt_rt5640_dais[i].codecs->name is never NULL?

anh0516 commented 4 months ago

I'm using CONFIG_INIT_STACK_NONE, as part of getting what little performance I can out of the Atom CPU of the tablet in question. I'll try INIT_STACK_ALL_ZERO when I get the chance. I attached the kernel config in the original post.


anh0516 commented 4 months ago

@nathanchance same failure with INIT_STACK_ALL_ZERO:


#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 204 Comm: (udev-worker) Tainted: G         C         6.8.6-llvm #3
Hardware name: Dell Inc. Venue 8 Pro 5830/09RP78, BIOS A16 02/27/2018
RIP: 0010:strcmp+0x10/0x30
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f b6 14 0e 88 14 08 48 ff c1 84 d2 75 f2 c3 cc 31 c0 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> b6 0c 07 0f b6 14 06 38 d1 75 0a 48 ff c0 84 c9 75 ed 31 c0 c3
RSP: 0018:ffffc9000056b748 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880058a1a28 RCX: ffff88800592d2a8
RDX: ffff88800592d2a8 RSI: ffffffffa09c2388 RDI: 00000000ffffffff
RBP: ffffc9000056b898 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8880058a1a00 R11: ffffffffa085003f R12: ffff88800a9b6300
R13: ffff88800592d000 R14: ffffffffa08d10a8 R15: ffff88800592d010
FS:  00007fa0c9f79540(0000) GS:ffff888076880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000005bc7000 CR4: 00000000001006f0
Call Trace:
 <TASK>
 ? __die_body+0x6d/0x120
 ? page_fault_oops+0x3ce/0x4c0
 ? kernelmode_fixup_or_oops+0xf4/0x130
 ? exc_page_fault+0x47/0x70
 ? asm_exc_page_fault+0x22/0x30
 ? snd_byt_rt5640_mc_probe+0x3f/0xcb0 [snd_soc_sst_bytcr_rt5640]
 ? strcmp+0x10/0x30
 snd_byt_rt5640_mc_probe+0x70/0xcb0 [snd_soc_sst_bytcr_rt5640]
 ? rwsem_down_write_slowpath+0x9d/0x600
 ? try_to_wake_up+0x308/0x370
 ? rwsem_wake+0x92/0xf0
 ? kernfs_activate+0x1e8/0x200
 ? kernfs_add_one+0x2c0/0x360
 ? 0xffffffffa0850000
 platform_probe+0x48/0xb0
 really_probe+0x1ce/0x400
 __driver_probe_device+0x146/0x250
 driver_probe_device+0x1e/0x240
 __driver_attach+0x13f/0x2f0
 ? driver_attach+0x20/0x20
 bus_for_each_dev+0x141/0x1e0
 bus_add_driver+0x1e5/0x2f0
 driver_register+0x71/0x170
 ? 0xffffffffa08bf000
 do_one_initcall+0x130/0x3a0
 ? try_to_wake_up+0x308/0x370
 ? rwsem_wake+0x92/0xf0
 ? kernfs_activate+0x1e8/0x200
 ? sched_clock+0xc/0x20
 ? rwsem_down_write_slowpath+0xc6/0x600
 ? try_to_wake_up+0x308/0x370
 ? rwsem_wake+0x92/0xf0
 ? kernfs_activate+0x1e8/0x200
 ? kernfs_add_one+0x2c0/0x360
 ? __kernfs_create_file+0xa9/0xe0
 ? sysfs_create_bin_file+0xc4/0x100
 ? kobject_create_and_add+0x72/0xd0
 ? add_notes_attrs+0x190/0x200
 ? __slab_free+0x7b/0x2e0
 ? load_module+0x18d3/0x1b30
 do_init_module+0x65/0x480
 __se_sys_finit_module+0x332/0x4a0
 do_syscall_64+0x65/0x130
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fa0cab4915d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 ab 0d 00 f7 d8 64 89 01 48
RSP: 002b:00007ffce60b7818 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000563062474100 RCX: 00007fa0cab4915d
RDX: 0000000000000000 RSI: 00007fa0cac682f0 RDI: 000000000000001e
RBP: 00007fa0cac682f0 R08: 0000000000000001 R09: 0000563062478a40
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000020000
R13: 0000563062468910 R14: 0000563062474d60 R15: 0000563062478c80
 </TASK>
Modules linked in: snd_soc_sst_bytcr_rt5640(+) intel_soc_dts_thermal mousedev intel_soc_dts_iosf intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul gpio_keys ath6kl_sdio(+) hid_multitouch polyval_generic ath6kl_core hid_sensor_hub gf128mul ghash_clmulni_intel cfg80211 snd_sof_acpi_intel_byt snd_sof_xtensa_dsp snd_sof_acpi rfkill aesni_intel snd_sof_intel_atom crypto_simd snd_sof cryptd snd_sof_utils intel_cstate i915 snd_intel_sst_acpi snd_soc_acpi_intel_match snd_intel_sst_core snd_soc_sst_atom_hifi2_platform intel_bytcrc_pwrsrc snd_soc_rt5640 snd_intel_dspcfg snd_soc_rl6231 snd_soc_acpi int3401_thermal snd_soc_core i2c_algo_bit processor_thermal_device snd_compress processor_thermal_power_floor processor_thermal_wt_hint processor_thermal_wt_req ov5693 drm_display_helper atomisp_mt9m114(C) snd_pcm processor_thermal_rfim v4l2_fwnode atomisp_gmin_platform(C) int3406_thermal v4l2_async drm_buddy video processor_thermal_mbox snd_timer wmi processor_thermal_rapl intel_gtt snd soc_button_array backlight
 dptf_power ttm intel_rapl_common videodev vfat int3400_thermal fat int3403_thermal int340x_thermal_zone acpi_thermal_rel cec mc v4l2_cci soundcore intel_int0002_vgpio i2c_hid_acpi i2c_hid 8250_dw pwm_lpss_platform pwm_lpss crypto_user mmc_block xhci_pci xhci_pci_renesas xhci_hcd usbcore usb_common sdhci_acpi sdhci mmc_core spi_pxa2xx_platform ext4 crc32c_generic crc32c_intel mbcache crc16 jbd2
CR2: 00000000ffffffff
---[ end trace 0000000000000000 ]---
RIP: 0010:strcmp+0x10/0x30
Code: 66 2e 0f 1f 84 00 00 00 00 00 0f b6 14 0e 88 14 08 48 ff c1 84 d2 75 f2 c3 cc 31 c0 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 <0f> b6 0c 07 0f b6 14 06 38 d1 75 0a 48 ff c0 84 c9 75 ed 31 c0 c3
RSP: 0018:ffffc9000056b748 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880058a1a28 RCX: ffff88800592d2a8
RDX: ffff88800592d2a8 RSI: ffffffffa09c2388 RDI: 00000000ffffffff
RBP: ffffc9000056b898 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8880058a1a00 R11: ffffffffa085003f R12: ffff88800a9b6300
R13: ffff88800592d000 R14: ffffffffa08d10a8 R15: ffff88800592d010
FS:  00007fa0c9f79540(0000) GS:ffff888076880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000ffffffff CR3: 0000000005bc7000 CR4: 00000000001006f0

nathanchance commented 4 months ago

I was not necessarily expecting the INIT_STACK configuration to matter, but thank you for checking!

I noticed https://git.kernel.org/linus/7d99a70b65951108d82e1618c67abe69c3ed7720 in the list of changes from 6.7 to 6.8, which seems potentially relevant here since it mentions fixing a strcmp() crash due to a 6.8 change. It just seems like name is an invalid value, not NULL. How does LTO affect that, since this presumably does not happen without LTO?

anh0516 commented 4 months ago

@nathanchance reverting that change did not help, sadly. The faulting machine code reported with the BUG() is exactly the same, too, so the failure must be elsewhere. Where did you get the list of changes? kernelnewbies.org hasn't been updated for 6.8 yet and I can't find it anywhere else. (I'm new to kernel debugging.)

nickdesaulniers commented 4 months ago

I noticed https://git.kernel.org/linus/7d99a70b65951108d82e1618c67abe69c3ed7720 It just seems like name is an invalid value, not NULL.

The name field may be NULL, and snd_byt_rt5640_mc_probe is not checking for that before calling strcmp! The strcmp in lib/string.c doesn't check for NULL (and isn't required to). If strcmp is inlined due to LTO, and byt_rt5640_dais[i].codecs->name is determined to be NULL at compile time (loop unrolling plus cross-TU inlining), then perhaps LLVM will start removing code due to the resulting UB.

EDIT: Never mind, my tree was out of date. Perhaps it's time to break out UBSAN?

anh0516 commented 4 months ago

I enabled UBSAN but it didn't catch anything relevant, only a few probably unrelated array-index-out-of-bounds reports in net/wireless/nl80211.c.

I'll put them here anyway, just in case:

UBSAN: array-index-out-of-bounds in net/wireless/nl80211.c:9203:29
index 47 is out of range for type 'struct ieee80211_channel *[]'
CPU: 1 PID: 307 Comm: wpa_supplicant Tainted: G      D  C         6.8.7-llvm #1
Hardware name: Dell Inc. Venue 8 Pro 5830/09RP78, BIOS A16 02/27/2018
Call Trace:
 <TASK>
 __ubsan_handle_out_of_bounds+0xdd/0x140
 nl80211_trigger_scan+0xaf2/0xc00 [cfg80211]
 ? genl_family_rcv_msg_attrs_parse+0x9d/0xc0
 genl_family_rcv_msg_doit+0xb4/0xf0
 genl_rcv_msg+0x226/0x240
 ? genlmsg_multicast_netns+0x40/0x40 [cfg80211]
 ? nl80211_update_mesh_config+0xf0/0xf0 [cfg80211]
 ? nl80211_pre_doit+0x320/0x320 [cfg80211]
 ? genl_release+0x260/0x260
 netlink_rcv_skb+0x74/0x100
 genl_rcv+0x1f/0x80
 netlink_unicast+0x2cf/0x520
 netlink_sendmsg+0x492/0x5c0
 ____sys_sendmsg+0x1a9/0x260
 __sys_sendmsg+0x2c9/0x330
 do_syscall_64+0x65/0x130
 ? exc_page_fault+0x47/0x70
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fbfb78c1c84
Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d c5 e3 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
RSP: 002b:00007fffc6d09bc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000055c1408e79d0 RCX: 00007fbfb78c1c84
RDX: 0000000000000000 RSI: 00007fffc6d09c00 RDI: 0000000000000006
RBP: 000055c14095de80 R08: 0000000000000004 R09: 00000000000000f0
R10: 00007fffc6d09cd4 R11: 0000000000000202 R12: 000055c1408e7cb0
R13: 00007fffc6d09c00 R14: 0000000000000000 R15: 00007fffc6d09cd4
 </TASK>
---[ end trace ]---

------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/wireless/nl80211.c:9252:5
index 0 is out of range for type 'struct ieee80211_channel *[]'
CPU: 1 PID: 307 Comm: wpa_supplicant Tainted: G      D  C         6.8.7-llvm #1
Hardware name: Dell Inc. Venue 8 Pro 5830/09RP78, BIOS A16 02/27/2018
Call Trace:
 <TASK>
 __ubsan_handle_out_of_bounds+0xdd/0x140
 nl80211_trigger_scan+0x3a6/0xc00 [cfg80211]
 ? genl_family_rcv_msg_attrs_parse+0x9d/0xc0
 genl_family_rcv_msg_doit+0xb4/0xf0
 genl_rcv_msg+0x226/0x240
 ? genlmsg_multicast_netns+0x40/0x40 [cfg80211]
 ? nl80211_update_mesh_config+0xf0/0xf0 [cfg80211]
 ? nl80211_pre_doit+0x320/0x320 [cfg80211]
 ? genl_release+0x260/0x260
 netlink_rcv_skb+0x74/0x100
 genl_rcv+0x1f/0x80
 netlink_unicast+0x2cf/0x520
 netlink_sendmsg+0x492/0x5c0
 ____sys_sendmsg+0x1a9/0x260
 __sys_sendmsg+0x2c9/0x330
 do_syscall_64+0x65/0x130
 ? exc_page_fault+0x47/0x70
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fbfb78c1c84
Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d c5 e3 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
RSP: 002b:00007fffc6d09bc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000055c1408e79d0 RCX: 00007fbfb78c1c84
RDX: 0000000000000000 RSI: 00007fffc6d09c00 RDI: 0000000000000006
RBP: 000055c14095de80 R08: 0000000000000004 R09: 00000000000000f0
R10: 00007fffc6d09cd4 R11: 0000000000000202 R12: 000055c1408e7cb0
R13: 00007fffc6d09c00 R14: 0000000000000000 R15: 00007fffc6d09cd4
 </TASK>
---[ end trace ]---
systemd-journald[159]: /var/log/journal/ec011dcc411f49bf97b2f78b4fbfd5a6/user-1000.journal: Journal file uses a different sequence number ID, rotating.
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in net/wireless/nl80211.c:9232:4
index 0 is out of range for type 'struct ieee80211_channel *[]'
CPU: 3 PID: 307 Comm: wpa_supplicant Tainted: G      D  C         6.8.7-llvm #1
Hardware name: Dell Inc. Venue 8 Pro 5830/09RP78, BIOS A16 02/27/2018
Call Trace:
 <TASK>
 __ubsan_handle_out_of_bounds+0xdd/0x140
 nl80211_trigger_scan+0x498/0xc00 [cfg80211]
 ? genl_family_rcv_msg_attrs_parse+0x9d/0xc0
 genl_family_rcv_msg_doit+0xb4/0xf0
 genl_rcv_msg+0x226/0x240
 ? genlmsg_multicast_netns+0x40/0x40 [cfg80211]
 ? nl80211_update_mesh_config+0xf0/0xf0 [cfg80211]
 ? nl80211_pre_doit+0x320/0x320 [cfg80211]
 ? genl_release+0x260/0x260
 netlink_rcv_skb+0x74/0x100
 genl_rcv+0x1f/0x80
 netlink_unicast+0x2cf/0x520
 netlink_sendmsg+0x492/0x5c0
 ____sys_sendmsg+0x1a9/0x260
 __sys_sendmsg+0x2c9/0x330
 do_syscall_64+0x65/0x130
 ? exc_page_fault+0x47/0x70
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fbfb78c1c84
Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d c5 e3 0d 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
RSP: 002b:00007fffc6d09bc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000055c1408e79d0 RCX: 00007fbfb78c1c84
RDX: 0000000000000000 RSI: 00007fffc6d09c00 RDI: 0000000000000006
RBP: 000055c1409621c0 R08: 0000000000000004 R09: 00000000000000f0
R10: 00007fffc6d09cd4 R11: 0000000000000202 R12: 000055c1408e7cb0
R13: 00007fffc6d09c00 R14: 0000000000000000 R15: 00007fffc6d09cd4
 </TASK>
---[ end trace ]---

They are probably related to this message, which is printed right before them: ath6kl: Firmware lacks RSN-CAP-OVERRIDE, so HT (802.11n) is disabled.

I did not enable any other debugging features and left UBSAN at its defaults. What else would you recommend I turn on?

nickdesaulniers commented 4 months ago

Probably worth notifying the maintainers of those drivers, but that sounds orthogonal to the issue being tracked here. (If you use triple backticks in GitHub markdown to open and close your trace, it will retain the original line wrapping.) Perhaps worth testing ASAN, too, though IIRC ASAN is incompatible with LTO. Did you verify that you don't observe this without LTO?

nathanchance commented 4 months ago

Perhaps worth testing ASAN, too, though IIRC ASAN is incompatible with LTO.

I think KASAN is now allowed with LTO: https://git.kernel.org/linus/349fde599db65d4827820ef6553e3f9ee75b8c7c

nathanchance commented 4 months ago

Something that occurred to me is that LTO may have inlined some other function that calls strcmp() into snd_byt_rt5640_mc_probe(), which would not be obvious from the stack trace.

Can you try running your stack trace through scripts/decode_stacktrace.sh? Assuming the stack trace is saved in crash.log:

$ LLVM=1 scripts/decode_stacktrace.sh vmlinux <crash.log

and see if that gives us any other idea what is going on here?