AsahiLinux / linux

Linux kernel source tree

gpu related crashes with kernel >= 6.9.7 #309

Open oliverbestmann opened 1 month ago

oliverbestmann commented 1 month ago

Since updating from 6.9.5 to 6.9.6 (and 6.9.9), I get random GPU/DRM-related crashes after a few minutes of usage.

Jul 15 10:20:18 m1pro kernel: ------------[ cut here ]------------
Jul 15 10:20:18 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 10:20:18 m1pro kernel: WARNING: CPU: 0 PID: 15794 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: sp : ffff800090397440
Jul 15 10:20:18 m1pro kernel: x29: ffff800090397440 x28: 0000000000000030 x27: ffff000014ad5000
Jul 15 10:20:18 m1pro kernel: x26: ffff80007a55d948 x25: 0000000000000000 x24: ffff000139b5dc00
Jul 15 10:20:18 m1pro kernel: x23: ffff800090397888 x22: ffff000139b5cb38 x21: ffff0005be57f5d8
Jul 15 10:20:18 m1pro kernel: x20: ffff00013bfb1c08 x19: ffff00013bfb1c08 x18: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 10:20:18 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 10:20:18 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 21401009 (nzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 10:20:18 m1pro kernel: sp : ffff800090395d40
Jul 15 10:20:18 m1pro kernel: x29: ffff800090395d50 x28: 00000000ffffffa0 x27: ffff000639ee3280
Jul 15 10:20:18 m1pro kernel: x26: ffffffa00000c984 x25: 0000000000212a9c x24: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x23: 736120492079616d x22: 00000000ffffffff x21: 0000000000000cc0
Jul 15 10:20:18 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000318 x18: 00000000000000ff
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 00000000ffffffa0 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 15 10:20:18 m1pro kernel: x8 : c98580007a45d9c4 x7 : 0000000000000cc0 x6 : 0000000000000318
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000064ce340
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000200 x1 : 736120492079616d x0 : ffff000001f2cb00
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 10:20:18 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 15 10:20:18 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw6vertex17RunVertexG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1u_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtN>
Jul 15 10:20:18 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x1ba8/0x1dd0 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP

Going back to 6.9.5 brings back a stable system.

jannau commented 1 month ago

There isn't much change between asahi-6.9.5-1 and asahi-6.9.6-1, and I don't see any relevant changes.

It looks like there is an issue with handling failing drm_sched_can_queue() calls. Is the (GPU) workload at the time of the error in any way remarkable?

mkurz commented 1 month ago

It looks like there is an issue with handling failing drm_sched_can_queue() calls. Is the (GPU) workload at the time of the error in any way remarkable?

I ran into the same (or a similar) drm_sched_can_queue problem last week when I upgraded to 6.9.7-1. I downgraded to 6.9.6-1 and have had no issues since. I didn't report it because I thought all of this was WIP, but maybe this is a bug? (Or is it fixed in a newer release?)

Jul 08 22:08:45 mkurz-macbook-pro kernel: ------------[ cut here ]------------
Jul 08 22:08:45 mkurz-macbook-pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 08 22:08:45 mkurz-macbook-pro kernel: WARNING: CPU: 1 PID: 4579 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel: Modules linked in: tls ppp_async l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_gene>
Jul 08 22:08:45 mkurz-macbook-pro kernel: CPU: 1 PID: 4579 Comm: Renderer Tainted: G S                 6.9.7-asahi-1-1-ARCH #1
Jul 08 22:08:45 mkurz-macbook-pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pc : drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel: lr : drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel: sp : ffff80009ba07440
Jul 08 22:08:45 mkurz-macbook-pro kernel: x29: ffff80009ba07440 x28: ffff00049660c000 x27: 000000000000000d
Jul 08 22:08:45 mkurz-macbook-pro kernel: x26: ffff80009ba07608 x25: ffff00001579df80 x24: ffff800081739000
Jul 08 22:08:45 mkurz-macbook-pro kernel: x23: ffff00000dce3000 x22: ffff000011ed6938 x21: ffff00049660c1d8
Jul 08 22:08:45 mkurz-macbook-pro kernel: x20: ffff00002e3a1c08 x19: ffff00002e3a1c08 x18: 0000000000000050
Jul 08 22:08:45 mkurz-macbook-pro kernel: x17: 636e757274202c74 x16: 696d696c20746964 x15: 6572632065687420
Jul 08 22:08:45 mkurz-macbook-pro kernel: x14: ffff80008153d288 x13: 2e657461636e7572 x12: 74202c74696d696c
Jul 08 22:08:45 mkurz-macbook-pro kernel: x11: ffff80008153d288 x10: 0000000000000316 x9 : ffff8000815ed288
Jul 08 22:08:45 mkurz-macbook-pro kernel: x8 : 000000000002ffe8 x7 : 00000000ffffe000 x6 : ffff8000815ed288
Jul 08 22:08:45 mkurz-macbook-pro kernel: x5 : 80000000ffffe000 x4 : 0000000000000002 x3 : ffff800081318008
Jul 08 22:08:45 mkurz-macbook-pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0001188f6d00
Jul 08 22:08:45 mkurz-macbook-pro kernel: Call trace:
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_sched_can_queue+0xf4/0x154
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_sched_wakeup+0x18/0x5c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_sched_entity_push_job+0x168/0x1c0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvXsI_NtCsfLhZwm4SDSu_5asahi5queueNtB5_13QueueG13V12_3NtB5_5Queue6submit+0x131c/0x1604
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvNvXs_NtCsfLhZwm4SDSu_5asahi6driverNtB6_11AsahiDriverNtNtNtCs48FVigIbjZk_6kernel3drm3drv6Driver6IOCTL>
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl_kernel+0xbc/0x130
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl+0x20c/0x4c0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  __arm64_sys_ioctl+0x2cc/0xc9c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  invoke_syscall.constprop.0+0x50/0xe4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  do_el0_svc+0x40/0xdc
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0_svc+0x38/0x160
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 22:08:45 mkurz-macbook-pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 22:08:45 mkurz-macbook-pro kernel: ------------[ cut here ]------------
Jul 08 22:08:45 mkurz-macbook-pro kernel: WARNING: CPU: 1 PID: 4579 at mm/slub.c:4358 free_large_kmalloc+0xac/0xe0
Jul 08 22:08:45 mkurz-macbook-pro kernel: Modules linked in: tls ppp_async l2tp_ppp l2tp_netlink l2tp_core ip6_udp_tunnel udp_tunnel pppox ppp_gene>
Jul 08 22:08:45 mkurz-macbook-pro kernel: CPU: 1 PID: 4579 Comm: Renderer Tainted: G S      W          6.9.7-asahi-1-1-ARCH #1
Jul 08 22:08:45 mkurz-macbook-pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pstate: 41400009 (nZcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 08 22:08:45 mkurz-macbook-pro kernel: pc : free_large_kmalloc+0xac/0xe0
Jul 08 22:08:45 mkurz-macbook-pro kernel: lr : kfree+0x160/0x1b4
Jul 08 22:08:45 mkurz-macbook-pro kernel: sp : ffff80009ba05dc0
Jul 08 22:08:45 mkurz-macbook-pro kernel: x29: ffff80009ba05dc0 x28: ffff000136110800 x27: ffff00009207e1c0
Jul 08 22:08:45 mkurz-macbook-pro kernel: x26: ffff000027a57008 x25: 0000000002806900 x24: ffffffa000000074
Jul 08 22:08:45 mkurz-macbook-pro kernel: x23: ffffffa6001ab980 x22: ffff80009ba06290 x21: 0000000000000001
Jul 08 22:08:45 mkurz-macbook-pro kernel: x20: ffff000400000500 x19: ffffff7fc4000000 x18: 000000000007815a
Jul 08 22:08:45 mkurz-macbook-pro kernel: x17: 0000000000000000 x16: 00000000ffff0000 x15: 00000000ffffffa6
Jul 08 22:08:45 mkurz-macbook-pro kernel: x14: 001ac0d800000000 x13: 9393939300000000 x12: 0000000000000000
Jul 08 22:08:45 mkurz-macbook-pro kernel: x11: 0000000000000000 x10: 00000000000002a0 x9 : 0000000000000000
Jul 08 22:08:45 mkurz-macbook-pro kernel: x8 : 0000000000000000 x7 : 00000000000002a0 x6 : ffff00039be81900
Jul 08 22:08:45 mkurz-macbook-pro kernel: x5 : ffff80009ba06348 x4 : ffff0001188f6d00 x3 : ffff8000a7be6140
Jul 08 22:08:45 mkurz-macbook-pro kernel: x2 : 0000000000000001 x1 : ffff000400000500 x0 : 0000000000000000
Jul 08 22:08:45 mkurz-macbook-pro kernel: Call trace:
Jul 08 22:08:45 mkurz-macbook-pro kernel:  free_large_kmalloc+0xac/0xe0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  kfree+0x160/0x1b4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RINvMs8_NtCsfLhZwm4SDSu_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V12_3INtNtB8_>
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvMs_NtNtCsfLhZwm4SDSu_5asahi5queue6renderNtB6_13QueueG13V12_313submit_render+0x169c/0x1da4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvXsI_NtCsfLhZwm4SDSu_5asahi5queueNtB5_13QueueG13V12_3NtB5_5Queue6submit+0xfcc/0x1604
Jul 08 22:08:45 mkurz-macbook-pro kernel:  _RNvNvXs_NtCsfLhZwm4SDSu_5asahi6driverNtB6_11AsahiDriverNtNtNtCs48FVigIbjZk_6kernel3drm3drv6Driver6IOCTL>
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl_kernel+0xbc/0x130
Jul 08 22:08:45 mkurz-macbook-pro kernel:  drm_ioctl+0x20c/0x4c0
Jul 08 22:08:45 mkurz-macbook-pro kernel:  __arm64_sys_ioctl+0x2cc/0xc9c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  invoke_syscall.constprop.0+0x50/0xe4
Jul 08 22:08:45 mkurz-macbook-pro kernel:  do_el0_svc+0x40/0xdc
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0_svc+0x38/0x160
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 22:08:45 mkurz-macbook-pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 22:08:45 mkurz-macbook-pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 22:08:45 mkurz-macbook-pro kernel: object pointer: 0x00000000c6ae86e4

jannau commented 1 month ago

asahi-6.9.7-1 contains @asahilina's GPUVM changes so a regression caused by that is at least possible

cyrinux commented 1 month ago

Hi, since I see that @mkurz runs a MacBook Pro: for information, I get this issue on an M2 Air. It is totally random but happens several times per day.

asahilina commented 1 month ago

This is in drm/sched so it's less likely to be GPUVM related...

Jobs may not exceed the credit limit, truncate.

This is an impossible condition, since the job credit count is always 1 and the credit limit is 1280 or something like that. So I think there is some kind of memory corruption...
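To make the argument concrete, here is a rough Python model of the check that emits this warning. It is only a sketch of the logic in drm_sched_can_queue() as I understand it, not the kernel code: the `Job` and `Scheduler` classes and the `can_queue` helper are hypothetical stand-ins, and 1280 is just the approximate limit quoted above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Job:
    credits: int  # credits this job consumes (always 1 for asahi, per the comment above)


@dataclass
class Scheduler:
    credit_limit: int      # maximum credits the scheduler may hand out
    credit_count: int = 0  # credits held by jobs already in flight


def can_queue(sched: Scheduler, job: Job) -> Tuple[bool, bool]:
    """Return (can_queue, warned), mimicking the warn-and-truncate behaviour."""
    warned = job.credits > sched.credit_limit
    if warned:
        # "Jobs may not exceed the credit limit, truncate."
        job.credits = sched.credit_limit
    available = sched.credit_limit - sched.credit_count
    return available >= job.credits, warned


sched = Scheduler(credit_limit=1280)
print(can_queue(sched, Job(credits=1)))      # healthy job: no warning expected
print(can_queue(sched, Job(credits=2**40)))  # hypothetical corrupted value: warning
```

With a credit count of 1 the warning branch is unreachable in this model, so seeing it in the logs suggests that either the job's credit field or the scheduler's limit held a garbage value at that moment.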

asahilina commented 1 month ago

The realloc crash has some interesting strings...

>>> bytes.fromhex("736120492079616d")[::-1]
b'may I as'

This string is not from the kernel... @oliverbestmann, do you have any idea where this came from?
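The same trick works for any register in the traces above. A small sketch, where `decode_reg` is a hypothetical helper name and the values are copied from the first report (x23 from the realloc oops, x14 and x15 from the drm_sched_can_queue warning):

```python
def decode_reg(value: int) -> str:
    """Interpret a 64-bit register value as 8 little-endian ASCII bytes."""
    raw = value.to_bytes(8, "little")
    # Replace non-printable bytes with '.' so binary garbage stays readable.
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


regs = {
    "x23 (realloc oops)": 0x736120492079616D,
    "x14 (sched warning)": 0x6465656378652074,
    "x15 (sched warning)": 0x6572632065687420,
}

for name, value in regs.items():
    print(f"{name}: {decode_reg(value)!r}")
```

x14 and x15 decode to "t exceed" and " the cre", which looks like a fragment of the kernel's own "Jobs may not exceed the credit limit" message, so those two are probably harmless leftovers from printing the warning. The x23 value is the odd one out: "may I as" does not appear in any of the log text.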

asahilina commented 1 month ago

Also are we sure this is reproducible with v6.9.6 in at least some cases? Because then it can't be the GPUVM stuff...

jannau commented 1 month ago

If it's reproducible with asahi-6.9.6-1 there's no obvious change which would explain why it's not in asahi-6.9.5-1 as well. Nothing in git range-diff asahi-6.9.5-1...asahi-6.9.6-1 looks related.

asahilina commented 1 month ago

Are these kernels built with clang/llvm by any chance? So far everyone reporting this is on something other than Fedora, and Ella specifically pointed this out on Discord:

i have a small hunch its a compiler bug in clang or ub in drm sched causing freezing when built with clang

jannau commented 1 month ago

@cyrinux please describe which systems you use. Do you use Fedora-Asahi-Remix?

@mkurz / @oliverbestmann do you use LLVM or gcc to build the kernel?

cyrinux commented 1 month ago

@cyrinux please describe which systems you use. Do you use Fedora-Asahi-Remix?

I use NixOS unstable with the https://github.com/tpwrules/nixos-apple-silicon/ overlay. 😸

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x611f0320]
[    0.000000] Linux version 6.9.9-asahi (nixbld@localhost) (gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42) #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan  1 00:00:00 UTC 1980
[    0.000000] random: crng init done
[    0.000000] Machine model: Apple MacBook Air (13-inch, M2, 2022)
[    0.000000] efi: EFI v2.10 by Das U-Boot

Ella-0 commented 1 month ago

Are these kernels built with clang/llvm by any chance? So far everyone reporting this is on something other than Fedora, and Ella specifically pointed this out on Discord:

i have a small hunch its a compiler bug in clang or ub in drm sched causing freezing when built with clang

My kernel is built with GCC.

[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x612f0240]
[    0.000000] Linux version 6.9.7-asahi (ella@natsu) (gcc (GCC) 14.1.1 20240507, GNU ld (GNU Binutils) 2.42.0) #2 SMP PREEMPT Fri Jul  5 23:30:34 GMT 2024
[    0.000000] KASLR enabled
[    0.000000] random: crng init done
[    0.000000] Machine model: Apple MacBook Pro (14-inch, M1 Pro, 2021)

asahilina commented 1 month ago

Please also report your Mesa versions, and the Rust version used for the kernel compile too.

At this point I'm pretty sure this is random memory corruption, but none of us on Fedora can reproduce it so far...
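One way to sanity-check the corruption theory against the first __kmalloc_node_track_caller oops is to recompute the faulting address from the register dump. This is a sketch under two assumptions that are not stated in the logs: that the faulting instruction word in the Code: line (f8626af9) is a register-offset load of the form `ldr x25, [x23, x2]`, and that the kernel logs the fault address after sign-extending it from bit 55. The helper name is made up for illustration.

```python
MASK64 = (1 << 64) - 1


def reported_fault_address(addr: int) -> int:
    """Sign-extend a 64-bit address from bit 55, discarding the top byte."""
    low = addr & ((1 << 56) - 1)
    if low & (1 << 55):
        low |= 0xFF << 56
    return low


# Values copied from the first __kmalloc_node_track_caller oops.
x23 = 0x736120492079616D     # assumed base register; ASCII text, not a pointer
x2 = 0x200                   # assumed offset register
logged = 0x006120492079636D  # "Unable to handle kernel paging request at ..."

computed = reported_fault_address((x23 + x2) & MASK64)
print(f"{computed:#018x}", computed == logged)
```

If both assumptions hold, the computed value equals the logged address, which would mean the allocator dereferenced a pointer slot that had been overwritten with string data.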

cyrinux commented 1 month ago

Please also report your Mesa versions, and the Rust version used for the kernel compile too.

At this point I'm pretty sure this is random memory corruption, but none of us on Fedora can reproduce it so far...

$ nix-store --query --requisites /run/current-system | cut -d- -f2- | sort -u | grep -E "(rustc|mesa)"
mesa-24.2.0
mesa-24.2.0-drivers
rustc-1.78.0
rustc-wrapper-1.78.0

mkurz commented 1 month ago

I am running Arch Linux ARM with all packages up to date, thanks to @joske's pull requests: https://github.com/AsahiLinux/PKGBUILDs/pulls/joske

You can find the PKGBUILD I am using here: https://github.com/joske/PKGBUILDs/tree/kernel/linux-asahi However, the latest one in that branch uses 6.9.7-1, which crashed for me, so I downgraded to 6.9.6-1, one commit earlier: https://github.com/joske/PKGBUILDs/tree/42afae8c0c27efad565957f5213e096ef971c7bf/linux-asahi and have had no issues for 2 weeks. I just run makepkg -sicAL to build and install.

Jul 17 15:00:19 mkurz-macbook-pro kernel: Linux version 6.9.6-asahi-1-3-ARCH (linux-asahi@archlinux) (gcc (GCC) 14.1.1 20240507, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Mon, 08 Jul 2024 21:53:07 +0000
$ yay -Q | grep -E 'llvm|clang|mesa|rust|gcc|glibc'
clang 18.1.8-1
gcc 14.1.1+r1+g43b730b9134-1
gcc-libs 14.1.1+r1+g43b730b9134-1
glibc 2.39+r52+gf8e4623421-1
llvm 18.1.8-3
llvm-libs 18.1.8-3
mesa-asahi-edge 24.2.0_pre20240527-3
mesa-asahi-edge-debug 24.2.0_pre20240527-3
mesa-utils 9.0.0-4
rust-bindgen 0.69.4-1
rustup 1.27.1-1
spirv-llvm-translator 18.1.2-1
$ cat rust-toolchain.toml 
[toolchain]
channel = "1.76.0"
components = ["rustc", "cargo", "rust-src"]
targets = ["aarch64-unknown-linux-gnu"]

So for me this happened when going from 6.9.6-1 to 6.9.7-1.

mkurz commented 1 month ago

By the way, after upgrading llvm/clang I had to recompile Mesa.

asahilina commented 1 month ago

I'm bisecting configs and running into some scary mm-related crashes that have nothing to do with the GPU. I think there is some horrible regression here that affects some kernel configs...

Everyone, please post the value of these kernel configs:

CONFIG_ARM64_PA_BITS CONFIG_ARM64_VA_BITS CONFIG_PGTABLE_LEVELS

For reference, on Fedora we have:

CONFIG_ARM64_PA_BITS=48
CONFIG_ARM64_VA_BITS=48
CONFIG_PGTABLE_LEVELS=4
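
For anyone unsure how to collect these, a minimal sketch that pulls the three values out of a kernel config. The function names are made up for illustration, and reading /proc/config.gz only works if the running kernel was built with CONFIG_IKCONFIG_PROC; otherwise pass the text of your build's config file to `extract_config` instead.

```python
import gzip
import re

KEYS = ("CONFIG_ARM64_PA_BITS", "CONFIG_ARM64_VA_BITS", "CONFIG_PGTABLE_LEVELS")


def extract_config(text: str, keys=KEYS) -> dict:
    """Return {key: value} for the requested keys found in kernel config text."""
    found = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        # Exact key match, so CONFIG_ARM64_VA_BITS_48=y is not picked up.
        if match and match.group(1) in keys:
            found[match.group(1)] = match.group(2)
    return found


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (requires CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

Calling `extract_config(read_running_config())` on an affected machine should give a small dictionary that can be pasted here and compared with the Fedora values above.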
maximbaz commented 1 month ago

Answering for NixOS (same setup as @cyrinux above), the values seem to be the same as on Fedora.

mkurz commented 1 month ago

From https://github.com/joske/PKGBUILDs/blob/kernel/linux-asahi/config:

$ grep -E 'CONFIG_ARM64_PA_BITS|CONFIG_ARM64_VA_BITS|CONFIG_PGTABLE_LEVELS' config 
CONFIG_PGTABLE_LEVELS=4
# CONFIG_ARM64_VA_BITS_36 is not set
# CONFIG_ARM64_VA_BITS_47 is not set
CONFIG_ARM64_VA_BITS_48=y
# CONFIG_ARM64_VA_BITS_52 is not set
CONFIG_ARM64_VA_BITS=48
CONFIG_ARM64_PA_BITS_48=y
CONFIG_ARM64_PA_BITS=48

Both the same when building 6.9.6-1 or 6.9.7-1.

The only difference in the config between the two kernels is: https://github.com/joske/PKGBUILDs/commit/14913f3d5a3e17f61303424f0a16c80581551138#diff-3a3fd6cbc5653e937609572c62143e181842a4a1ebdc1b55e9e2e34e6aa6c5fc

montchr commented 1 month ago

I just ran into this, also using https://github.com/tpwrules/nixos-apple-silicon/tree/6015c1e2f91896e0b7a983c2824c665af32f568a

Jul 17 20:30:16 tuvok kernel: ------------[ cut here ]------------
Jul 17 20:30:16 tuvok kernel: asahi 206400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 17 20:30:16 tuvok kernel: WARNING: CPU: 3 PID: 19136 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel: Modules linked in: usbhid xhci_plat_hcd xhci_hcd xt_mark snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device qrtr nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype overlay bnep brcmfmac_wcc joydev hid_magicmouse appledrm macsmc_hwmon macsmc_reboot macsmc_power macsmc_hid ofpart tps6598x snd_soc_cs42l84 spi_nor apple_isp videobuf2_dma_sg snd_soc_tas2764 hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_admac clk_apple_nco apple_dcp videobuf2_common asahi pwm_apple mux_core mc drm_dma_helper apple_soc_cpufreq snd_soc_apple_mca hci_bcm4377 brcmfmac bluetooth brcmutil snd_soc_macaudio leds_pwm cfg80211 ecdh_generic ecc rfkill xt_conntrack ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog nft_compat nf_tables uinput evdi(O) loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap macvlan bridge stp llc fuse nfnetlink ip_tables nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi dockchannel_hid regmap_spmi phy_apple_atc pcie_apple pci_host_common
Jul 17 20:30:16 tuvok kernel:  typec macsmc_rtkit dwc3 nvme_apple macsmc mfd_core apple_rtkit_helper nvmem_apple_efuses spmi_apple_controller udc_core apple_dockchannel apple_sart pinctrl_apple_gpio i2c_pasemi_platform spi_apple i2c_pasemi_core apple_dart
Jul 17 20:30:16 tuvok kernel: CPU: 3 PID: 19136 Comm: Renderer Tainted: G S         O       6.9.9-asahi #1-NixOS
Jul 17 20:30:16 tuvok kernel: Hardware name: Apple MacBook Air (13-inch, M2, 2022) (DT)
Jul 17 20:30:16 tuvok kernel: pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 17 20:30:16 tuvok kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel: sp : ffff800098ea7440
Jul 17 20:30:16 tuvok kernel: x29: ffff800098ea7440 x28: 0000000000000030 x27: ffff00001262a000
Jul 17 20:30:16 tuvok kernel: x26: ffff80007a421910 x25: 0000000000000000 x24: ffff0000262c7300
Jul 17 20:30:16 tuvok kernel: x23: ffff800098ea7888 x22: ffff0001db450338 x21: ffff0000a78b99d8
Jul 17 20:30:16 tuvok kernel: x20: ffff00022c869208 x19: ffff00022c869208 x18: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 17 20:30:16 tuvok kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 17 20:30:16 tuvok kernel: Call trace:
Jul 17 20:30:16 tuvok kernel:  drm_sched_can_queue+0x110/0x168
Jul 17 20:30:16 tuvok kernel:  drm_sched_wakeup+0x18/0x7c
Jul 17 20:30:16 tuvok kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 17 20:30:16 tuvok kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 17 20:30:16 tuvok kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 17 20:30:16 tuvok kernel:  drm_ioctl+0x23c/0x4e4
Jul 17 20:30:16 tuvok kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 17 20:30:16 tuvok kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 17 20:30:16 tuvok kernel:  do_el0_svc+0x40/0xf0
Jul 17 20:30:16 tuvok kernel:  el0_svc+0x34/0x11c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync+0x190/0x194
Jul 17 20:30:16 tuvok kernel: ---[ end trace 0000000000000000 ]---
Jul 17 20:30:16 tuvok kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 17 20:30:16 tuvok kernel: Mem abort info:
Jul 17 20:30:16 tuvok kernel:   ESR = 0x0000000096000007
Jul 17 20:30:16 tuvok kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 17 20:30:16 tuvok kernel:   SET = 0, FnV = 0
Jul 17 20:30:16 tuvok kernel:   EA = 0, S1PTW = 0
Jul 17 20:30:16 tuvok kernel:   FSC = 0x07: level 3 translation fault
Jul 17 20:30:16 tuvok kernel: Data abort info:
Jul 17 20:30:16 tuvok kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 17 20:30:16 tuvok kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 17 20:30:16 tuvok kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 17 20:30:16 tuvok kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=0000000bc4b30000
Jul 17 20:30:16 tuvok kernel: [ffff000000000700] pgd=1800000bce3fc003, p4d=1800000bce3fc003, pud=1800000bce3f8003, pmd=1800000bce3f4003, pte=0000000000000000
Jul 17 20:30:16 tuvok kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 17 20:30:16 tuvok kernel: Modules linked in: usbhid xhci_plat_hcd xhci_hcd xt_mark snd_seq_dummy rfcomm snd_hrtimer snd_seq snd_seq_device qrtr nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype overlay bnep brcmfmac_wcc joydev hid_magicmouse appledrm macsmc_hwmon macsmc_reboot macsmc_power macsmc_hid ofpart tps6598x snd_soc_cs42l84 spi_nor apple_isp videobuf2_dma_sg snd_soc_tas2764 hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_admac clk_apple_nco apple_dcp videobuf2_common asahi pwm_apple mux_core mc drm_dma_helper apple_soc_cpufreq snd_soc_apple_mca hci_bcm4377 brcmfmac bluetooth brcmutil snd_soc_macaudio leds_pwm cfg80211 ecdh_generic ecc rfkill xt_conntrack ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog nft_compat nf_tables uinput evdi(O) loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap macvlan bridge stp llc fuse nfnetlink ip_tables nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi dockchannel_hid regmap_spmi phy_apple_atc pcie_apple pci_host_common
Jul 17 20:30:16 tuvok kernel:  typec macsmc_rtkit dwc3 nvme_apple macsmc mfd_core apple_rtkit_helper nvmem_apple_efuses spmi_apple_controller udc_core apple_dockchannel apple_sart pinctrl_apple_gpio i2c_pasemi_platform spi_apple i2c_pasemi_core apple_dart
Jul 17 20:30:16 tuvok kernel: CPU: 3 PID: 19136 Comm: Renderer Tainted: G S      W  O       6.9.9-asahi #1-NixOS
Jul 17 20:30:16 tuvok kernel: Hardware name: Apple MacBook Air (13-inch, M2, 2022) (DT)
Jul 17 20:30:16 tuvok kernel: pstate: a1400009 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 17 20:30:16 tuvok kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 17 20:30:16 tuvok kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 17 20:30:16 tuvok kernel: sp : ffff800098ea5cf0
Jul 17 20:30:16 tuvok kernel: x29: ffff800098ea5d00 x28: 00000000005a0112 x27: ffff000081b13f00
Jul 17 20:30:16 tuvok kernel: x26: 00000000faa60000 x25: 00000000ffffffa0 x24: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 17 20:30:16 tuvok kernel: x20: ffff000001f48b00 x19: 0000000000000328 x18: 00000000000000ff
Jul 17 20:30:16 tuvok kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x14: 0000000000000000 x13: 0000000100000000 x12: 0000000000000000
Jul 17 20:30:16 tuvok kernel: x11: 0000000000000001 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 17 20:30:16 tuvok kernel: x8 : d0b580007a3219c4 x7 : 0000000000000cc0 x6 : 0000000000000328
Jul 17 20:30:16 tuvok kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000002ce914c3
Jul 17 20:30:16 tuvok kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f48b00
Jul 17 20:30:16 tuvok kernel: Call trace:
Jul 17 20:30:16 tuvok kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 17 20:30:16 tuvok kernel:  krealloc+0x9c/0x144
Jul 17 20:30:16 tuvok kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 17 20:30:16 tuvok kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG14V12_4INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs0_NtNtB8_5queue6renderNtB3Q_18QueueInnerG14V12_413submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG14V12_4B4X_EB4X_B4X_NCB3I_s1_0NCB3I_s2_0EB8_+0x800/0x1ea8 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvMs0_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG14V12_413submit_render+0x162c/0x1cd0 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 17 20:30:16 tuvok kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 17 20:30:16 tuvok kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 17 20:30:16 tuvok kernel:  drm_ioctl+0x23c/0x4e4
Jul 17 20:30:16 tuvok kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 17 20:30:16 tuvok kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 17 20:30:16 tuvok kernel:  do_el0_svc+0x40/0xf0
Jul 17 20:30:16 tuvok kernel:  el0_svc+0x34/0x11c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 17 20:30:16 tuvok kernel:  el0t_64_sync+0x190/0x194
Jul 17 20:30:16 tuvok kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9)
Jul 17 20:30:16 tuvok kernel: ---[ end trace 0000000000000000 ]---
Jul 17 20:30:27 tuvok kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 17 20:30:27 tuvok kernel: Mem abort info:
Jul 17 20:30:27 tuvok kernel:   ESR = 0x0000000096000007
Jul 17 20:30:27 tuvok kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 17 20:30:27 tuvok kernel:   SET = 0, FnV = 0
Jul 17 20:30:27 tuvok kernel:   EA = 0, S1PTW = 0
Jul 17 20:30:27 tuvok kernel:   FSC = 0x07: level 3 translation fault
Jul 17 20:30:27 tuvok kernel: Data abort info:
Jul 17 20:30:27 tuvok kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 17 20:30:27 tuvok kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 17 20:30:27 tuvok kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 17 20:30:27 tuvok kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=0000000bc4b30000
Jul 17 20:30:27 tuvok kernel: [ffff000000000700] pgd=1800000bce3fc003, p4d=1800000bce3fc003, pud=1800000bce3f8003, pmd=1800000bce3f4003, pte=0000000000000000
Jul 17 20:30:27 tuvok kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP
asahilina commented 1 month ago

Sorry, I really need a consistent way to reproduce this to track it down. So far I've been unable to repro the drm_sched_can_queue crash myself. The only thing I got is some crashes in futex code when trying @Ella-0's kernel config (which she sent me via Discord), which I bisected to that VA bits thing. At first I thought that might be unrelated or might point to a deeper memory management issue, but it turned out to be completely unrelated.

The crash itself makes no sense. It's memory corruption, where the drm_sched job gets clobbered with something else, and then somehow consistently after that the changes made by drm_sched directly cause a crash in the allocator, in what has to be a subsequent ioctl call because the drm_sched stuff is the last thing the ioctl does. That it's somehow this consistent is very, very strange. I would have expected heap corruption to manifest in more varied ways after the fact. The actual lifetimes of the allocations involved are extremely simple, so I'm 99% sure this isn't a silly lifetime problem in my code (at least not as it relates to the specific structures referenced in the crashes). The code in both the drm_sched_can_queue codepath and in at least one of the subsequent crash codepaths just allocates an object, uses it, and frees it. This is the kind of thing Rust makes almost impossible to get wrong. Unless there's a compiler bug somewhere, I don't see how it's possible for the root cause to be a simple lifetime issue, so I think this has to be a much deeper problem with memory management going wrong elsewhere, and we're just seeing the consequences somehow fairly consistently affect these structures in the GPU driver.

I tried running the same kernel under kASAN and came up with nothing. I also tried Ella's config with kASAN, still nothing, and doing that avoided the crashes that correlated with 52-bit VA support being enabled too.

Best guess is there is a spurious page being freed or something like that, so memory is reused while it is still in use. I actually already ran into one of these before (fixed in 2bb1499537) which would perfectly explain this kind of behavior, except for the fact that that particular one only happened on DART pagetable freeing which only really happens when unbinding drivers (which is why we didn't notice for so long). If there is a similar bug lurking somewhere else, but it only happens sometimes, then that might explain this and the other badness.
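
As a rough illustration of that failure mode, here is a toy model of a spurious free. This is only a sketch and not the kernel's page allocator; `ToyPageAllocator` and `demo` are made-up names. One stray free is enough for two unrelated owners to end up sharing the same page, so whoever writes last clobbers the other's data:

```python
class ToyPageAllocator:
    """Toy LIFO free list of page numbers (illustration only)."""

    def __init__(self, num_pages):
        self.free_list = list(range(num_pages))

    def alloc(self):
        # Hand out the most recently freed page first.
        return self.free_list.pop()

    def free(self, page):
        # No check that the caller really owns the page or is done with it.
        self.free_list.append(page)


def demo():
    pages = ToyPageAllocator(4)
    memory = {}

    # Owner A allocates a page and stores its structure there.
    job_page = pages.alloc()
    memory[job_page] = "drm_sched job"

    # Spurious free: the page goes back on the free list while A still uses it.
    pages.free(job_page)

    # Owner B allocates and gets the very same page, then overwrites it.
    other_page = pages.alloc()
    memory[other_page] = "unrelated data"

    # A now reads B's data through its stale page number.
    return job_page == other_page, memory[job_page]
```

In this model the victim is simply whoever held the page, which is why the owner of the corrupted structure need not be the code that contains the bug.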

I'm 90% sure that there is an upstream regression in memory management somewhere here, but the only lead I have is that 52-bit VA thing, and I don't know if that is the same issue behind the drm_sched_can_queue crashes at this point or something else...

Edit: The 52-bit VA thing is unrelated unfortunately.

asahilina commented 1 month ago

For reference, the 52-bit VA issue causes crashes like this:

[  301.808795] Unable to handle kernel paging request at virtual address ffffa4f440001638
[  301.809656] Mem abort info:
[  301.809943]   ESR = 0x0000000096000005
[  301.810342]   EC = 0x25: DABT (current EL), IL = 32 bits
[  301.810891]   SET = 0, FnV = 0
[  301.811221]   EA = 0, S1PTW = 0
[  301.811560]   FSC = 0x05: level 1 translation fault
[  301.812035] Data abort info:
[  301.812385]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
[  301.813066]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  301.813711]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  301.814341] swapper pgtable: 16k pages, 47-bit VAs, pgdp=000001001bfdc000
[  301.815221] [ffffa4f440001638] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  301.816309] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[  301.817081] Modules linked in:
[  301.817381] CPU: 7 PID: 6283 Comm: glmark2-es2-way Not tainted 6.9.9-asahi-01036-g363eb0817ec8 #1
[  301.818388] Hardware name: Apple Mac Mini (M2 Pro, 2023) (DT)
[  301.819078] pstate: 414000c5 (nZcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[  301.819939] pc : internal_get_user_pages_fast+0x248/0xcd0
[  301.820638] lr : internal_get_user_pages_fast+0x1e8/0xcd0
[  301.821275] sp : ffffc000a51d7a20
[  301.821626] x29: ffffc000a51d7ad0 x28: 000055558fea4000 x27: 000055558fea0000
[  301.822450] x26: 000055558fea0000 x25: 0400000000000001 x24: 0000000000000000
[  301.823402] x23: ffffa4f440001638 x22: 0000000000000003 x21: ffffd4b6410b5ff8
[  301.824334] x20: 000055558fea4000 x19: 0000000000000000 x18: 0000000000000000
[  301.825220] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  301.826184] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  301.827100] x11: 00000000ffffffff x10: ffffc000a51d6aa8 x9 : 000055558fea4000
[  301.827862] x8 : 0000000000000000 x7 : 0000000000000001 x6 : 000055558fea0000
[  301.828820] x5 : 000055558fea4000 x4 : 0000000002000000 x3 : 000055558fea3fff
[  301.829662] x2 : 000055558fea3fff x1 : fff05b0bc0000000 x0 : ffffa4f440000000
[  301.830484] Call trace:
[  301.830742]  internal_get_user_pages_fast+0x248/0xcd0
[  301.831343]  get_user_pages_fast+0x48/0x60
[  301.831897]  get_futex_key+0xa4/0x3d0
[  301.832281]  futex_wait_setup+0x6c/0x164
[  301.832777]  __futex_wait+0xbc/0x15c
[  301.833200]  futex_wait+0x88/0x110
[  301.833596]  do_futex+0xf8/0x1a0
[  301.833926]  __arm64_sys_futex+0xec/0x188
[  301.834417]  invoke_syscall.constprop.0+0x50/0xe4
[  301.835029]  do_el0_svc+0x40/0xdc
[  301.835378]  el0_svc+0x3c/0x140
[  301.835796]  el0t_64_sync_handler+0x120/0x12c
[  301.836365]  el0t_64_sync+0x190/0x194
[  301.836845] Code: f90003e2 8b170c17 a905abfc d2a04004 (f94002fa) 
[  301.837591] ---[ end trace 0000000000000000 ]---
[  301.838055] note: glmark2-es2-way[6283] exited with irqs disabled

A very easy repro for this (for me at least) is while true; do timeout -s TERM -k 0 0.5 glmark2-es2-wayland & sleep 0.02 ; done, but I've seen it happen just booting up to a Plasma desktop, so it's not particularly difficult to trigger. However, I also saw this different crash once:

Unable to handle kernel paging request at virtual address 0000033b9a2a6c48
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
systemd[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
systemd[1]: systemd-journald.service: (This warning is only shown for the first unit using IP firewalling.)
user pgtable: 16k pages, 47-bit VAs, pgdp=0000010020330000
[0000033b9a2a6c48] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 4 PID: 1706 Comm: lvm Not tainted 6.9.9-asahi-01036-g363eb0817ec8 #7
systemd[1]: Starting systemd-journald.service - Journal Service...
Hardware name: Apple Mac Mini (M2 Pro, 2023) (DT)
pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : try_get_folio+0x14/0x100
lr : try_grab_folio+0x64/0x228
sp : ffffc0008f6176b0
x29: ffffc0008f6176b0 x28: 1f5b7cd4ad225269 x27: 0000000000000000
x26: ffff80001efa2440 x25: 0000000000000008 x24: 0000000000000008
x23: ffffc0008160f000 x22: ffffc000811f8000 x21: fffff03fc0000000
x20: 0000000000080001 x19: 0000033b9a2a6c40 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 000055557d4980d8
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 000055557d4c0000 x9 : 0000000000000000
x8 : 000055557d4c0000 x7 : 0000000000000555 x6 : ffff8000214ee380
x5 : ffffc0008f616aa8 x4 : 0000033b9a2a6c40 x3 : 0000037cda2a6c40
x2 : 0000000004006814 x1 : 0000000000000008 x0 : 0000033b9a2a6c40
Call trace:
 try_get_folio+0x14/0x100
 try_grab_folio+0x64/0x228
 internal_get_user_pages_fast+0x914/0xc80
 pin_user_pages_fast+0x48/0x60
 iov_iter_extract_pages+0xd4/0x55c
 bio_iov_iter_get_pages+0xac/0x390
 blkdev_direct_IO.part.0+0x104/0x5d0
 blkdev_read_iter+0xc0/0x180
 aio_read.constprop.0+0xa8/0x140
 io_submit_one.constprop.0+0x1f8/0x700
 __arm64_sys_io_submit+0xa4/0x17c
 invoke_syscall.constprop.0+0x74/0xc4
 do_el0_svc+0x40/0xdc
 el0_svc+0x3c/0x140
 el0t_64_sync_handler+0x120/0x12c
 el0t_64_sync+0x190/0x194
Code: a9bd7bfd 910003fd f9000bf3 aa0003f3 (f9400660) 
---[ end trace 0000000000000000 ]---

And I think this one happened on boot before any GPU-related apps ran, so I'm leaning towards there being a major mm issue unrelated to the GPU driver here. But I still don't know why compiling with 52-bit VA support causes it or makes it worse, and I also still don't know whether this has anything to do with the original bug report or not. It's just the only lead I have so far, but I don't know where to go from here...
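
For comparing the oopses in this thread, the ESR values can be decoded by hand. Below is a minimal sketch, assuming the standard ESR_EL1 layout for data aborts (EC in bits 31:26, IL in bit 25, the fault status code in the low 6 bits of the ISS); `decode_esr` is a hypothetical helper, not a kernel API:

```python
def decode_esr(esr):
    """Split an arm64 ESR value into the fields printed under 'Mem abort info'."""
    ec = (esr >> 26) & 0x3F      # exception class, e.g. 0x25 = DABT (current EL)
    il = (esr >> 25) & 0x1       # instruction length bit: 1 means 32-bit
    iss = esr & 0x1FFFFFF        # instruction specific syndrome
    fsc = iss & 0x3F             # fault status code (meaningful for aborts only)
    wnr = (iss >> 6) & 0x1       # write-not-read

    fields = {"ec": ec, "il_bits": 32 if il else 16, "iss": iss, "fsc": fsc, "wnr": wnr}

    # FSC 0x04..0x07 encode translation faults at level 0..3.
    if 0x04 <= fsc <= 0x07:
        fields["fault"] = "level %d translation fault" % (fsc & 0x3)
    return fields
```

Decoding 0x96000007 (the drm_sched-related oopses) gives a level 3 translation fault, while 0x96000005 and 0x96000004 (the 52-bit VA oopses) give level 1 and level 0, matching the FSC lines in the logs.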

asahilina commented 1 month ago

I managed to get the kASAN kernel to crash on boot with 52-bit support enabled, even with the GPU and DCP drivers completely disabled. So whatever this is (and it's still unclear if it's related to the original bug report), it has nothing to do with the GPU driver...

[    5.545590] Unable to handle kernel paging request at virtual address fffe2ee0021212e3
[    5.547380] KASAN: maybe wild-memory-access in range [0xfff3770010909718-0xfff377001090971f]
[    5.547382] Mem abort info:
[    5.547382]   ESR = 0x0000000096000004
[    5.547383]   EC = 0x25: DABT (current EL), IL = 32 bits
[    5.547385]   SET = 0, FnV = 0
[    5.547385]   EA = 0, S1PTW = 0
[    5.547386]   FSC = 0x04: level 0 translation fault
[    5.547387] Data abort info:
[    5.547387]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    5.547388]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    5.547389]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    5.547390] swapper pgtable: 16k pages, 47-bit VAs, pgdp=000001001bae8000
[    5.547391] [fffe2ee0021212e3] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[    5.547411] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[    5.547851] systemd[1]: Starting systemd-udev-trigger.service - Coldplug All udev Devices...
[    5.550122] systemd-journald[1528]: Collecting audit messages is enabled.
[    5.550502] Modules linked in:
[    5.550505] CPU: 9 PID: 1519 Comm: lvm Not tainted 6.9.9-asahi-01036-g363eb0817ec8 #57
[    5.550507] Hardware name: Apple Mac Mini (M2 Pro, 2023) (DT)
[    5.557391] pstate: 414000c5 (nZcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[    5.557393] pc : internal_get_user_pages_fast+0x420/0x1728
[    5.558640] lr : internal_get_user_pages_fast+0x3b0/0x1728
[    5.558643] sp : ffffc0008b5cf130
[    5.558644] x29: ffffc0008b5cf130 x28: 0000000000080001 x27: fff3770010909718
[    5.559541] x26: 00005555c60d0000 x25: 1ffff800116b9d55 x24: dfffc00000000000
[    5.559543] x23: 00005555c60b0000 x22: ffffc0008b5cf2f0 x21: 00005555c60b0000
[    5.561619] x20: fff3770010908000 x19: ffff80002ff23e00 x18: ffff80003ad9ec60
[    5.561621] x17: 0000000000000006 x16: 0000000000000000 x15: 0000000000000003
[    5.561624] x14: 1ffff0006cb42551 x13: 1ffff0000581848d x12: ffff80002c0c246c
[    5.563061] x11: 0000000000000007 x10: 1ffff80010819264 x9 : 00005555c60d0000
[    5.563063] x8 : ffffc0008b5ceaa8 x7 : 00000000f1f1f1f1 x6 : 00005555c60d0000
[    5.564479] x5 : ffffc0008b5cf110 x4 : 00000000f3f3f300 x3 : ffffc0008004ee00
[    5.564482] x2 : 00005555c60cffff x1 : fff0810000000000 x0 : 1ffe6ee0021212e3
[    5.564485] Call trace:
[    5.564486]  internal_get_user_pages_fast+0x420/0x1728
[    5.564489]  pin_user_pages_fast+0x9c/0xc4
[    5.564491]  iov_iter_extract_pages+0x234/0x1044
[    5.568171]  bio_iov_iter_get_pages+0x248/0xa90
[    5.568174]  blkdev_direct_IO.part.0+0x3a0/0x143c
[    5.569046]  blkdev_read_iter+0x1cc/0x388
[    5.569048]  aio_read.constprop.0+0x1e0/0x324
[    5.569832]  io_submit_one.constprop.0+0x378/0x1470
[    5.569833]  __arm64_sys_io_submit+0x198/0x2d0
[    5.569835]  invoke_syscall.constprop.0+0xd8/0x1e0
[    5.571131]  do_el0_svc+0xc4/0x1e0
[    5.571133]  el0_svc+0x48/0xc0
[    5.571136]  el0t_64_sync_handler+0x120/0x130
[    5.572115]  el0t_64_sync+0x190/0x194
[    5.572117] Code: b24c2e94 8b000e9b d343ff60 f9003be0 (38f86800) 
[    5.572118] ---[ end trace 0000000000000000 ]---

Unfortunately kASAN doesn't help in this case. But this seems reproducible enough that maybe I can debug it...

oliverbestmann commented 1 month ago

Sorry, seems like I am a bit late now, probably nothing new, but still: I am also using NixOS on a MacBook M1 Pro, and the same kernel config applies as for @maximbaz. I checked my journalctl logs and it looks like I actually did not run 6.9.6 but 6.9.7, so @mkurz seems to be correct. All kernels are compiled with GCC 13.3.0, no clang. I do not know where may I as might came from. Chromium might have been open, so it could come from anywhere. I've checked the logs of multiple crashes and it looks like it is always the same stack trace, though the register values differ. For me the crashes seem to happen very quickly (~3 min after boot) when using the Zoom web app.
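
One way to check that several crashes share a stack trace is to strip the log prefixes and offsets and compare only the symbol names. The following is a minimal sketch, assuming logs in the journalctl or dmesg formats pasted in this thread; `extract_call_traces` and `summarize` are hypothetical helper names:

```python
import re
from collections import Counter

# Journal prefix ("Jul 15 09:32:20 m1pro kernel: ") or dmesg timestamp ("[  301.830742] ").
PREFIX_RE = re.compile(r"^(?:\w{3} +\d+ [\d:]+ \S+ kernel: |\[\s*\d+\.\d+\] )")
# A frame such as "drm_ioctl+0x23c/0x4e4" or "symbol+0x648/0x840 [asahi]".
FRAME_RE = re.compile(r"^(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[\w+\])?$")


def extract_call_traces(log_text):
    """Return one tuple of symbol names per 'Call trace:' section."""
    traces = []
    current = None
    for raw in log_text.splitlines():
        line = PREFIX_RE.sub("", raw).strip()
        if line == "Call trace:":
            current = []
            traces.append(current)
        elif current is not None:
            match = FRAME_RE.match(line)
            if match:
                current.append(match.group(1))
            elif line.startswith(("Code:", "---[ end trace")):
                current = None
            # Anything else (interleaved userspace lines, truncated frames) is skipped.
    return [tuple(trace) for trace in traces]


def summarize(log_text):
    """Count how often each distinct call trace occurs."""
    return Counter(extract_call_traces(log_text))
```

Frames that the journal truncated (lines ending in `>`) do not match the frame pattern and are dropped, so traces from truncated and untruncated logs may not compare equal.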

asahilina commented 1 month ago

Unfortunately, I just confirmed that the 52-bit problem is completely unrelated. Upstream Linux is just broken with the combination of LPA2 (52-bit support), 16K pages, and non-LPA2 hardware. Please don't build with 52-bit support.
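
To check whether a given kernel build has this combination enabled, one option is to scan its config. This is a minimal sketch that assumes the relevant options are `CONFIG_ARM64_VA_BITS_52` and `CONFIG_ARM64_16K_PAGES`; the helper names are hypothetical and the config text has to be supplied by the caller (for example the `.config` used for the build):

```python
def config_flags(config_text):
    """Return the set of CONFIG_* options that are set to 'y'."""
    flags = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Lines like "# CONFIG_FOO is not set" are comments and are ignored.
        if line.startswith("CONFIG_") and line.endswith("=y"):
            flags.add(line[: -len("=y")])
    return flags


def has_52bit_va_with_16k_pages(config_text):
    """True if the build enables 52-bit VAs together with 16K pages."""
    wanted = {"CONFIG_ARM64_VA_BITS_52", "CONFIG_ARM64_16K_PAGES"}
    return wanted <= config_flags(config_text)
```

This only inspects the build configuration; it says nothing about whether the hardware supports LPA2.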

So now we're back to square one... I have no idea how to repro the GPU issue ;;

oliverbestmann commented 1 month ago

Does this imply that it is working fine for you on a MacBook Pro M1 with Wayland and GNOME? Does running Chromium also work? What information would be helpful to you?

asahilina commented 1 month ago

That's the first time I hear gnome is involved, and also nobody mentioned chromium until your previous post ^^;; (the OP does in fact mention the process name is chromium in the oops log, but I missed that bit...)

The more info about the setup I get the better, and if you can try more workloads (for example, webgl tests and other browsery things) and see if you can find something that reproduces it fast that would be very useful...

Right now I'm testing chromium on an M2 Pro Mac Mini and a bunch of maps and webGL stuff doesn't seem to cause any issues, but this is on Fedora. If there's something about the userspace build that matters here, maybe I need to install another distro...

oliverbestmann commented 1 month ago

You are right, I only mentioned Wayland and GNOME in the issue https://github.com/tpwrules/nixos-apple-silicon/issues/218. I am sorry for that.

I just checked my previous boot logs to find everything I can. Here is a different stack trace. This one does not contain the warning about a kernel paging request:

Jul 15 09:32:20 m1pro kernel: ------------[ cut here ]------------
Jul 15 09:32:20 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 09:32:20 m1pro kernel: WARNING: CPU: 1 PID: 3268 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4>
Jul 15 09:32:20 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_app>
Jul 15 09:32:20 m1pro kernel: CPU: 1 PID: 3268 Comm: chromium Tainted: G S                 6.9.9-asahi #1-NixOS
Jul 15 09:32:20 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:32:20 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:32:20 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel: sp : ffff800095b37440
Jul 15 09:32:20 m1pro kernel: x29: ffff800095b37440 x28: 0000000000000030 x27: ffff00001332a000
Jul 15 09:32:20 m1pro kernel: x26: ffff80007a849948 x25: 0000000000000000 x24: ffff0000b5124b00
Jul 15 09:32:20 m1pro kernel: x23: ffff800095b37888 x22: ffff0000647efe38 x21: ffff0001259a9dd8
Jul 15 09:32:20 m1pro kernel: x20: ffff00005b10e808 x19: ffff00005b10e808 x18: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 09:32:20 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: Call trace:
Jul 15 09:32:20 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 09:32:20 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 09:32:20 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 09:32:20 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 09:32:20 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:32:20 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:32:20 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:32:20 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:32:20 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:32:20 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:32:20 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:32:20 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:32:20 m1pro kernel: ------------[ cut here ]------------
Jul 15 09:32:20 m1pro kernel: WARNING: CPU: 1 PID: 3268 at mm/slub.c:4358 free_large_kmalloc+0xdc/0x110
Jul 15 09:32:20 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4>
Jul 15 09:32:20 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_app>
Jul 15 09:32:20 m1pro kernel: CPU: 1 PID: 3268 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 09:32:20 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:32:20 m1pro kernel: pstate: 41401009 (nZcv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:32:20 m1pro kernel: pc : free_large_kmalloc+0xdc/0x110
Jul 15 09:32:20 m1pro kernel: lr : kfree+0x180/0x1d0
Jul 15 09:32:20 m1pro kernel: sp : ffff800095b35c00
Jul 15 09:32:20 m1pro kernel: x29: ffff800095b35c00 x28: ffff00005a93e280 x27: ffff000120c746c0
Jul 15 09:32:20 m1pro kernel: x26: ffffffa0002aca00 x25: 0000000002995000 x24: ffffffa600910000
Jul 15 09:32:20 m1pro kernel: x23: ffff00005b10ea08 x22: ffffffa00000002c x21: 0000000000000001
Jul 15 09:32:20 m1pro kernel: x20: ffff000100000500 x19: ffffff7fc1000000 x18: 000000000000002b
Jul 15 09:32:20 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x14: 0000000000000000 x13: 9393939300000000 x12: 0000000000000000
Jul 15 09:32:20 m1pro kernel: x11: 0000000000000000 x10: 0000000000000268 x9 : 0000000000000000
Jul 15 09:32:20 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000268 x6 : ffff00006533b200
Jul 15 09:32:20 m1pro kernel: x5 : ffff800095b361e0 x4 : ffff00005add2400 x3 : ffff8000992e6700
Jul 15 09:32:20 m1pro kernel: x2 : 0000000000000001 x1 : ffff000100000500 x0 : 0000000000180028
Jul 15 09:32:20 m1pro kernel: Call trace:
Jul 15 09:32:20 m1pro kernel:  free_large_kmalloc+0xdc/0x110
Jul 15 09:32:20 m1pro kernel:  kfree+0x180/0x1d0
Jul 15 09:32:20 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10__>
Jul 15 09:32:20 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 15 09:32:20 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 09:32:20 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:32:20 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:32:20 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:32:20 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:32:20 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:32:20 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:32:20 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:32:20 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:32:20 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:32:20 m1pro kernel: object pointer: 0x00000000837d9730

but then a few minutes later:

Jul 15 09:38:48 m1pro kernel: ------------[ cut here ]------------
Jul 15 09:38:48 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 09:38:48 m1pro kernel: WARNING: CPU: 1 PID: 3268 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc uvcvideo usbhid videobuf2_vmalloc uvc appledrm rfkill ofpart snd_soc_cs42l84 spi_nor snd_soc_tas2764 apple_sio asahi snd_soc_apple_mca virt_dma apple_admac pwm_apple macsmc_reboot macsmc_power macsmc_hwmon macsmc_hid apple_isp videobuf2_dma_sg hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_dcp videobuf2_common mux_apple_display_crossbar drm_dma_helper clk_apple_nco apple_soc_cpufreq mux_core cdc_mbim cdc_wdm snd_usb_audio snd_hwdep snd_usbmidi_lib>
Jul 15 09:38:48 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 15 09:38:48 m1pro kernel: CPU: 1 PID: 3268 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 09:38:48 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:38:48 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:38:48 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel: sp : ffff800095b37440
Jul 15 09:38:48 m1pro kernel: x29: ffff800095b37440 x28: 0000000000000030 x27: ffff00001332a000
Jul 15 09:38:48 m1pro kernel: x26: ffff80007a849948 x25: 0000000000000000 x24: ffff0000b5124b00
Jul 15 09:38:48 m1pro kernel: x23: ffff800095b37888 x22: ffff0000647efe38 x21: ffff00005bf9add8
Jul 15 09:38:48 m1pro kernel: x20: ffff00005b10e808 x19: ffff00005b10e808 x18: 0000000000000000
Jul 15 09:38:48 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 09:38:48 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 09:38:48 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 09:38:48 m1pro kernel: Call trace:
Jul 15 09:38:48 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 09:38:48 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 09:38:48 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 09:38:48 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 09:38:48 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:38:48 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:38:48 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:38:48 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:38:48 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:38:48 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:38:48 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:38:48 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:38:48 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:38:48 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:38:48 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 15 09:38:48 m1pro kernel: Mem abort info:
Jul 15 09:38:48 m1pro kernel:   ESR = 0x0000000096000007
Jul 15 09:38:48 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 09:38:48 m1pro kernel:   SET = 0, FnV = 0
Jul 15 09:38:48 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 09:38:48 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 15 09:38:48 m1pro kernel: Data abort info:
Jul 15 09:38:48 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 15 09:38:48 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 09:38:48 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 09:38:48 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5b30000
Jul 15 09:38:48 m1pro kernel: [ffff000000000700] pgd=18000107cf028003, p4d=18000107cf028003, pud=18000107cf024003, pmd=18000107cf020003, pte=0000000000000000
Jul 15 09:38:48 m1pro kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 15 09:38:48 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc uvcvideo usbhid videobuf2_vmalloc uvc appledrm rfkill ofpart snd_soc_cs42l84 spi_nor snd_soc_tas2764 apple_sio asahi snd_soc_apple_mca virt_dma apple_admac pwm_apple macsmc_reboot macsmc_power macsmc_hwmon macsmc_hid apple_isp videobuf2_dma_sg hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_dcp videobuf2_common mux_apple_display_crossbar drm_dma_helper clk_apple_nco apple_soc_cpufreq mux_core cdc_mbim cdc_wdm snd_usb_audio snd_hwdep snd_usbmidi_lib>
Jul 15 09:38:49 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 15 09:38:49 m1pro gmrun[6128]: curl: (7) Failed to connect to 192.168.86.21 port 80 after 1 ms: Couldn't connect to server
Jul 15 09:38:49 m1pro kernel: CPU: 5 PID: 3268 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 09:38:49 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:38:49 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:38:49 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 09:38:49 m1pro kernel: sp : ffff800095b35b30
Jul 15 09:38:49 m1pro kernel: x29: ffff800095b35b40 x28: ffff000053d9cc80 x27: ffff000120519980
Jul 15 09:38:49 m1pro kernel: x26: ffffffa0002b2500 x25: 00000000ffffffa6 x24: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 15 09:38:49 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000358 x18: 000000000000002b
Jul 15 09:38:49 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x11: ffffffa0002ac9a8 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 15 09:38:49 m1pro kernel: x8 : 00ce80007a7499c4 x7 : 0000000000000cc0 x6 : 0000000000000358
Jul 15 09:38:49 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000002d68881
Jul 15 09:38:49 m1pro kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f2cb00
Jul 15 09:38:49 m1pro kernel: Call trace:
Jul 15 09:38:49 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 09:38:49 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 15 09:38:49 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtNtB8_5queue6renderNtB3Q_18QueueInnerG13V13_513submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG13V13_5B4X_EB4X_B4X_NCB3I_s1_0NCB3I_s2_0EB8_+0x79c/0x1f80 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:38:49 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 09:38:49 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 09:38:49 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 09:38:49 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 09:38:49 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 09:38:49 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 09:38:49 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 09:38:49 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 09:38:49 m1pro kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 15 09:38:49 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 09:38:49 m1pro gmrun[3788]: conky: reading exec value failed (perhaps it's not the correct format?)
Jul 15 09:38:49 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 15 09:38:49 m1pro kernel: Mem abort info:
Jul 15 09:38:49 m1pro kernel:   ESR = 0x0000000096000007
Jul 15 09:38:49 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 09:38:49 m1pro kernel:   SET = 0, FnV = 0
Jul 15 09:38:49 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 09:38:49 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 15 09:38:49 m1pro kernel: Data abort info:
Jul 15 09:38:49 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 15 09:38:49 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 09:38:49 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 09:38:49 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5b30000
Jul 15 09:38:49 m1pro kernel: [ffff000000000700] pgd=18000107cf028003, p4d=18000107cf028003, pud=18000107cf024003, pmd=18000107cf020003, pte=0000000000000000
Jul 15 09:38:49 m1pro kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP
Jul 15 09:38:49 m1pro kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq bnep brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc uvcvideo usbhid videobuf2_vmalloc uvc appledrm rfkill ofpart snd_soc_cs42l84 spi_nor snd_soc_tas2764 apple_sio asahi snd_soc_apple_mca virt_dma apple_admac pwm_apple macsmc_reboot macsmc_power macsmc_hwmon macsmc_hid apple_isp videobuf2_dma_sg hid_apple videobuf2_memops videobuf2_v4l2 videodev apple_dcp videobuf2_common mux_apple_display_crossbar drm_dma_helper clk_apple_nco apple_soc_cpufreq mux_core cdc_mbim cdc_wdm snd_usb_audio snd_hwdep snd_usbmidi_lib>
Jul 15 09:38:49 m1pro kernel:  tps6598x spi_hid_apple nvmem_spmi_mfd rtc_macsmc gpio_macsmc simple_mfd_spmi regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 15 09:38:49 m1pro kernel: CPU: 1 PID: 2880 Comm: Xwayland Tainted: G S    D W          6.9.9-asahi #1-NixOS
Jul 15 09:38:49 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 09:38:49 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 09:38:49 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 09:38:49 m1pro kernel: sp : ffff800093a87400
Jul 15 09:38:49 m1pro kernel: x29: ffff800093a87410 x28: ffff000081e06800 x27: ffff00001332a000
Jul 15 09:38:49 m1pro kernel: x26: 0000000000000001 x25: 0000000000048bc6 x24: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000dc0
Jul 15 09:38:49 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000278 x18: 0000000000000000
Jul 15 09:38:49 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe8c8d008
Jul 15 09:38:49 m1pro kernel: x14: 0000000000000000 x13: 0000000400000004 x12: ffff000081e06e00
Jul 15 09:38:49 m1pro kernel: x11: 0000000000000008 x10: fffffffffffffff8 x9 : 0000000000000000
Jul 15 09:38:49 m1pro kernel: x8 : 468980007a74fc60 x7 : 0000000000000dc0 x6 : 0000000000000278
Jul 15 09:38:49 m1pro kernel: x5 : ffff800093a877c8 x4 : 0000000000000000 x3 : 0000000002d68881
Jul 15 09:38:49 m1pro kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f2cb00
Jul 15 09:38:49 m1pro kernel: Call trace:
Jul 15 09:38:49 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 09:38:49 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 09:38:49 m1pro kernel:  _RNvMsb_NtNtCsc1LFWrxnNA7_6kernel3drm5schedINtB5_6EntityNtNtCsirMamryJlsQ_5asahi5queue16QueueJobG13V13_5E7new_jobBV_+0x2c/0xc0 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x8c8/0x1578 [asahi]
Jul 15 09:38:49 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 09:38:49 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c

Then I have one from 6.9.7:

Jul 08 09:38:09 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 08 09:38:09 m1pro kernel: WARNING: CPU: 1 PID: 3046 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel: Modules linked in: vhost_net vhost vhost_iotlb uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr snd_seq_dummy snd_hrtimer snd_seq rfcomm bnep uvcvideo videobuf2_vmalloc uvc brcmfmac_wcc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device usbhid joydev hci_bcm4377 hid_magicmouse bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc appledrm snd_soc_macaudio ofpart snd_soc_cs42l84 spi_>
Jul 08 09:38:09 m1pro kernel:  spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc mfd_core pinctrl_apple_gpio spmi_apple_controller phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 08 09:38:09 m1pro kernel: CPU: 1 PID: 3046 Comm: chromium Tainted: G S                 6.9.7-asahi #1-NixOS
Jul 08 09:38:09 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 09:38:09 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 08 09:38:09 m1pro kernel: pc : drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel: lr : drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel: sp : ffff800089e17440
Jul 08 09:38:09 m1pro kernel: x29: ffff800089e17440 x28: ffff800089e17888 x27: ffff000016c10000
Jul 08 09:38:09 m1pro kernel: x26: ffff80007a46d948 x25: 0000000000000000 x24: ffff000073c1a280
Jul 08 09:38:09 m1pro kernel: x23: ffff00007f00fc00 x22: ffff000073c3f638 x21: ffff00007f00fdd8
Jul 08 09:38:09 m1pro kernel: x20: ffff000072486e08 x19: ffff000072486e08 x18: fffffffffffd8e28
Jul 08 09:38:09 m1pro kernel: x17: 636e757274202c74 x16: 696d696c20746964 x15: 6572632065687420
Jul 08 09:38:09 m1pro kernel: x14: 6465656378652074 x13: ffff8000814cd310 x12: 0000000000000cff
Jul 08 09:38:09 m1pro kernel: x11: 0000000000000455 x10: ffff80008157d310 x9 : ffff8000814cd310
Jul 08 09:38:09 m1pro kernel: x8 : 00000000ffffdfff x7 : ffff80008157d310 x6 : 80000000ffffe000
Jul 08 09:38:09 m1pro kernel: x5 : 0000000000000456 x4 : 0000000000000002 x3 : ffff8000812b0008
Jul 08 09:38:09 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00001641b600
Jul 08 09:38:09 m1pro kernel: Call trace:
Jul 08 09:38:09 m1pro kernel:  drm_sched_can_queue+0xec/0x144
Jul 08 09:38:09 m1pro kernel:  drm_sched_wakeup+0x18/0x54
Jul 08 09:38:09 m1pro kernel:  drm_sched_entity_push_job+0x15c/0x1a8
Jul 08 09:38:09 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12c4/0x157c [asahi]
Jul 08 09:38:09 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 08 09:38:09 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 08 09:38:09 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 08 09:38:09 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 08 09:38:09 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 08 09:38:09 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 08 09:38:09 m1pro kernel:  el0_svc+0x34/0xfc
Jul 08 09:38:09 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 09:38:09 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 09:38:09 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 09:38:09 m1pro zoom.desktop[3046]: [3046:3127:0708/093809.912244:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:09 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 08 09:38:09 m1pro kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 08 09:38:09 m1pro kernel: Mem abort info:
Jul 08 09:38:09 m1pro kernel:   ESR = 0x0000000096000007
Jul 08 09:38:09 m1pro kernel: Mem abort info:
Jul 08 09:38:09 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 08 09:38:09 m1pro kernel:   ESR = 0x0000000096000007
Jul 08 09:38:09 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 08 09:38:10 m1pro kernel:   SET = 0, FnV = 0
Jul 08 09:38:10 m1pro kernel:   SET = 0, FnV = 0
Jul 08 09:38:10 m1pro kernel:   EA = 0, S1PTW = 0
Jul 08 09:38:10 m1pro kernel:   EA = 0, S1PTW = 0
Jul 08 09:38:10 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 08 09:38:10 m1pro kernel: Data abort info:
Jul 08 09:38:10 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 08 09:38:10 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 08 09:38:10 m1pro kernel:   FSC = 0x07: level 3 translation fault
Jul 08 09:38:10 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 08 09:38:10 m1pro kernel: Data abort info:
Jul 08 09:38:10 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5fa0000
Jul 08 09:38:10 m1pro kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 08 09:38:10 m1pro kernel: [ffff000000000700] pgd=18000107cf028003
Jul 08 09:38:10 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 08 09:38:10 m1pro kernel: , p4d=18000107cf028003, pud=18000107cf024003, pmd=18000107cf020003, pte=0000000000000000
Jul 08 09:38:10 m1pro kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 08 09:38:10 m1pro kernel: Modules linked in: vhost_net vhost vhost_iotlb uinput xt_conntrack nft_chain_nat xt_MASQUERADE
Jul 08 09:38:10 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 08 09:38:10 m1pro kernel:  nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr snd_seq_dummy snd_hrtimer snd_seq rfcomm bnep uvcvideo videobuf2_vmalloc uvc brcmfmac_wcc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_seq_device usbhid joydev
Jul 08 09:38:10 m1pro kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000107c5fa0000
Jul 08 09:38:10 m1pro kernel:  hci_bcm4377 hid_magicmouse bluetooth brcmfmac brcmutil cfg80211 ecdh_generic ecc appledrm snd_soc_macaudio ofpart snd_soc_cs42l84 spi_nor rfkill snd_soc_tas2764 apple_sio asahi apple_admac snd_soc_apple_mca pwm_apple virt_dma
Jul 08 09:38:10 m1pro kernel: [ffff000000000700] pgd=18000107cf028003
Jul 08 09:38:10 m1pro kernel:  macsmc_reboot macsmc_hid macsmc_power apple_isp videobuf2_dma_sg videobuf2_memops hid_apple videobuf2_v4l2 videodev videobuf2_common mc clk_apple_nco apple_dcp apple_soc_cpufreq drm_dma_helper mux_apple_display_crossbar leds_pwm mux_core loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap
Jul 08 09:38:10 m1pro kernel: , p4d=18000107cf028003
Jul 08 09:38:10 m1pro kernel:  macvlan bridge stp llc fuse nfnetlink ip_tables xhci_plat_hcd xhci_hcd sdhci_pci cqhci sdhci mmc_core nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc mfd_core
Jul 08 09:38:10 m1pro kernel: , pud=18000107cf024003
Jul 08 09:38:10 m1pro kernel:  pinctrl_apple_gpio spmi_apple_controller phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 08 09:38:10 m1pro kernel: CPU: 2 PID: 3046 Comm: chromium Tainted: G S      W          6.9.7-asahi #1-NixOS
Jul 08 09:38:10 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 08 09:38:10 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 08 09:38:10 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2a4
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093809.964931:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093809.977288:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.009285:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro kernel: , pmd=18000107cf020003
Jul 08 09:38:10 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2a4
Jul 08 09:38:10 m1pro kernel: sp : ffff800089e15b30
Jul 08 09:38:10 m1pro kernel: x29: ffff800089e15b40 x28: ffff000229055a80 x27: ffff00011ea08480
Jul 08 09:38:10 m1pro kernel: x26: ffffffa000084000 x25: 00000000ffffffa6 x24: 0000000000000000
Jul 08 09:38:10 m1pro kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 08 09:38:10 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000358 x18: 0000000000000008
Jul 08 09:38:10 m1pro kernel: x17: 0000000000000000
Jul 08 09:38:10 m1pro kernel: , pte=0000000000000000
Jul 08 09:38:10 m1pro kernel:  x16: 0000000000000000 x15: 0000000000000000
Jul 08 09:38:10 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 08 09:38:10 m1pro kernel: x11: ffffffa00007e4a8 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 08 09:38:10 m1pro kernel: x8 : 648d80007a36d99c x7 : 0000000000000cc0 x6 : 0000000000000358
Jul 08 09:38:10 m1pro kernel: x5 : ffff0000b3f4b1ac x4 : 0000000000000000 x3 : 0000000000ce7781
Jul 08 09:38:10 m1pro kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f2cb00
Jul 08 09:38:10 m1pro kernel: Call trace:
Jul 08 09:38:10 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2a4
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.041288:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro kernel: 
Jul 08 09:38:10 m1pro kernel:  krealloc+0x7c/0xe4
Jul 08 09:38:10 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.077402:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.109345:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.141292:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.177507:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.209386:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.241462:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.277387:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.309381:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.341297:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.377374:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.409408:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro zoom.desktop[3046]: [3046:3127:0708/093810.441404:ERROR:gbm_pixmap_wayland.cc(82)] Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE
Jul 08 09:38:10 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 08 09:38:10 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtNtB8_5queue6renderNtB3Q_13QueueG13V13_513submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG13V13_5B4S_EB4S_B4S_NCB3I_s1_0NCB3I_s2_0EB8_+0x79c/0x1f80 [asahi]
Jul 08 09:38:10 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_13QueueG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 08 09:38:10 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf9c/0x157c [asahi]
Jul 08 09:38:10 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 08 09:38:10 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 08 09:38:10 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 08 09:38:10 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 08 09:38:10 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 08 09:38:10 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 08 09:38:10 m1pro kernel:  el0_svc+0x34/0xfc
Jul 08 09:38:10 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 08 09:38:10 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 08 09:38:10 m1pro kernel: Code: 54000b60 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 08 09:38:10 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 08 09:38:10 m1pro kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP

Chromium logged this warning 3260 times: Cannot create bo with format= YUV_420_BIPLANAR and usage=SCANOUT_CPU_READ_WRITE, but it does not appear to be related. I get it multiple times per second while using the Zoom web app, even on 6.9.5, and the freeze also happens when the warning has not appeared at all.
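A count like the 3260 above can be reproduced from saved journal output with a short script. This is a minimal sketch and not part of the original report: the name count_by_source and the assumed line layout (timestamp, host, source with an optional [pid], then the message) are illustrative assumptions based on the log lines shown in this thread.

```python
import re
from collections import Counter

# Assumed layout: "Jul 08 09:38:10 m1pro zoom.desktop[3046]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} "   # timestamp
    r"(?P<host>\S+) "                          # host name
    r"(?P<source>[^\s\[:]+)(?:\[\d+\])?: "     # source, optional [pid]
    r"(?P<message>.*)$"
)

def count_by_source(lines, needle):
    """Count lines whose message contains `needle`, grouped by log source."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and needle in match.group("message"):
            counts[match.group("source")] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage (hypothetical): journalctl -b -1 | python count.py "Cannot create bo"
    for source, n in count_by_source(sys.stdin, sys.argv[1]).items():
        print(source, n)
```

Grouping by source shows at a glance whether a message comes from the kernel or from a userspace process such as zoom.desktop.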

It does not look like it is only Chromium; here is a crash in Xwayland on 6.9.7:

Jul 06 17:57:04 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 06 17:57:04 m1pro kernel: WARNING: CPU: 1 PID: 2821 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel: Modules linked in: uas usb_storage xhci_plat_hcd xhci_hcd vhost_net vhost vhost_iotlb xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep brcmfmac_wcc joydev hci_bcm4377 bluetooth hid_magicmouse brcmfmac brcmutil cfg80211 ecdh_generic ecc rfkill apple_isp asahi snd_soc_macaudio appledrm ofpar>
Jul 06 17:57:04 m1pro kernel:  udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 06 17:57:04 m1pro kernel: CPU: 1 PID: 2821 Comm: Xwayland Tainted: G S                 6.9.7-asahi #1-NixOS
Jul 06 17:57:04 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 06 17:57:04 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 06 17:57:04 m1pro kernel: pc : drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel: lr : drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel: sp : ffff800093f17440
Jul 06 17:57:04 m1pro kernel: x29: ffff800093f17440 x28: ffff800093f17888 x27: ffff00001495d000
Jul 06 17:57:04 m1pro kernel: x26: 0000000000000000 x25: 0000000000000000 x24: ffff00005fd539c0
Jul 06 17:57:04 m1pro kernel: x23: ffff00059c506400 x22: ffff00004c279238 x21: ffff00059c5065d8
Jul 06 17:57:04 m1pro kernel: x20: ffff00007f658008 x19: ffff00007f658008 x18: fffffffffffd50e0
Jul 06 17:57:04 m1pro kernel: x17: 636e757274202c74 x16: 696d696c20746964 x15: 6572632065687420
Jul 06 17:57:04 m1pro kernel: x14: 6465656378652074 x13: ffff8000814cd310 x12: 0000000000000b8e
Jul 06 17:57:04 m1pro kernel: x11: 00000000000003da x10: ffff80008157d310 x9 : ffff8000814cd310
Jul 06 17:57:04 m1pro kernel: x8 : 00000000ffffdfff x7 : ffff80008157d310 x6 : 80000000ffffe000
Jul 06 17:57:04 m1pro kernel: x5 : 00000000000003db x4 : 0000000000000002 x3 : ffff8000812b0008
Jul 06 17:57:04 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066486d80
Jul 06 17:57:04 m1pro kernel: Call trace:
Jul 06 17:57:04 m1pro kernel:  drm_sched_can_queue+0xec/0x144
Jul 06 17:57:04 m1pro kernel:  drm_sched_wakeup+0x18/0x54
Jul 06 17:57:04 m1pro kernel:  drm_sched_entity_push_job+0x15c/0x1a8
Jul 06 17:57:04 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12c4/0x157c [asahi]
Jul 06 17:57:04 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 06 17:57:04 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 06 17:57:04 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 06 17:57:04 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 06 17:57:04 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 06 17:57:04 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 06 17:57:04 m1pro kernel:  el0_svc+0x34/0xfc
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 06 17:57:04 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 06 17:57:04 m1pro kernel: ------------[ cut here ]------------
Jul 06 17:57:04 m1pro kernel: WARNING: CPU: 1 PID: 2821 at mm/slub.c:4358 free_large_kmalloc+0xac/0xe0
Jul 06 17:57:04 m1pro kernel: Modules linked in: uas usb_storage xhci_plat_hcd xhci_hcd vhost_net vhost vhost_iotlb xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep brcmfmac_wcc joydev hci_bcm4377 bluetooth hid_magicmouse brcmfmac brcmutil cfg80211 ecdh_generic ecc rfkill apple_isp asahi snd_soc_macaudio appledrm ofpar>
Jul 06 17:57:04 m1pro kernel:  udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 06 17:57:04 m1pro kernel: CPU: 1 PID: 2821 Comm: Xwayland Tainted: G S      W          6.9.7-asahi #1-NixOS
Jul 06 17:57:04 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 06 17:57:04 m1pro kernel: pstate: 41401009 (nZcv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 06 17:57:04 m1pro kernel: pc : free_large_kmalloc+0xac/0xe0
Jul 06 17:57:04 m1pro kernel: lr : kfree+0x160/0x1b0
Jul 06 17:57:04 m1pro kernel: sp : ffff800093f15c00
Jul 06 17:57:04 m1pro kernel: x29: ffff800093f15c00 x28: ffff0005fc360e00 x27: ffff00025a28fc00
Jul 06 17:57:04 m1pro kernel: x26: ffffffa0000c4380 x25: 00000000007c5300 x24: ffffffa600ae7000
Jul 06 17:57:04 m1pro kernel: x23: ffff00001672aa08 x22: ffffffa00000001c x21: 0000000000000001
Jul 06 17:57:04 m1pro kernel: x20: ffff000500000500 x19: ffffff7fc5000000 x18: 000000000000000c
Jul 06 17:57:04 m1pro kernel: x17: 0000000000000000 x16: 0000000000000080 x15: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x14: 0000000000000000 x13: 9393939300000000 x12: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x11: 0000000000000000 x10: 0000000000000268 x9 : 0000000000000000
Jul 06 17:57:04 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000268 x6 : ffff0005a6d3db00
Jul 06 17:57:04 m1pro kernel: x5 : ffff800093f161e0 x4 : ffff000066486d80 x3 : ffff8000994ac080
Jul 06 17:57:04 m1pro kernel: x2 : 0000000000000001 x1 : ffff000500000500 x0 : 0000000000008128
Jul 06 17:57:04 m1pro kernel: Call trace:
Jul 06 17:57:04 m1pro kernel:  free_large_kmalloc+0xac/0xe0
Jul 06 17:57:04 m1pro kernel:  kfree+0x160/0x1b0
Jul 06 17:57:04 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtNtB8_5queue6renderNtB3Q_13QueueG13V13_513submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG13V13_5B4S_EB4S_B4S_NCB3I_s1_0>
Jul 06 17:57:04 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_13QueueG13V13_513submit_render+0x16e4/0x1dd0 [asahi]
Jul 06 17:57:04 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf9c/0x157c [asahi]
Jul 06 17:57:04 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 06 17:57:04 m1pro kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 06 17:57:04 m1pro kernel:  drm_ioctl+0x20c/0x4b4
Jul 06 17:57:04 m1pro kernel:  __arm64_sys_ioctl+0xac/0xf4
Jul 06 17:57:04 m1pro kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 06 17:57:04 m1pro kernel:  do_el0_svc+0x40/0xc8
Jul 06 17:57:04 m1pro kernel:  el0_svc+0x34/0xfc
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 06 17:57:04 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 06 17:57:04 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 06 17:57:04 m1pro kernel: object pointer: 0x00000000f1f7ed20
Jul 06 17:57:04 m1pro kernel: Unable to handle kernel paging request at virtual address 000109050208110a
Jul 06 17:57:04 m1pro kernel: Mem abort info:
Jul 06 17:57:04 m1pro kernel:   ESR = 0x0000000096000004
Jul 06 17:57:04 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 06 17:57:04 m1pro kernel:   SET = 0, FnV = 0
Jul 06 17:57:04 m1pro kernel:   EA = 0, S1PTW = 0
Jul 06 17:57:04 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 06 17:57:04 m1pro kernel: Data abort info:
Jul 06 17:57:04 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 06 17:57:04 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 06 17:57:04 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 06 17:57:04 m1pro kernel: [000109050208110a] address between user and kernel address ranges
Jul 06 17:57:04 m1pro kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Jul 06 17:57:04 m1pro kernel: Modules linked in: uas usb_storage xhci_plat_hcd xhci_hcd vhost_net vhost vhost_iotlb xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device bnep brcmfmac_wcc joydev hci_bcm4377 bluetooth hid_magicmouse brcmfmac brcmutil cfg80211 ecdh_generic ecc rfkill apple_isp asahi snd_soc_macaudio appledrm ofpar>
Jul 06 17:57:04 m1pro kernel:  udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart btrfs xor xor_neon raid6_pq
Jul 06 17:57:04 m1pro kernel: CPU: 2 PID: 2821 Comm: Xwayland Tainted: G S      W          6.9.7-asahi #1-NixOS
Jul 06 17:57:04 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 06 17:57:04 m1pro kernel: pstate: a1401009 (NzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 06 17:57:04 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2a4
Jul 06 17:57:04 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2a4
Jul 06 17:57:04 m1pro kernel: sp : ffff800093f15d40
Jul 06 17:57:04 m1pro kernel: x29: ffff800093f15d50 x28: 00000000ffffffa0 x27: ffff0005fc360480
Jul 06 17:57:04 m1pro kernel: x26: ffffffa00000c984 x25: 00000000000174f6 x24: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x23: f801090502080f0a x22: 00000000ffffffff x21: 0000000000000cc0
Jul 06 17:57:04 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000318 x18: 00000000000000ff
Jul 06 17:57:04 m1pro kernel: x17: 0000000000000000 x16: 00000000000c0000 x15: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 06 17:57:04 m1pro kernel: x11: 00000000ffffffa0 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 06 17:57:04 m1pro kernel: x8 : 20fa80007a43199c x7 : 0000000000000cc0 x6 : 0000000000000318

Running a video conference on Zoom triggers the freeze the fastest for me: it takes only a few minutes for the system to freeze.
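The Mem abort info fields in the traces above (EC, IL, FSC) are decoded from the ESR value, so the two oops variants seen here (0x96000007 and 0x96000004) can be compared directly. Below is a minimal Python sketch of that decoding, assuming the standard AArch64 ESR_ELx bit layout. decode_esr is a hypothetical helper name, and only the translation-fault codes are named.

```python
# Translation fault status codes (FSC 0b0001LL, LL = lookup level).
TRANSLATION_FAULTS = {
    0x04: "level 0 translation fault",
    0x05: "level 1 translation fault",
    0x06: "level 2 translation fault",
    0x07: "level 3 translation fault",
}

def decode_esr(esr):
    """Split an AArch64 ESR value into the fields the kernel prints."""
    iss = esr & 0x1FFFFFF                  # ISS: bits [24:0]
    fsc = iss & 0x3F                       # FSC: low 6 bits of ISS (data aborts)
    return {
        "ec": (esr >> 26) & 0x3F,          # exception class: bits [31:26]
        "il": (esr >> 25) & 0x1,           # 1 = 32-bit instruction
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,           # 1 = fault caused by a write
        "fsc": fsc,
        "fsc_name": TRANSLATION_FAULTS.get(fsc, "other"),
    }

if __name__ == "__main__":
    for value in (0x96000007, 0x96000004):
        print(hex(value), decode_esr(value))
```

For 0x96000007 this gives EC 0x25 and FSC 0x07, and for 0x96000004 it gives FSC 0x04, matching the kernel's own output in the logs.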

--

Regarding the build:

Jul 15 09:01:14 m1pro kernel: Linux version 6.9.9-asahi (nixbld@localhost) (gcc (GCC) 13.3.0, GNU ld (GNU Binutils) 2.42) #1-NixOS SMP PREEMPT_DYNAMIC Tue Jan 1 00:00:00 UTC 1980

My kernel command line is:

initrd=\EFI\nixos\x0f7ip2w2nzvaz8ywlshqalzbr7ys0ww-initrd-linux-6.9.9-asahi-initrd.efi init=/nix/store/p10vrn8z7q2vsssff5ysg00za6wl3vaf-nixos-system-m1pro-24.11.20240709.feb2849/init earlycon console=ttySAC0,115200n8 console=tty0 boot.shell_on_fail nvme_apple.flush_interval=0 mitigations=off loglevel=4

You could probably just follow the installation instructions here to get the exact same kernel build, Chromium, Wayland and GNOME (well, at least that's what Nix promises you): https://github.com/tpwrules/nixos-apple-silicon/blob/main/docs/uefi-standalone.md

montchr commented 1 month ago

I am also running GNOME (Wayland), and my crash yesterday (whose logs I posted in my previous comment) happened while I was scrolling through a webpage in Firefox. I generally don't use Chrome as a web browser.

I haven't seen as severe/frequent/predictable behavior as @oliverbestmann reports. I sometimes do Zoom screensharing from Firefox, and that hasn't been a problem.

montchr commented 1 month ago

I went back looking for similar crashes and found this one, possibly related, though I don't know what I was doing at the time:

Jul 05 15:38:21 tuvok kernel: Renderer: page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=user.slice,mems_allowed=0
Jul 05 15:38:21 tuvok kernel: CPU: 3 PID: 6609 Comm: Renderer Tainted: G S         O       6.8.10-asahi #1-NixOS
Jul 05 15:38:21 tuvok kernel: Hardware name: Apple MacBook Air (13-inch, M2, 2022) (DT)
Jul 05 15:38:21 tuvok kernel: Call trace:
Jul 05 15:38:21 tuvok kernel:  dump_backtrace+0x94/0x114
Jul 05 15:38:21 tuvok kernel:  show_stack+0x18/0x24
Jul 05 15:38:21 tuvok kernel:  dump_stack_lvl+0x74/0x8c
Jul 05 15:38:21 tuvok kernel:  dump_stack+0x18/0x24
Jul 05 15:38:21 tuvok kernel:  warn_alloc+0x11c/0x1a0
Jul 05 15:38:21 tuvok kernel:  __alloc_pages_slowpath.constprop.0+0x950/0x9b8
Jul 05 15:38:21 tuvok kernel:  __alloc_pages+0x204/0x28c
Jul 05 15:38:21 tuvok kernel:  __kmalloc_large_node+0x80/0x138
Jul 05 15:38:21 tuvok kernel:  __kmalloc_node_track_caller+0x220/0x2a4
Jul 05 15:38:21 tuvok kernel:  krealloc+0x7c/0xe4
Jul 05 15:38:21 tuvok kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x6c/0xac [asahi]
Jul 05 15:38:21 tuvok kernel:  _RNvXsd_NtCsirMamryJlsQ_5asahi5allocNtB5_13HeapAllocatorNtB5_9Allocator15collect_garbage+0x70/0x2fc [asahi]
Jul 05 15:38:21 tuvok kernel:  _RNvXsd_NtCsirMamryJlsQ_5asahi3gpuNtB5_18GpuManagerG14V12_4NtB5_10GpuManager5alloc+0x5bc/0x72c [asahi]
Jul 05 15:38:21 tuvok kernel:  _RNvMs0_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_13QueueG14V12_413submit_render+0x514/0x1cd0 [asahi]
Jul 05 15:38:21 tuvok kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0xfac/0x158c [asahi]
Jul 05 15:38:21 tuvok kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 05 15:38:21 tuvok kernel:  drm_ioctl_kernel+0xbc/0x128
Jul 05 15:38:21 tuvok kernel:  drm_ioctl+0x20c/0x4b4
Jul 05 15:38:21 tuvok kernel:  __arm64_sys_ioctl+0xac/0xf0
Jul 05 15:38:21 tuvok kernel:  invoke_syscall.constprop.0+0x50/0xec
Jul 05 15:38:21 tuvok kernel:  do_el0_svc+0x40/0xc8
Jul 05 15:38:21 tuvok kernel:  el0_svc+0x34/0xf8
Jul 05 15:38:21 tuvok kernel:  el0t_64_sync_handler+0x120/0x12c
Jul 05 15:38:21 tuvok kernel:  el0t_64_sync+0x190/0x194
Jul 05 15:38:21 tuvok kernel: Mem-Info:
Jul 05 15:38:21 tuvok kernel: active_anon:1077 inactive_anon:708119 isolated_anon:0
                               active_file:35776 inactive_file:46301 isolated_file:0
                               unevictable:84219 dirty:4620 writeback:0
                               slab_reclaimable:26457 slab_unreclaimable:11463
                               mapped:41466 shmem:99150 pagetables:7207
                               sec_pagetables:0 bounce:0
                               kernel_misc_reclaimable:0
                               free:56064 free_pcp:18 free_cma:141
Jul 05 15:38:21 tuvok kernel: Node 0 active_anon:17232kB inactive_anon:11329904kB active_file:572416kB inactive_file:740816kB unevictable:1347504kB isolated(anon):0kBisolated(file):0kB mapped:663456kB dirty:73920kB writeback:0kB shmem:1586400kB writeback_tmp:0kB kernel_stack:74048kB pagetables:115312kB sec_pagetables:0kB all_unreclaimable? no
Jul 05 15:38:21 tuvok kernel: DMA free:897024kB boost:32768kB min:48640kB low:64400kB high:80160kB reserved_highatomic:32768KB active_anon:17232kB inactive_anon:11329904kB active_file:572416kB inactive_file:740848kB unevictable:1347504kB writepending:72784kB present:16049696kB managed:15794112kB mlocked:384kB bounce:0kB free_pcp:288kB local_pcp:0kB free_cma:2256kB
Jul 05 15:38:21 tuvok kernel: lowmem_reserve[]: 0 0 0 0
Jul 05 15:38:21 tuvok kernel: DMA: 28987*16kB (UMEHC) 8587*32kB (UMEHC) 1922*64kB (UMEHC) 258*128kB (UMEH) 12*256kB (UMEH) 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB0*16384kB 0*32768kB = 897680kB
Jul 05 15:38:21 tuvok kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jul 05 15:38:21 tuvok kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
Jul 05 15:38:21 tuvok kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jul 05 15:38:21 tuvok kernel: 181220 total pagecache pages
Jul 05 15:38:21 tuvok kernel: 0 pages in swap cache
Jul 05 15:38:21 tuvok kernel: Free swap  = 0kB
Jul 05 15:38:21 tuvok kernel: Total swap = 0kB
Jul 05 15:38:21 tuvok kernel: 1003106 pages RAM
Jul 05 15:38:21 tuvok kernel: 0 pages HighMem/MovableOnly
Jul 05 15:38:21 tuvok kernel: 15974 pages reserved
Jul 05 15:38:21 tuvok kernel: 4096 pages cma reserved
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
Jul 05 15:38:21 tuvok kernel: asahi 206400000.gpu: HeapAllocator[Kernel Private]:collect_garbage: failed to reserve space
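
As a rough sanity check on the `order:5` failure in the log above, the sketch below works through the arithmetic. It assumes the 16 KiB base page size that other logs in this thread report (`16k pages`), so an order-5 request needs 2^5 contiguous pages, i.e. 512 kB. The `DMA:` free-list line shows `0*512kB` and nothing larger, so no free block of that size existed at that moment even though roughly 897 MB was free in total. The helper names (`order_size_kb`, `parse_buddy_line`, `has_free_block`) are made up for illustration and are not kernel APIs; the check also ignores reclaim and compaction, which the real allocator attempts before giving up.

```python
import re

PAGE_SIZE_KB = 16  # assumption: 16 KiB base pages, as on the machines in this thread


def order_size_kb(order, page_size_kb=PAGE_SIZE_KB):
    """Size in kB of one physically contiguous block of the given order."""
    return page_size_kb << order


def parse_buddy_line(line):
    """Map block size (kB) -> number of free blocks from a 'count*sizekB' list."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}


def has_free_block(free_blocks, order, page_size_kb=PAGE_SIZE_KB):
    """True if any free block is at least as large as the requested order.

    Simplification: ignores reclaim and compaction.
    """
    need = order_size_kb(order, page_size_kb)
    return any(count > 0 for size, count in free_blocks.items() if size >= need)


# Free-list line copied from the log above (timestamp prefix dropped).
BUDDY_LINE = (
    "DMA: 28987*16kB (UMEHC) 8587*32kB (UMEHC) 1922*64kB (UMEHC) "
    "258*128kB (UMEH) 12*256kB (UMEH) 0*512kB 0*1024kB 0*2048kB "
    "0*4096kB 0*8192kB0*16384kB 0*32768kB = 897680kB"
)

free_blocks = parse_buddy_line(BUDDY_LINE)
total_free_kb = sum(size * count for size, count in free_blocks.items())

print("order-5 request size:", order_size_kb(5), "kB")
print("total free:", total_free_kb, "kB")
print("free block large enough for order 5:", has_free_block(free_blocks, 5))
```

In other words, the free memory was fragmented into blocks of 256 kB or smaller, which is consistent with a large `krealloc` failing while plenty of memory was nominally free.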
mkurz commented 1 month ago

I am on KDE Plasma and don't even have GNOME installed. I use Firefox, not Chromium.

jannau commented 1 month ago

While trying to reproduce this I encountered the following:

Jul 19 00:00:28 nixos kernel: asahi 206400000.gpu: FWLog: ERROR: PIO poll from agfPollFenderReg timeout after 290us [type:0 reg:0x10080 expected:0x0 got:0x0 max:250us], continue wait
Jul 19 00:00:28 nixos kernel: asahi 206400000.gpu: FWLog: PIO poll from agfPollFenderReg finally succeeded after 333us [type:0 reg:0x10080 value:0x0 max:250us]

and a reboot after maybe 30 seconds. Those were the last two, and the only interesting, lines in journalctl -kf running in an SSH session. Audio from a playing YouTube video was looping (i.e. replaying the last buffer over and over). Load was a console, shadertoy.com/view/X3ySRc and YouTube in Chromium, and webglsamples (dynamic cubemap) in GNOME's (Wayland) overview effect showing 4 windows. The system had been up for ~40 minutes at that point.

Never mind, this might not be related. The kernel log survived and shows the usual error 9 minutes later:

Jul 19 00:00:28 nixos kernel: asahi 206400000.gpu: FWLog: ERROR: PIO poll from agfPollFenderReg timeout after 290us [type:0 reg:0x10080 expected:0x0 got:0x0 max:250us], continue wait
Jul 19 00:00:28 nixos kernel: asahi 206400000.gpu: FWLog: PIO poll from agfPollFenderReg finally succeeded after 333us [type:0 reg:0x10080 value:0x0 max:250us]
Jul 19 00:09:24 nixos kernel: ------------[ cut here ]------------
Jul 19 00:09:24 nixos kernel: asahi 206400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 19 00:09:24 nixos kernel: WARNING: CPU: 1 PID: 4912 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 19 00:09:24 nixos kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr bnep brcmfmac_wcc cdc_mbim cdc_wdm hci_bcm4377 brcmfmac bluetooth brcmutil cfg80211 ecdh_generic hid_magicmouse joydev ecc btrfs xor xor_neon rfkill cdc_ncm panel_summit cdc_ether usbnet apple_isp appledrm mii snd_soc_macaudio macsmc_reboot macsmc_hid macsmc_power macsmc_hwmon videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 ofpart videodev raid6_pq apple_dcp spi_nor asahi videobuf2_common adpdrm clk_apple_nco pwm_apple mc mux_core snd_soc_apple_mca apple_z2 snd_soc_tas2764 snd_soc_cs42l84 apple_admac drm_dma_helper apple_soc_cpufreq leds_pwm hid_apple loop tun tap macvlan bridge stp llc fuse nfnetlink ip_tables uas usb_storage xhci_plat_hcd xhci_hcd nvmem_spmi_mfd rtc_macsmc gpio_macsmc tps6598x simple_mfd_spmi regmap_spmi dockchannel_hid pcie_apple dwc3 nvme_apple macsmc_rtkit pci_host_common macsmc phy_apple_atc udc_core typec apple_dockchannel apple_rtkit_helper spmi_apple_controller apple_sart mfd_core nvmem_apple_efuses
Jul 19 00:09:24 nixos kernel:  pinctrl_apple_gpio spi_apple i2c_pasemi_platform i2c_pasemi_core apple_dart
Jul 19 00:09:24 nixos kernel: CPU: 1 PID: 4912 Comm: Renderer Tainted: G S                 6.9.9-asahi #1-NixOS
Jul 19 00:09:24 nixos kernel: Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
Jul 19 00:09:24 nixos kernel: pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 19 00:09:24 nixos kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 19 00:09:24 nixos kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 19 00:09:24 nixos kernel: sp : ffff8000b3f27440
Jul 19 00:09:24 nixos kernel: x29: ffff8000b3f27440 x28: 0000000000000030 x27: ffff00000c291000
Jul 19 00:09:24 nixos kernel: x26: ffff80007a1d1910 x25: 0000000000000000 x24: ffff0000d5e84d80
Jul 19 00:09:24 nixos kernel: x23: ffff8000b3f27888 x22: ffff00009e415038 x21: ffff00008b13edd8
Jul 19 00:09:24 nixos kernel: x20: ffff0000f0e6e208 x19: ffff0000f0e6e208 x18: 0000000000000000
Jul 19 00:09:24 nixos kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 19 00:09:24 nixos kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 19 00:09:24 nixos kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 19 00:09:24 nixos kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 19 00:09:24 nixos kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 19 00:09:24 nixos kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 19 00:09:24 nixos kernel: Call trace:
Jul 19 00:09:24 nixos kernel:  drm_sched_can_queue+0x110/0x168
Jul 19 00:09:24 nixos kernel:  drm_sched_wakeup+0x18/0x7c
Jul 19 00:09:24 nixos kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 19 00:09:24 nixos kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 19 00:09:24 nixos kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 19 00:09:24 nixos kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 19 00:09:24 nixos kernel:  drm_ioctl+0x23c/0x4e4
Jul 19 00:09:24 nixos kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 19 00:09:24 nixos kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 19 00:09:24 nixos kernel:  do_el0_svc+0x40/0xf0
Jul 19 00:09:24 nixos kernel:  el0_svc+0x34/0x11c
Jul 19 00:09:24 nixos kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 19 00:09:24 nixos kernel:  el0t_64_sync+0x190/0x194
Jul 19 00:09:24 nixos kernel: ---[ end trace 0000000000000000 ]---
Jul 19 00:09:24 nixos kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 19 00:09:24 nixos kernel: Mem abort info:
Jul 19 00:09:25 nixos kernel:   ESR = 0x0000000096000007
Jul 19 00:09:25 nixos kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 19 00:09:25 nixos kernel:   SET = 0, FnV = 0
Jul 19 00:09:25 nixos kernel:   EA = 0, S1PTW = 0
Jul 19 00:09:25 nixos kernel:   FSC = 0x07: level 3 translation fault
Jul 19 00:09:25 nixos kernel: Data abort info:
Jul 19 00:09:25 nixos kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 19 00:09:25 nixos kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 19 00:09:25 nixos kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 19 00:09:25 nixos kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000009d1f30000
Jul 19 00:09:25 nixos kernel: [ffff000000000700] pgd=18000009db174003, p4d=18000009db174003, pud=18000009db170003, pmd=18000009db16c003, pte=0000000000000000
Jul 19 00:09:25 nixos kernel: Internal error: Oops: 0000000096000007 [#1] PREEMPT SMP
Jul 19 00:09:25 nixos kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr bnep brcmfmac_wcc cdc_mbim cdc_wdm hci_bcm4377 brcmfmac bluetooth brcmutil cfg80211 ecdh_generic hid_magicmouse joydev ecc btrfs xor xor_neon rfkill cdc_ncm panel_summit cdc_ether usbnet apple_isp appledrm mii snd_soc_macaudio macsmc_reboot macsmc_hid macsmc_power macsmc_hwmon videobuf2_dma_sg videobuf2_memops videobuf2_v4l2 ofpart videodev raid6_pq apple_dcp spi_nor asahi videobuf2_common adpdrm clk_apple_nco pwm_apple mc mux_core snd_soc_apple_mca apple_z2 snd_soc_tas2764 snd_soc_cs42l84 apple_admac drm_dma_helper apple_soc_cpufreq leds_pwm hid_apple loop tun tap macvlan bridge stp llc fuse nfnetlink ip_tables uas usb_storage xhci_plat_hcd xhci_hcd nvmem_spmi_mfd rtc_macsmc gpio_macsmc tps6598x simple_mfd_spmi regmap_spmi dockchannel_hid pcie_apple dwc3 nvme_apple macsmc_rtkit pci_host_common macsmc phy_apple_atc udc_core typec apple_dockchannel apple_rtkit_helper spmi_apple_controller apple_sart mfd_core nvmem_apple_efuses
Jul 19 00:09:25 nixos kernel:  pinctrl_apple_gpio spi_apple i2c_pasemi_platform i2c_pasemi_core apple_dart
Jul 19 00:09:25 nixos kernel: CPU: 4 PID: 4912 Comm: Renderer Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 19 00:09:25 nixos kernel: Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
Jul 19 00:09:25 nixos kernel: pstate: a1400009 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 19 00:09:25 nixos kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 19 00:09:25 nixos kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 19 00:09:25 nixos kernel: sp : ffff8000b3f25cf0
Jul 19 00:09:25 nixos kernel: x29: ffff8000b3f25d00 x28: 0000000000480f9d x27: ffff0000f155a980
Jul 19 00:09:25 nixos kernel: x26: 00000000ffd14000 x25: 00000000ffffffa0 x24: 0000000000000000
Jul 19 00:09:25 nixos kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 19 00:09:25 nixos kernel: x20: ffff000001f5cb00 x19: 0000000000000328 x18: 00000000000000ff
Jul 19 00:09:25 nixos kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 19 00:09:25 nixos kernel: x14: 0000000000000000 x13: 0000000100000000 x12: 0000000000000000
Jul 19 00:09:25 nixos kernel: x11: 0000000000000001 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 19 00:09:25 nixos kernel: x8 : 8ef680007a0d19c4 x7 : 0000000000000cc0 x6 : 0000000000000328
Jul 19 00:09:25 nixos kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000b94d801
Jul 19 00:09:25 nixos kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff000001f5cb00
Jul 19 00:09:25 nixos kernel: Call trace:
Jul 19 00:09:25 nixos kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 19 00:09:25 nixos kernel:  krealloc+0x9c/0x144
Jul 19 00:09:25 nixos kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 19 00:09:25 nixos kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 19 00:09:25 nixos kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG14V12_4INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs0_NtNtB8_5queue6renderNtB3Q_18QueueInnerG14V12_413submit_renders1_0s_0BP_NtNtB2O_5error5ErrorEIB2I_NCNCB3I_s2_0s_0NtNtBR_3raw19RunFragmentG14V12_4B4X_EB4X_B4X_NCB3I_s1_0NCB3I_s2_0EB8_+0x800/0x1ea8 [asahi]
Jul 19 00:09:25 nixos kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 19 00:09:25 nixos kernel:  _RNvMs0_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG14V12_413submit_render+0x162c/0x1cd0 [asahi]
Jul 19 00:09:25 nixos kernel:  _RNvXsJ_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 19 00:09:25 nixos kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 19 00:09:25 nixos kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 19 00:09:25 nixos kernel:  drm_ioctl+0x23c/0x4e4
Jul 19 00:09:25 nixos kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 19 00:09:25 nixos kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 19 00:09:25 nixos kernel: Mem abort info:
Jul 19 00:09:25 nixos kernel:  do_el0_svc+0x40/0xf0
Jul 19 00:09:25 nixos kernel:  el0_svc+0x34/0x11c
Jul 19 00:09:25 nixos kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 19 00:09:25 nixos kernel:  el0t_64_sync+0x190/0x194
Jul 19 00:09:25 nixos kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 19 00:09:25 nixos kernel:   ESR = 0x0000000096000007
Jul 19 00:09:25 nixos kernel: ---[ end trace 0000000000000000 ]---
Jul 19 00:09:25 nixos kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 19 00:09:25 nixos kernel:   SET = 0, FnV = 0
Jul 19 00:09:25 nixos kernel:   EA = 0, S1PTW = 0
Jul 19 00:09:25 nixos kernel:   FSC = 0x07: level 3 translation fault
Jul 19 00:09:25 nixos kernel: Data abort info:
Jul 19 00:09:25 nixos kernel:   ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
Jul 19 00:09:25 nixos kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 19 00:09:25 nixos kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 19 00:09:25 nixos kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=00000009d1f30000
Jul 19 00:09:25 nixos kernel: [ffff000000000700] pgd=18000009db174003, p4d=18000009db174003, pud=18000009db170003, pmd=18000009db16c003, pte=0000000000000000
Jul 19 00:09:25 nixos kernel: Internal error: Oops: 0000000096000007 [#2] PREEMPT SMP
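
For anyone reading along, the `ESR` value in the oops above can be decoded by hand. The sketch below assumes the standard AArch64 `ESR_EL1` layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with WnR in ISS bit 6 and the fault status code in ISS bits 5:0); the function name and the returned dictionary are illustrative only. It reproduces what the kernel already prints: `0x96000007` is a data abort from the current EL on a read, with a level 3 translation fault.

```python
# Hand-decode an AArch64 ESR_EL1 value for a data abort.
# Only the two data-abort exception classes are named here.
EC_NAMES = {
    0x24: "DABT (lower EL)",
    0x25: "DABT (current EL)",
}


def decode_data_abort_esr(esr):
    """Split an ESR value into the fields shown under 'Mem abort info'."""
    ec = (esr >> 26) & 0x3F      # exception class, bits 31:26
    il = (esr >> 25) & 0x1       # instruction length bit (1 = 32-bit instruction)
    iss = esr & 0x1FFFFFF        # instruction specific syndrome, bits 24:0
    fsc = iss & 0x3F             # fault status code, ISS bits 5:0
    info = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other"),
        "il": il,
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,  # 0 = read, 1 = write
        "fsc": fsc,
        "fault": "other",
    }
    # FSC 0b0001LL encodes a translation fault at page-table level LL.
    if 0x04 <= fsc <= 0x07:
        info["fault"] = f"level {fsc & 0x3} translation fault"
    return info


decoded = decode_data_abort_esr(0x96000007)
print(hex(decoded["ec"]), decoded["ec_name"])
print("access:", "write" if decoded["wnr"] else "read")
print(decoded["fault"])
```

The level reported matches the page-table walk printed in the same oops: the walk reaches a zero `pte`, so the fault is raised at the last (level 3) lookup.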
tpwrules commented 1 month ago

I got this error just surfing with Firefox using Plasma on Wayland and a few minutes of uptime (NixOS):

Jul 18 20:51:29 tpw-nixosbp kernel: ------------[ cut here ]------------
Jul 18 20:51:29 tpw-nixosbp kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 18 20:51:29 tpw-nixosbp kernel: WARNING: CPU: 0 PID: 1784 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 18 20:51:29 tpw-nixosbp kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil ecdh_generic cfg80211 ecc o>
Jul 18 20:51:29 tpw-nixosbp kernel:  nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart
Jul 18 20:51:29 tpw-nixosbp kernel: CPU: 0 PID: 1784 Comm: Renderer Tainted: G S                 6.9.9-asahi #1-NixOS
Jul 18 20:51:29 tpw-nixosbp kernel: Hardware name: Apple MacBook Pro (16-inch, M1 Max, 2021) (DT)
Jul 18 20:51:29 tpw-nixosbp kernel: pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 18 20:51:29 tpw-nixosbp kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 18 20:51:29 tpw-nixosbp kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 18 20:51:29 tpw-nixosbp kernel: sp : ffff800089507440
Jul 18 20:51:29 tpw-nixosbp kernel: x29: ffff800089507440 x28: 0000000000000030 x27: ffff000010811000
Jul 18 20:51:29 tpw-nixosbp kernel: x26: ffff80007a409818 x25: 0000000000000000 x24: ffff0000c014a340
Jul 18 20:51:29 tpw-nixosbp kernel: x23: ffff800089507888 x22: ffff00007985b138 x21: ffff0000786be9d8
Jul 18 20:51:29 tpw-nixosbp kernel: x20: ffff0000b965cc08 x19: ffff0000b965cc08 x18: 0000000000000000
Jul 18 20:51:29 tpw-nixosbp kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 18 20:51:29 tpw-nixosbp kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 18 20:51:29 tpw-nixosbp kernel: x11: 0000000000000000 x10: 5aa307fa777d634d x9 : 815d5224bd8b64ad
Jul 18 20:51:29 tpw-nixosbp kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 18 20:51:29 tpw-nixosbp kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 18 20:51:29 tpw-nixosbp kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 18 20:51:29 tpw-nixosbp kernel: Call trace:
Jul 18 20:51:29 tpw-nixosbp kernel:  drm_sched_can_queue+0x110/0x168
Jul 18 20:51:29 tpw-nixosbp kernel:  drm_sched_wakeup+0x18/0x7c
Jul 18 20:51:29 tpw-nixosbp kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 18 20:51:29 tpw-nixosbp kernel:  _RNvXsI_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V12_3NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 18 20:51:29 tpw-nixosbp kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 18 20:51:29 tpw-nixosbp kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 18 20:51:29 tpw-nixosbp kernel:  drm_ioctl+0x23c/0x4e4
Jul 18 20:51:29 tpw-nixosbp kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 18 20:51:29 tpw-nixosbp kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 18 20:51:29 tpw-nixosbp kernel:  do_el0_svc+0x40/0xf0
Jul 18 20:51:29 tpw-nixosbp kernel:  el0_svc+0x34/0x11c
Jul 18 20:51:29 tpw-nixosbp kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 18 20:51:29 tpw-nixosbp kernel:  el0t_64_sync+0x190/0x194
Jul 18 20:51:29 tpw-nixosbp kernel: ---[ end trace 0000000000000000 ]---
Jul 18 20:51:29 tpw-nixosbp kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 18 20:51:29 tpw-nixosbp kernel: Mem abort info:
Jul 18 20:51:30 tpw-nixosbp kernel:   ESR = 0x0000000096000006
Jul 18 20:51:30 tpw-nixosbp kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 18 20:51:30 tpw-nixosbp kernel:   SET = 0, FnV = 0
Jul 18 20:51:30 tpw-nixosbp kernel:   EA = 0, S1PTW = 0
Jul 18 20:51:30 tpw-nixosbp kernel:   FSC = 0x06: level 2 translation fault
Jul 18 20:51:30 tpw-nixosbp kernel: Data abort info:
Jul 18 20:51:30 tpw-nixosbp kernel:   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
Jul 18 20:51:30 tpw-nixosbp kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 18 20:51:30 tpw-nixosbp kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 18 20:51:30 tpw-nixosbp kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=0000010fc1d30000
Jul 18 20:51:30 tpw-nixosbp kernel: [ffff000000000700] pgd=1800010fcb000003, p4d=1800010fcb000003, pud=1800010fcaffc003, pmd=0000000000000000
Jul 18 20:51:30 tpw-nixosbp kernel: Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Jul 18 20:51:30 tpw-nixosbp kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device qrtr brcmfmac_wcc joydev hid_magicmouse hci_bcm4377 bluetooth brcmfmac brcmutil ecdh_generic cfg80211 ecc o>
Jul 18 20:51:30 tpw-nixosbp kernel:  nvmem_apple_efuses macsmc_rtkit macsmc pinctrl_apple_gpio spmi_apple_controller mfd_core phy_apple_atc typec apple_dart
Jul 18 20:51:30 tpw-nixosbp kernel: CPU: 9 PID: 1784 Comm: Renderer Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 18 20:51:30 tpw-nixosbp kernel: Hardware name: Apple MacBook Pro (16-inch, M1 Max, 2021) (DT)
Jul 18 20:51:30 tpw-nixosbp kernel: pstate: a1400009 (NzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
Jul 18 20:51:30 tpw-nixosbp kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 18 20:51:30 tpw-nixosbp kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 18 20:51:30 tpw-nixosbp kernel: sp : ffff800089505cd0
Jul 18 20:51:30 tpw-nixosbp kernel: x29: ffff800089505ce0 x28: 0000000000017e90 x27: ffff000053e8db80
Jul 18 20:51:30 tpw-nixosbp kernel: x26: 00000000f855c000 x25: 00000000ffffffa0 x24: 0000000000000000
Jul 18 20:51:30 tpw-nixosbp kernel: x23: ffff000000000500 x22: 00000000ffffffff x21: 0000000000000cc0
Jul 18 20:51:30 tpw-nixosbp kernel: x20: ffff00000c004b00 x19: 0000000000000328 x18: 00000000000000ff
Jul 18 20:51:30 tpw-nixosbp kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 18 20:51:30 tpw-nixosbp kernel: x14: 0000000000000000 x13: 0000000100000000 x12: 0000000000000000
Jul 18 20:51:30 tpw-nixosbp kernel: x11: 0000000000000001 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 18 20:51:30 tpw-nixosbp kernel: x8 : c5aa80007a3099c4 x7 : 0000000000000cc0 x6 : 0000000000000328
Jul 18 20:51:30 tpw-nixosbp kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000004f8b80
Jul 18 20:51:30 tpw-nixosbp kernel: x2 : 0000000000000200 x1 : ffff000000000500 x0 : ffff00000c004b00
Jul 18 20:51:30 tpw-nixosbp kernel: Call trace:
Jul 18 20:51:30 tpw-nixosbp kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 18 20:51:30 tpw-nixosbp kernel:  krealloc+0x9c/0x144
Jul 18 20:51:30 tpw-nixosbp kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 18 20:51:30 tpw-nixosbp kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 18 20:51:30 tpw-nixosbp kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw8fragment19RunFragmentG13V12_3INtNtB8_5alloc12GenericAllocBP_NtB1y_14HeapAllocationEE17new_init_preallocINtN>
Jul 18 20:51:30 tpw-nixosbp kernel: Unable to handle kernel paging request at virtual address ffff000000000700
Jul 18 20:51:30 tpw-nixosbp kernel:  _RNvMs_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB6_18QueueInnerG13V12_313submit_render+0x1650/0x1d28 [asahi]
Jul 18 20:51:30 tpw-nixosbp kernel:  _RNvXsI_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V12_3NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 18 20:51:30 tpw-nixosbp kernel: Mem abort info:
Jul 18 20:51:30 tpw-nixosbp kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 18 20:51:30 tpw-nixosbp kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 18 20:51:30 tpw-nixosbp kernel:   ESR = 0x0000000096000006
Jul 18 20:51:30 tpw-nixosbp kernel:  drm_ioctl+0x23c/0x4e4
Jul 18 20:51:30 tpw-nixosbp kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 18 20:51:30 tpw-nixosbp kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 18 20:51:30 tpw-nixosbp kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 18 20:51:30 tpw-nixosbp kernel:  do_el0_svc+0x40/0xf0
Jul 18 20:51:30 tpw-nixosbp kernel:  el0_svc+0x34/0x11c
Jul 18 20:51:30 tpw-nixosbp kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 18 20:51:30 tpw-nixosbp kernel:   SET = 0, FnV = 0
Jul 18 20:51:30 tpw-nixosbp kernel:  el0t_64_sync+0x190/0x194
Jul 18 20:51:30 tpw-nixosbp kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 18 20:51:30 tpw-nixosbp kernel: ---[ end trace 0000000000000000 ]---
Jul 18 20:51:30 tpw-nixosbp kernel:   EA = 0, S1PTW = 0
Jul 18 20:51:30 tpw-nixosbp kernel:   FSC = 0x06: level 2 translation fault
Jul 18 20:51:30 tpw-nixosbp kernel: Data abort info:
Jul 18 20:51:30 tpw-nixosbp kernel:   ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
Jul 18 20:51:30 tpw-nixosbp kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 18 20:51:30 tpw-nixosbp kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 18 20:51:30 tpw-nixosbp kernel: swapper pgtable: 16k pages, 48-bit VAs, pgdp=0000010fc1d30000
Jul 18 20:51:30 tpw-nixosbp kernel: [ffff000000000700] pgd=1800010fcb000003, p4d=1800010fcb000003, pud=1800010fcaffc003, pmd=0000000000000000
Jul 18 20:51:30 tpw-nixosbp kernel: Internal error: Oops: 0000000096000006 [#2] PREEMPT SMP

I may be able to go on a bisection adventure this weekend if nothing more turns up.

cjdell commented 1 month ago

This just happened to me on my M1 Air, although the system didn't finish writing the full crash report. It happened on first boot with the new 6.9.9 kernel, at about 3 minutes of uptime, while loading Firefox with a bunch of tabs being restored. Interestingly, it hasn't happened again (yet) with 30 minutes of uptime. The only difference is that I didn't restore my old tabs this time.

UPDATE: It ran for several hours and then crashed again as soon as I opened VS Code (an Electron app). Interestingly, I have built a kernel without crashing, so it doesn't appear unstable from a compute perspective.

------------[ cut here ]------------
asahi 206400000.gpu: Jobs may not exceed the credit limit, truncate.
WARNING: CPU: 0 PID: 1895 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Modules linked in: qrtr snd_seq_dummy snd_hrtimer snd_seq snd_seq_device des_generic libdes md4 brcmfmac_wcc hci_bcm4377 brcmfmac bluetooth brcmutil cfg80211 ecdh_generic ecc rfkill joydev hid_magicmouse macsmc_reboot appledrm macsmc_power macsmc_hwmon macsmc_hid snd_soc_macaudio ofpart apple_isp snd_soc_cs42l83_i2c spi_nor snd_soc_cs42l42 snd_soc_tas2770 videobuf2_dma_sg xt_conntrack videobuf2_memops videobuf2_v4l2 videodev clk_apple_nco asahi apple_dcp apple_admac snd_soc_apple_mca nf_conntrack videobuf2_common mc mux_core pwm_apple drm_dma_helper leds_pwm apple_soc_cpufreq nf_defrag_ipv6 nf_defrag_ipv4 hid_apple ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog nft_compat nf_tables loop tun tap macvlan bridge stp llc fuse nfnetlink ip_tables nvmem_spmi_mfd gpio_macsmc rtc_macsmc spi_hid_apple_of tps6598x spi_hid_apple simple_mfd_spmi regmap_spmi phy_apple_atc pcie_apple typec pci_host_common dwc3 nvme_apple udc_core apple_sart macsmc_rtkit nvmem_apple_efuses macsmc mfd_core
 spmi_apple_controller pinctrl_apple_gpio spi_apple i2c_pasemi_platform i2c_pasemi_core apple_dart
CPU: 0 PID: 1895 Comm: Renderer Tainted: G S                 6.9.9-asahi #1-NixOS
Hardware name: Apple MacBook Air (M1, 2020) (DT)
pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : drm_sched_can_queue+0x110/0x168
lr : drm_sched_can_queue+0x110/0x168
sp : ffff80008c357440
x29: ffff80008c357440 x28: 0000000000000030 x27: ffff00000b6ec000
x26: ffff80007a1a5948 x25: 0000000000000000 x24: ffff00001f83fe00
x23: ffff80008c357888 x22: ffff00001f83fd38 x21: ffff0000b5886dd8
x20: ffff000048a08c08 x19: ffff000048a08c08 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 drm_sched_can_queue+0x110/0x168
 drm_sched_wakeup+0x18/0x7c
 drm_sched_entity_push_job+0x174/0x1e8
 _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
 _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
 drm_ioctl_kernel+0xd4/0x13c
 drm_ioctl+0x23c/0x4e4
 __arm64_sys_ioctl+0xc0/0x118
 invoke_syscall.constprop.0+0x50/0x124
 do_el0_svc+0x40/0xf0
 el0_svc+0x34/0x11c
 el0t_64_sync_handler+0x140/0x14c
 el0t_64_sync+0x190/0x194
---[ end trace 0000000000000000 ]---
Unable to handle kernel paging request at virtual address ffff000000000700
Mem abort info:
  ESR = 0x0000000096000007
  EC = 0x25: DABT (current EL), IL = 32 bits
jannau commented 1 month ago

KASAN hit in a kernel with CONFIG_PREEMPT=y:

Jul 19 13:32:15 mbp13m2f kernel: ==================================================================
Jul 19 13:32:15 mbp13m2f kernel: ==================================================================
Jul 19 13:32:15 mbp13m2f kernel: BUG: KASAN: slab-use-after-free in drm_sched_can_queue+0x3c8/0x5b0
Jul 19 13:32:15 mbp13m2f kernel: Read of size 4 at addr ffff9bad96180a00 by task chromium-browse/2714
Jul 19 13:32:15 mbp13m2f kernel: 
Jul 19 13:32:15 mbp13m2f kernel: CPU: 0 PID: 2714 Comm: chromium-browse Not tainted 6.9.9-asahi+ #asahi-dev
Jul 19 13:32:15 mbp13m2f kernel: Hardware name: Apple MacBook Pro (13-inch, M2, 2022) (DT)
Jul 19 13:32:15 mbp13m2f kernel: Call trace:
Jul 19 13:32:15 mbp13m2f kernel:  dump_backtrace+0xdc/0x140
Jul 19 13:32:15 mbp13m2f kernel:  show_stack+0x20/0x40
Jul 19 13:32:15 mbp13m2f kernel:  dump_stack_lvl+0x60/0x80
Jul 19 13:32:15 mbp13m2f kernel:  print_address_description.constprop.0+0x90/0x320
Jul 19 13:32:15 mbp13m2f kernel:  print_report+0x108/0x1f8
Jul 19 13:32:15 mbp13m2f kernel:  kasan_report+0xb0/0x170
Jul 19 13:32:15 mbp13m2f kernel:  __asan_report_load4_noabort+0x20/0x30
Jul 19 13:32:15 mbp13m2f kernel:  drm_sched_can_queue+0x3c8/0x5b0
Jul 19 13:32:15 mbp13m2f kernel:  drm_sched_wakeup+0x20/0xd8
Jul 19 13:32:15 mbp13m2f kernel:  drm_sched_entity_push_job+0x330/0x440
Jul 19 13:32:15 mbp13m2f kernel:  _RNvXsJ_NtCslwmUlBZSpxX_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0x12a8/0x1548 [asahi]
Jul 19 13:32:15 mbp13m2f kernel:  _RNvNvXs_NtCslwmUlBZSpxX_5asahi6driverNtB6_11AsahiDriverNtNtNtCsDNbLCochMk_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x644/0x830 [asahi]
Jul 19 13:32:15 mbp13m2f kernel:  drm_ioctl_kernel+0x168/0x2d8
Jul 19 13:32:15 mbp13m2f kernel:  drm_ioctl+0x4bc/0x9c0
Jul 19 13:32:15 mbp13m2f kernel:  __arm64_sys_ioctl+0x12c/0x220
Jul 19 13:32:15 mbp13m2f kernel:  invoke_syscall.constprop.0+0xe0/0x1e8
Jul 19 13:32:15 mbp13m2f kernel:  do_el0_svc+0xcc/0x1d0
Jul 19 13:32:15 mbp13m2f kernel:  el0_svc+0x40/0xe8
Jul 19 13:32:15 mbp13m2f kernel:  el0t_64_sync_handler+0x120/0x130
Jul 19 13:32:15 mbp13m2f kernel:  el0t_64_sync+0x194/0x198
Jul 19 13:32:15 mbp13m2f kernel: 
Jul 19 13:32:15 mbp13m2f kernel: Allocated by task 2714:
Jul 19 13:32:15 mbp13m2f kernel:  kasan_save_stack+0x3c/0x70
Jul 19 13:32:15 mbp13m2f kernel:  kasan_save_track+0x20/0x40
Jul 19 13:32:15 mbp13m2f kernel:  kasan_save_alloc_info+0x40/0x60
Jul 19 13:32:15 mbp13m2f kernel:  __kasan_kmalloc+0xd4/0xe0
Jul 19 13:32:15 mbp13m2f kernel:  __kmalloc_node_track_caller+0x198/0x3d8
Jul 19 13:32:15 mbp13m2f kernel:  krealloc+0x84/0x180
Jul 19 13:32:15 mbp13m2f kernel:  _RNvMsb_NtNtCsDNbLCochMk_6kernel3drm5schedINtB5_6EntityNtNtCslwmUlBZSpxX_5asahi5queue16QueueJobG14V12_4E7new_jobBU_+0x2c/0xc0 [asahi]
Jul 19 13:32:15 mbp13m2f kernel:  _RNvXsJ_NtCslwmUlBZSpxX_5asahi5queueNtB5_13QueueG14V12_4NtB5_5Queue6submit+0x8c0/0x1548 [asahi]
Jul 19 13:32:15 mbp13m2f kernel:  _RNvNvXs_NtCslwmUlBZSpxX_5asahi6driverNtB6_11AsahiDriverNtNtNtCsDNbLCochMk_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x644/0x830 [asahi]
Jul 19 13:32:15 mbp13m2f kernel:  drm_ioctl_kernel+0x168/0x2d8
Jul 19 13:32:15 mbp13m2f kernel:  drm_ioctl+0x4bc/0x9c0
Jul 19 13:32:15 mbp13m2f kernel:  __arm64_sys_ioctl+0x12c/0x220
Jul 19 13:32:15 mbp13m2f kernel:  invoke_syscall.constprop.0+0xe0/0x1e8
Jul 19 13:32:15 mbp13m2f kernel:  do_el0_svc+0xcc/0x1d0
Jul 19 13:32:15 mbp13m2f kernel:  el0_svc+0x40/0xe8
Jul 19 13:32:15 mbp13m2f kernel:  el0t_64_sync_handler+0x120/0x130
Jul 19 13:32:15 mbp13m2f kernel:  el0t_64_sync+0x194/0x198
Jul 19 13:32:15 mbp13m2f kernel: 
Jul 19 13:32:15 mbp13m2f kernel: Freed by task 2444:
Jul 19 13:32:15 mbp13m2f kernel:  kasan_save_stack+0x3c/0x70
Jul 19 13:32:15 mbp13m2f kernel:  kasan_save_track+0x20/0x40
Jul 19 13:32:15 mbp13m2f kernel:  kasan_save_free_info+0x4c/0x80
Jul 19 13:32:15 mbp13m2f kernel:  __kasan_slab_free+0x108/0x170
Jul 19 13:32:15 mbp13m2f kernel:  kfree+0xe0/0x320
Jul 19 13:32:15 mbp13m2f kernel:  drm_sched_free_job_work+0xc0/0x2d8
Jul 19 13:32:15 mbp13m2f kernel:  process_one_work+0x564/0x1068
Jul 19 13:32:15 mbp13m2f kernel:  worker_thread+0x4bc/0xcc8
Jul 19 13:32:15 mbp13m2f kernel:  kthread+0x27c/0x300
Jul 19 13:32:15 mbp13m2f kernel:  ret_from_fork+0x10/0x20
Jul 19 13:32:15 mbp13m2f kernel: 
Jul 19 13:32:15 mbp13m2f kernel: The buggy address belongs to the object at ffff9bad96180800
                                  which belongs to the cache kmalloc-1k of size 1024
Jul 19 13:32:15 mbp13m2f kernel: The buggy address is located 512 bytes inside of
                                  freed 1024-byte region [ffff9bad96180800, ffff9bad96180c00)
Jul 19 13:32:15 mbp13m2f kernel: 
Jul 19 13:32:15 mbp13m2f kernel: The buggy address belongs to the physical page:
Jul 19 13:32:15 mbp13m2f kernel: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x255860
Jul 19 13:32:15 mbp13m2f kernel: head: order:2 entire_mapcount:0 nr_pages_mapped:0 pincount:0
Jul 19 13:32:15 mbp13m2f kernel: flags: 0x840(slab|head|zone=0)
Jul 19 13:32:15 mbp13m2f kernel: page_type: 0xffffffff()
Jul 19 13:32:15 mbp13m2f kernel: raw: 0000000000000840 ffff9bac6da04dc0 dead000000000100 dead000000000122
Jul 19 13:32:15 mbp13m2f kernel: raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
Jul 19 13:32:15 mbp13m2f kernel: head: 0000000000000840 ffff9bac6da04dc0 dead000000000100 dead000000000122
Jul 19 13:32:15 mbp13m2f kernel: head: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
Jul 19 13:32:15 mbp13m2f kernel: head: 0000000000000002 ffffffdb6d961801 dead000000000122 00000000ffffffff
Jul 19 13:32:15 mbp13m2f kernel: head: 0000000400000000 0000000000000000 00000000ffffffff 0000000000000000
Jul 19 13:32:15 mbp13m2f kernel: page dumped because: kasan: bad access detected
Jul 19 13:32:15 mbp13m2f kernel: 
Jul 19 13:32:15 mbp13m2f kernel: Memory state around the buggy address:
Jul 19 13:32:15 mbp13m2f kernel:  ffff9bad96180900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 19 13:32:15 mbp13m2f kernel:  ffff9bad96180980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 19 13:32:15 mbp13m2f kernel: >ffff9bad96180a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 19 13:32:15 mbp13m2f kernel:                    ^
Jul 19 13:32:15 mbp13m2f kernel:  ffff9bad96180a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 19 13:32:15 mbp13m2f kernel:  ffff9bad96180b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
Jul 19 13:32:15 mbp13m2f kernel: ==================================================================
Jul 19 13:32:15 mbp13m2f kernel: Disabling lock debugging due to kernel taint
Jul 19 13:34:39 mbp13m2f kernel: asahi 206400000.gpu: FWLog: ERROR: PIO poll from agfPollFenderReg timeout after 439us [type:0 reg:0x10080 expected:0x0 got:0x0 max:250us], continue wait
Jul 19 13:34:39 mbp13m2f kernel: asahi 206400000.gpu: FWLog: PIO poll from agfPollFenderReg finally succeeded after 679us [type:0 reg:0x10080 value:0x0 max:250us]
asahilina commented 1 month ago

Well... that explains everything.

That means this is another bug in drm_scheduler, and it has nothing to do with the GPUVM changes or our driver. It affects every GPU driver using drm_scheduler.

The bug is that an entity can run out of jobs while its job work function is still about to execute another iteration (the work function only stops after seeing the queue empty during an iteration). Then a new job is queued, and it's the "first" job (since the queue was empty), so drm_sched_entity_push_job() tries to wake up the workqueue. The wakeup path peeks the pointer to the job in the queue in drm_sched_can_queue(). But then the already executing work function sees the job, pops it, runs it, and the job runs to completion all the way on the GPU and gets freed. Then the can_queue code tries to check the credits and accesses freed memory.

It's a really crazy race, since a whole GPU job needs to be dequeued, and run to completion, and be freed, involving a huge amount of driver and firmware code, all during a few instructions in drm_sched_can_queue(). So I'm not surprised it depends on preemption behavior. It can only realistically happen if the drm_sched_can_queue() thread (which is the userspace thread submitting the GPU ioctl) gets preempted exactly at that point.

I suspect this might be a regression introduced when the drm_scheduler was converted to workqueues recently (instead of kthreads).

@jannau I pushed a silly hack to disable that whole mechanism (we don't need it) directly to asahi-wip, can you please test/tag and push it to the bits branch? I'll try to fix this properly later ^^ (please do not use that commit with other non-Asahi GPU drivers, they may rely on that functionality)

jannau commented 1 month ago

I've pushed asahi-6.9.9-7 with that workaround

oliverbestmann commented 1 month ago

Thank you. I've updated my system using @cjdell's pull request and will report back in case it crashes again.

oliverbestmann commented 1 month ago

Works perfectly fine so far. Thanks for looking into the issue and for the quick workaround!

cjdell commented 1 month ago

Can also confirm stability with 50+ hours uptime. Love your hard work on this project. No plans on going back to macOS. 🙂

mkurz commented 1 month ago

Also testing asahi-6.9.9-7 on ALARM and so far looks good. Thanks!

mkurz commented 1 month ago

Actually, the title should be changed from "gpu related crashes with kernel >= 6.9.6" to "gpu related crashes with kernel >= 6.9.7" IMHO

asahilina commented 1 month ago

The bug actually affects all of 6.9.x and probably a few earlier versions too; it's just a coincidence that it apparently only manifested starting with 6.9.7.

oliverbestmann commented 1 month ago

I've renamed it anyways, as it was a pretty consistent coincidence.

robclark commented 5 days ago

> I suspect this might be a regression introduced when the drm_scheduler was converted to workqueues recently (instead of kthreads).

looks like the regression was introduced in:

commit a78422e9dff366b3a46ae44caf6ec8ded9c9fc2f
Author:     Danilo Krummrich <dakr@redhat.com>
AuthorDate: Fri Nov 10 01:16:33 2023 +0100
Commit:     Danilo Krummrich <dakr@redhat.com>
CommitDate: Fri Nov 10 02:54:29 2023 +0100

    drm/sched: implement dynamic job-flow control

    Currently, job flow control is implemented simply by limiting the number
    of jobs in flight. Therefore, a scheduler is initialized with a credit
    limit that corresponds to the number of jobs which can be sent to the
    hardware.

    This implies that for each job, drivers need to account for the maximum
    job size possible in order to not overflow the ring buffer.

    However, there are drivers, such as Nouveau, where the job size has a
    rather large range. For such drivers it can easily happen that job
    submissions not even filling the ring by 1% can block subsequent
    submissions, which, in the worst case, can lead to the ring run dry.

    In order to overcome this issue, allow for tracking the actual job size
    instead of the number of jobs. Therefore, add a field to track a job's
    credit count, which represents the number of credits a job contributes
    to the scheduler's credit limit.

    Signed-off-by: Danilo Krummrich <dakr@redhat.com>
    Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com

I don't see any upstream users of ops->update_job_credits(), so someone should probably just send a revert of that patch

asahilina commented 5 days ago

@robclark nouveau is using variable credits, just not update_job_credits(). The credits logic still accesses the job in the race path, so removing that callback is not enough.

robclark commented 5 days ago

> @robclark nouveau is using variable credits, just not update_job_credits(). The credits logic still accesses the job in the race path, so removing that callback is not enough.

Hmm, if nouveau is using that, it makes a revert more complicated. But that patch is fatally flawed: the whole point of a single-producer-single-consumer queue is that you have just a single producer and a single consumer. That patch violates this rule.

asahilina commented 5 days ago

I suspect the correct fix is to remove the drm_sched_can_queue() condition entirely from drm_sched_wakeup() (so the work function is always woken up/queued and then simply no-ops if there is nothing to do, doing the check in the right context only), but I've already run into enough sharp edges in this code that I'm not going to be proposing that myself.

Edit: In fact this was already proposed here but for some reason Luben never implemented the proposed simplified drm_sched_wakeup() and only did a partial revert.

robclark commented 5 days ago

> I suspect the correct fix is to remove the drm_sched_can_queue() condition entirely from drm_sched_wakeup() (so the work function is always woken up/queued and then simply no-ops if there is nothing to do, doing the check in the right context only), but I've already run into enough sharp edges in this code that I'm not going to be proposing that myself.
>
> Edit: In fact this was already proposed here but for some reason Luben never implemented the proposed simplified drm_sched_wakeup() and only did a partial revert.

I've only looked briefly at the credit patches, but the call in drm_sched_wakeup() looks like it is only there to avoid a wakeup. So yeah, removing that would be the thing to do.