AsahiLinux / linux

Linux kernel source tree

gpu related crashes with kernel >= 6.9.7 #309

Closed. oliverbestmann closed this issue 1 month ago

oliverbestmann commented 4 months ago

Since updating from 6.9.5 to 6.9.6 (and 6.9.9), I get random GPU/DRM-related crashes after a few minutes of usage.

Jul 15 10:20:18 m1pro kernel: ------------[ cut here ]------------
Jul 15 10:20:18 m1pro kernel: asahi 406400000.gpu: Jobs may not exceed the credit limit, truncate.
Jul 15 10:20:18 m1pro kernel: WARNING: CPU: 0 PID: 15794 at drivers/gpu/drm/scheduler/sched_main.c:140 drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 61401009 (nZCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: lr : drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel: sp : ffff800090397440
Jul 15 10:20:18 m1pro kernel: x29: ffff800090397440 x28: 0000000000000030 x27: ffff000014ad5000
Jul 15 10:20:18 m1pro kernel: x26: ffff80007a55d948 x25: 0000000000000000 x24: ffff000139b5dc00
Jul 15 10:20:18 m1pro kernel: x23: ffff800090397888 x22: ffff000139b5cb38 x21: ffff0005be57f5d8
Jul 15 10:20:18 m1pro kernel: x20: ffff00013bfb1c08 x19: ffff00013bfb1c08 x18: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 6572632065687420
Jul 15 10:20:18 m1pro kernel: x14: 6465656378652074 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  drm_sched_can_queue+0x110/0x168
Jul 15 10:20:18 m1pro kernel:  drm_sched_wakeup+0x18/0x7c
Jul 15 10:20:18 m1pro kernel:  drm_sched_entity_push_job+0x174/0x1e8
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0x12d8/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
Jul 15 10:20:18 m1pro kernel: Modules linked in: uinput xt_conntrack nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xt_addrtype nft_compat nf_tables qrtr rfcomm snd_seq_dummy snd_hrtimer snd_seq usbhid cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii snd_usb_audio snd_h>
Jul 15 10:20:18 m1pro kernel:  nvmem_spmi_mfd rtc_macsmc gpio_macsmc spi_hid_apple_of simple_mfd_spmi tps6598x spi_hid_apple regmap_spmi dwc3 pcie_apple udc_core pci_host_common nvme_apple i2c_pasemi_platform spi_apple i2c_pasemi_core apple_sart macsmc_rtkit nvmem_appl>
Jul 15 10:20:18 m1pro kernel: CPU: 0 PID: 15794 Comm: chromium Tainted: G S      W          6.9.9-asahi #1-NixOS
Jul 15 10:20:18 m1pro kernel: Hardware name: Apple MacBook Pro (14-inch, M1 Pro, 2021) (DT)
Jul 15 10:20:18 m1pro kernel: pstate: 21401009 (nzCv daif +PAN -UAO -TCO +DIT +SSBS BTYPE=--)
Jul 15 10:20:18 m1pro kernel: pc : __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel: lr : __kmalloc_node_track_caller+0x98/0x2bc
Jul 15 10:20:18 m1pro kernel: sp : ffff800090395d40
Jul 15 10:20:18 m1pro kernel: x29: ffff800090395d50 x28: 00000000ffffffa0 x27: ffff000639ee3280
Jul 15 10:20:18 m1pro kernel: x26: ffffffa00000c984 x25: 0000000000212a9c x24: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x23: 736120492079616d x22: 00000000ffffffff x21: 0000000000000cc0
Jul 15 10:20:18 m1pro kernel: x20: ffff000001f2cb00 x19: 0000000000000318 x18: 00000000000000ff
Jul 15 10:20:18 m1pro kernel: x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
Jul 15 10:20:18 m1pro kernel: x11: 00000000ffffffa0 x10: 0000000000000008 x9 : ffffffffffffffff
Jul 15 10:20:18 m1pro kernel: x8 : c98580007a45d9c4 x7 : 0000000000000cc0 x6 : 0000000000000318
Jul 15 10:20:18 m1pro kernel: x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000064ce340
Jul 15 10:20:18 m1pro kernel: x2 : 0000000000000200 x1 : 736120492079616d x0 : ffff000001f2cb00
Jul 15 10:20:18 m1pro kernel: Call trace:
Jul 15 10:20:18 m1pro kernel:  __kmalloc_node_track_caller+0xec/0x2bc
Jul 15 10:20:18 m1pro kernel:  krealloc+0x9c/0x144
Jul 15 10:20:18 m1pro kernel:  _RINvNtCsKOPqOvr6FN_5alloc7raw_vec11finish_growNtNtB4_5alloc6GlobalECsirMamryJlsQ_5asahi+0x44/0xac [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvMs0_NtCsKOPqOvr6FN_5alloc3vecINtB5_3VechE21try_extend_from_sliceCsirMamryJlsQ_5asahi+0xc8/0x13c [asahi]
Jul 15 10:20:18 m1pro kernel:  _RINvMs8_NtCsirMamryJlsQ_5asahi6objectINtB6_9GpuObjectNtNtNtB8_2fw6vertex17RunVertexG13V13_5INtNtB8_5alloc12GenericAllocBP_NtB1u_14HeapAllocationEE17new_init_preallocINtNtNtCsc1LFWrxnNA7_6kernel4init10___internal11InitClosureNCNCNvMs1_NtN>
Jul 15 10:20:18 m1pro kernel:  _RNvMs1_NtNtCsirMamryJlsQ_5asahi5queue6renderNtB7_18QueueInnerG13V13_513submit_render+0x1ba8/0x1dd0 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvXsK_NtCsirMamryJlsQ_5asahi5queueNtB5_13QueueG13V13_5NtB5_5Queue6submit+0xf74/0x1578 [asahi]
Jul 15 10:20:18 m1pro kernel:  _RNvNvXs_NtCsirMamryJlsQ_5asahi6driverNtB6_11AsahiDriverNtNtNtCsc1LFWrxnNA7_6kernel3drm3drv6Driver6IOCTLS12ASAHI_SUBMIT+0x648/0x840 [asahi]
Jul 15 10:20:18 m1pro kernel:  drm_ioctl_kernel+0xd4/0x13c
Jul 15 10:20:18 m1pro kernel:  drm_ioctl+0x23c/0x4e4
Jul 15 10:20:18 m1pro kernel:  __arm64_sys_ioctl+0xc0/0x118
Jul 15 10:20:18 m1pro kernel:  invoke_syscall.constprop.0+0x50/0x124
Jul 15 10:20:18 m1pro kernel:  do_el0_svc+0x40/0xf0
Jul 15 10:20:18 m1pro kernel:  el0_svc+0x34/0x11c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync_handler+0x140/0x14c
Jul 15 10:20:18 m1pro kernel:  el0t_64_sync+0x190/0x194
Jul 15 10:20:18 m1pro kernel: Code: 54000c20 b9402a82 aa1703e1 aa1403e0 (f8626af9) 
Jul 15 10:20:18 m1pro kernel: ---[ end trace 0000000000000000 ]---
Jul 15 10:20:18 m1pro kernel: Unable to handle kernel paging request at virtual address 006120492079636d
Jul 15 10:20:18 m1pro kernel: Mem abort info:
Jul 15 10:20:18 m1pro kernel:   ESR = 0x0000000096000004
Jul 15 10:20:18 m1pro kernel:   EC = 0x25: DABT (current EL), IL = 32 bits
Jul 15 10:20:18 m1pro kernel:   SET = 0, FnV = 0
Jul 15 10:20:18 m1pro kernel:   EA = 0, S1PTW = 0
Jul 15 10:20:18 m1pro kernel:   FSC = 0x04: level 0 translation fault
Jul 15 10:20:18 m1pro kernel: Data abort info:
Jul 15 10:20:18 m1pro kernel:   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jul 15 10:20:18 m1pro kernel:   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jul 15 10:20:18 m1pro kernel:   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jul 15 10:20:18 m1pro kernel: [006120492079636d] address between user and kernel address ranges
Jul 15 10:20:18 m1pro kernel: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP

Going back to 6.9.5 brings back a stable system.
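
When reading an oops dump like the one above, it can help to check whether a faulting pointer is actually printable text, since that would hint at memory being overwritten with string data. A minimal Python sketch, assuming the register values are copied by hand from the log; `decode_reg` is a hypothetical helper written for illustration, not part of any kernel tooling:

```python
def decode_reg(value: int, width: int = 8):
    """Interpret a register value as little-endian bytes.

    Returns the ASCII string if every byte is printable, otherwise None.
    (Hypothetical helper; arm64 Linux stores data little-endian.)
    """
    raw = value.to_bytes(width, "little")
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None


# x23 / x1 from the second trace (__kmalloc_node_track_caller)
print(repr(decode_reg(0x736120492079616D)))
# x0 from the same trace, a normal-looking kernel pointer
print(repr(decode_reg(0xFFFF000001F2CB00)))
```

For x23 in the second trace this yields `may I as`, while x0 does not decode as text. So the pointer the allocator tried to follow looks like string data rather than a valid address. This is only an observation about the dump, not a diagnosis of the root cause.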

jannau commented 1 month ago

The fix is in asahi-6.11-2 and later 6.11 stable kernels, and the issue has not resurfaced.