NixOS / nixpkgs

Nix Packages collection & NixOS

amdgpu: black screen after resume from suspend #223690

Closed davidak closed 1 year ago

davidak commented 1 year ago

Describe the bug

I suspended the computer yesterday and resumed it today by hitting ENTER. The computer is on, but I only see a black screen, not even a cursor.

Steps To Reproduce

Steps to reproduce the behavior:

  1. suspend
  2. resume
  3. black screen

Expected behavior

have image on screen

Screenshots

imagine an all black screenshot

Additional context

Similar to previous issues:

amdgpu crashed the kernel

Mar 29 03:48:43 gaming kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1931602, emitted seq=1931604
Mar 29 03:48:43 gaming kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Mar 29 03:48:43 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Mar 29 03:48:43 gaming kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate
Mar 29 03:48:44 gaming kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Mar 29 03:48:47 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000036 SMN_C2PMSG_82:0x00000000
Mar 29 03:48:47 gaming kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
Mar 29 03:48:47 gaming kernel: amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
Mar 29 03:48:47 gaming kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Mar 29 03:48:48 gaming kernel: [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x80000306)
Mar 29 03:48:48 gaming kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
Mar 29 03:48:48 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
Mar 29 03:48:48 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
Mar 29 03:48:49 gaming systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Mar 29 03:48:51 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000036 SMN_C2PMSG_82:0x00000000
Mar 29 03:48:51 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
Mar 29 03:48:51 gaming kernel: amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:03:00.0
Mar 29 03:48:58 gaming .gsd-power-wrap[2018]: Error setting property 'PowerSaveMode' on interface org.gnome.Mutter.DisplayConfig: Timeout was reached (g-io-error-quark, 24)
Mar 29 03:49:01 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Mar 29 03:49:01 gaming kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000900000).
Mar 29 03:49:01 gaming kernel: [drm] VRAM is lost due to GPU reset!
Mar 29 03:49:01 gaming kernel: [drm] PSP is resuming...
Mar 29 03:49:02 gaming kernel: [drm] failed to load ucode SMC(0x31) 
Mar 29 03:49:02 gaming kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x80000306)
Mar 29 03:49:02 gaming kernel: [drm] reserve 0xa00000 from 0x81fd000000 for PSP TMR
Mar 29 03:49:03 gaming kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Mar 29 03:49:03 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Mar 29 03:49:03 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
Mar 29 03:49:03 gaming kernel: amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b2900 (59.41.0)
Mar 29 03:49:03 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
Mar 29 03:49:06 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000036 SMN_C2PMSG_82:0x00000000
Mar 29 03:49:09 gaming kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000036 SMN_C2PMSG_82:0x00000000
Mar 29 03:49:09 gaming kernel: amdgpu 0000:03:00.0: amdgpu: Failed to SetDriverDramAddr!
Mar 29 03:49:09 gaming kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
Mar 29 03:49:09 gaming kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Mar 29 03:49:09 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Mar 29 03:49:09 gaming kernel: [drm] Skip scheduling IBs!
Mar 29 03:49:09 gaming kernel: [drm] Skip scheduling IBs!
Mar 29 03:49:09 gaming kernel: [drm] Skip scheduling IBs!
...
Mar 29 03:49:09 gaming kernel: [drm] Skip scheduling IBs!
Mar 29 03:49:09 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -62
Mar 29 03:49:09 gaming kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
Mar 29 03:49:09 gaming kernel: snd_hda_intel 0000:03:00.1: Refused to change power state from D0 to D3hot
Mar 29 03:49:19 gaming kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1931604, emitted seq=1931606
Mar 29 03:49:19 gaming kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Mar 29 03:49:19 gaming kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Mar 29 03:49:19 gaming kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate
Mar 29 03:52:24 gaming kernel: INFO: task X:cs0:1456 blocked for more than 122 seconds.
Mar 29 03:52:24 gaming kernel:       Not tainted 6.2.0 #1-NixOS
Mar 29 03:52:24 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 03:52:24 gaming kernel: task:X:cs0           state:D stack:0     pid:1456  ppid:1431   flags:0x00004002
Mar 29 03:52:24 gaming kernel: Call Trace:
Mar 29 03:52:24 gaming kernel:  <TASK>
Mar 29 03:52:24 gaming kernel:  __schedule+0x30d/0x1270
Mar 29 03:52:24 gaming kernel:  schedule+0x61/0xe0
Mar 29 03:52:24 gaming kernel:  schedule_timeout+0x123/0x160
Mar 29 03:52:24 gaming kernel:  ? _raw_spin_unlock_irqrestore+0x27/0x50
Mar 29 03:52:24 gaming kernel:  ? dma_fence_add_callback+0x6a/0xe0
Mar 29 03:52:24 gaming kernel:  dma_fence_wait_any_timeout+0x204/0x260
Mar 29 03:52:24 gaming kernel:  amdgpu_sa_bo_new+0x464/0x530 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? preempt_count_add+0x74/0xa0
Mar 29 03:52:24 gaming kernel:  amdgpu_ib_get+0x43/0x90 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? drm_sched_fence_alloc+0x1e/0x40 [gpu_sched]
Mar 29 03:52:24 gaming kernel:  amdgpu_job_alloc_with_ib+0x72/0xb0 [amdgpu]
Mar 29 03:52:24 gaming kernel:  amdgpu_vm_sdma_update+0x30b/0x400 [amdgpu]
Mar 29 03:52:24 gaming kernel:  amdgpu_vm_ptes_update+0x2b0/0x800 [amdgpu]
Mar 29 03:52:24 gaming kernel:  amdgpu_vm_update_range+0x21e/0x750 [amdgpu]
Mar 29 03:52:24 gaming kernel:  amdgpu_vm_bo_update+0x2b0/0x570 [amdgpu]
Mar 29 03:52:24 gaming kernel:  amdgpu_vm_handle_moved+0x10c/0x120 [amdgpu]
Mar 29 03:52:24 gaming kernel:  amdgpu_cs_ioctl+0x130c/0x2090 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
Mar 29 03:52:24 gaming kernel:  drm_ioctl_kernel+0xb6/0x140 [drm]
Mar 29 03:52:24 gaming kernel:  drm_ioctl+0x22a/0x3e0 [drm]
Mar 29 03:52:24 gaming kernel:  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
Mar 29 03:52:24 gaming kernel:  amdgpu_drm_ioctl+0x4d/0x80 [amdgpu]
Mar 29 03:52:24 gaming kernel:  __x64_sys_ioctl+0x8b/0xc0
Mar 29 03:52:24 gaming kernel:  do_syscall_64+0x3c/0x90
Mar 29 03:52:24 gaming kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Mar 29 03:52:24 gaming kernel: RIP: 0033:0x7f9d2b304321
Mar 29 03:52:24 gaming kernel: RSP: 002b:00007f9d217fe7d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 29 03:52:24 gaming kernel: RAX: ffffffffffffffda RBX: 00007f9d217fe890 RCX: 00007f9d2b304321
Mar 29 03:52:24 gaming kernel: RDX: 00007f9d217fe890 RSI: 00000000c0186444 RDI: 0000000000000013
Mar 29 03:52:24 gaming kernel: RBP: 00000000c0186444 R08: 00007f9d217fe9e0 R09: 00007f9d217fe988
Mar 29 03:52:24 gaming kernel: R10: 0000000000918890 R11: 0000000000000246 R12: 00007f9d217fe9b0
Mar 29 03:52:24 gaming kernel: R13: 0000000000000013 R14: 0000000000000003 R15: 00007f9d217fe988
Mar 29 03:52:24 gaming kernel:  </TASK>
Mar 29 03:52:24 gaming kernel: INFO: task X:3921621 blocked for more than 122 seconds.
Mar 29 03:52:24 gaming kernel:       Not tainted 6.2.0 #1-NixOS
Mar 29 03:52:24 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 03:52:24 gaming kernel: task:X               state:D stack:0     pid:3921621 ppid:1431   flags:0x00000002
Mar 29 03:52:24 gaming kernel: Call Trace:
Mar 29 03:52:24 gaming kernel:  <TASK>
Mar 29 03:52:24 gaming kernel:  __schedule+0x30d/0x1270
Mar 29 03:52:24 gaming kernel:  schedule+0x61/0xe0
Mar 29 03:52:24 gaming kernel:  schedule_preempt_disabled+0x18/0x30
Mar 29 03:52:24 gaming kernel:  __mutex_lock.constprop.0+0x38d/0x700
Mar 29 03:52:24 gaming kernel:  amdgpu_dm_atomic_commit_tail+0x5f9/0x2c30 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? __sbitmap_get_word+0x24/0x70
Mar 29 03:52:24 gaming kernel:  ? sbitmap_get+0x97/0x1d0
Mar 29 03:52:24 gaming kernel:  ? blk_mq_do_dispatch_sched+0xa1/0x3b0
Mar 29 03:52:24 gaming kernel:  ? _raw_spin_trylock+0x17/0x60
Mar 29 03:52:24 gaming kernel:  ? _raw_spin_unlock+0x19/0x40
Mar 29 03:52:24 gaming kernel:  ? get_page_from_freelist+0x1436/0x1580
Mar 29 03:52:24 gaming kernel:  ? pre_validate_dsc+0x75/0x430 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? drm_atomic_helper_check_planes+0x156/0x230 [drm_kms_helper]
Mar 29 03:52:24 gaming kernel:  ? amdgpu_dm_atomic_check+0x6d3/0x1180 [amdgpu]
Mar 29 03:52:24 gaming kernel:  ? preempt_count_add+0x74/0xa0
Mar 29 03:52:24 gaming kernel:  ? _raw_spin_lock_irq+0x1d/0x50
Mar 29 03:52:24 gaming kernel:  ? _raw_spin_unlock_irq+0x1f/0x40
Mar 29 03:52:24 gaming kernel:  ? __wait_for_common+0x1a7/0x1e0
Mar 29 03:52:24 gaming kernel:  ? __pfx_schedule_timeout+0x10/0x10
Mar 29 03:52:24 gaming kernel:  ? preempt_count_add+0x74/0xa0
Mar 29 03:52:24 gaming kernel:  ? _raw_spin_lock_irqsave+0x28/0x60
Mar 29 03:52:24 gaming kernel:  ? complete_all+0x24/0x90
Mar 29 03:52:24 gaming kernel:  commit_tail+0x91/0x130 [drm_kms_helper]
Mar 29 03:52:24 gaming kernel:  drm_atomic_helper_commit+0x11f/0x150 [drm_kms_helper]
Mar 29 03:52:24 gaming kernel:  drm_atomic_commit+0x97/0xd0 [drm]
Mar 29 03:52:24 gaming kernel:  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Mar 29 03:52:24 gaming kernel:  drm_mode_gamma_set_ioctl+0x399/0x530 [drm]
Mar 29 03:52:24 gaming kernel:  ? __pfx_drm_mode_gamma_set_ioctl+0x10/0x10 [drm]
Mar 29 03:52:24 gaming kernel:  drm_ioctl_kernel+0xb6/0x140 [drm]
Mar 29 03:52:24 gaming kernel:  drm_ioctl+0x22a/0x3e0 [drm]
Mar 29 03:52:24 gaming kernel:  ? __pfx_drm_mode_gamma_set_ioctl+0x10/0x10 [drm]
Mar 29 03:52:24 gaming kernel:  ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
Mar 29 03:52:24 gaming kernel:  amdgpu_drm_ioctl+0x4d/0x80 [amdgpu]
Mar 29 03:52:24 gaming kernel:  __x64_sys_ioctl+0x8b/0xc0
Mar 29 03:52:24 gaming kernel:  do_syscall_64+0x3c/0x90
Mar 29 03:52:24 gaming kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Mar 29 03:52:24 gaming kernel: RIP: 0033:0x7fcd3a504321
Mar 29 03:52:24 gaming kernel: RSP: 002b:00007fffec233be0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 29 03:52:24 gaming kernel: RAX: ffffffffffffffda RBX: 00007fffec233c70 RCX: 00007fcd3a504321
Mar 29 03:52:24 gaming kernel: RDX: 00007fffec233c70 RSI: 00000000c02064a5 RDI: 0000000000000010
Mar 29 03:52:24 gaming kernel: RBP: 00000000c02064a5 R08: 00000000019d31f0 R09: 00000000019d33f0
Mar 29 03:52:24 gaming kernel: R10: 0000000000000042 R11: 0000000000000246 R12: 00000000019d2b30
Mar 29 03:52:24 gaming kernel: R13: 0000000000000010 R14: 00000000019ce570 R15: 0000000000000000
Mar 29 03:52:24 gaming kernel:  </TASK>
Mar 29 03:52:24 gaming kernel: INFO: task kworker/u32:26:3921699 blocked for more than 122 seconds.
Mar 29 03:52:24 gaming kernel:       Not tainted 6.2.0 #1-NixOS
Mar 29 03:52:24 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 03:52:24 gaming kernel: task:kworker/u32:26  state:D stack:0     pid:3921699 ppid:2      flags:0x00004000
Mar 29 03:52:24 gaming kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Mar 29 03:52:24 gaming kernel: Call Trace:
Mar 29 03:52:24 gaming kernel:  <TASK>
Mar 29 03:52:24 gaming kernel:  __schedule+0x30d/0x1270
Mar 29 03:52:24 gaming kernel:  ? __wake_up_klogd.part.0+0x56/0x80
Mar 29 03:52:24 gaming kernel:  ? dev_vprintk_emit+0x175/0x19d
Mar 29 03:52:24 gaming kernel:  schedule+0x61/0xe0
Mar 29 03:52:24 gaming kernel:  schedule_preempt_disabled+0x18/0x30
Mar 29 03:52:24 gaming kernel:  __mutex_lock.constprop.0+0x38d/0x700
Mar 29 03:52:24 gaming kernel:  dm_suspend+0xcc/0x1e0 [amdgpu]
Mar 29 03:52:25 gaming kernel:  amdgpu_device_ip_suspend_phase1+0x72/0xe0 [amdgpu]
Mar 29 03:52:25 gaming kernel:  amdgpu_device_ip_suspend+0x20/0x70 [amdgpu]
Mar 29 03:52:25 gaming kernel:  amdgpu_device_pre_asic_reset+0xd5/0x290 [amdgpu]
Mar 29 03:52:25 gaming kernel:  amdgpu_device_gpu_recover.cold+0x5f8/0xb4a [amdgpu]
Mar 29 03:52:25 gaming kernel:  amdgpu_job_timedout+0x18a/0x1c0 [amdgpu]
Mar 29 03:52:25 gaming kernel:  ? _raw_spin_unlock+0x19/0x40
Mar 29 03:52:25 gaming kernel:  drm_sched_job_timedout+0x77/0x110 [gpu_sched]
Mar 29 03:52:25 gaming kernel:  process_one_work+0x1e2/0x3b0
Mar 29 03:52:25 gaming kernel:  worker_thread+0x54/0x3a0
Mar 29 03:52:25 gaming kernel:  ? __pfx_worker_thread+0x10/0x10
Mar 29 03:52:25 gaming kernel:  kthread+0xe9/0x110
Mar 29 03:52:25 gaming kernel:  ? __pfx_kthread+0x10/0x10
Mar 29 03:52:25 gaming kernel:  ret_from_fork+0x29/0x50
Mar 29 03:52:25 gaming kernel:  </TASK>
Mar 29 03:54:27 gaming kernel: INFO: task X:cs0:1456 blocked for more than 245 seconds.
Mar 29 03:54:27 gaming kernel:       Not tainted 6.2.0 #1-NixOS
Mar 29 03:54:27 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 03:54:27 gaming kernel: task:X:cs0           state:D stack:0     pid:1456  ppid:1431   flags:0x00004002
Mar 29 03:54:27 gaming kernel: Call Trace:
Mar 29 03:54:27 gaming kernel:  <TASK>
Mar 29 03:54:27 gaming kernel:  __schedule+0x30d/0x1270
Mar 29 03:54:27 gaming kernel:  schedule+0x61/0xe0
Mar 29 03:54:27 gaming kernel:  schedule_timeout+0x123/0x160
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_unlock_irqrestore+0x27/0x50
Mar 29 03:54:27 gaming kernel:  ? dma_fence_add_callback+0x6a/0xe0
Mar 29 03:54:27 gaming kernel:  dma_fence_wait_any_timeout+0x204/0x260
Mar 29 03:54:27 gaming kernel:  amdgpu_sa_bo_new+0x464/0x530 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? preempt_count_add+0x74/0xa0
Mar 29 03:54:27 gaming kernel:  amdgpu_ib_get+0x43/0x90 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? drm_sched_fence_alloc+0x1e/0x40 [gpu_sched]
Mar 29 03:54:27 gaming kernel:  amdgpu_job_alloc_with_ib+0x72/0xb0 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_vm_sdma_update+0x30b/0x400 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_vm_ptes_update+0x2b0/0x800 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_vm_update_range+0x21e/0x750 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_vm_bo_update+0x2b0/0x570 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_vm_handle_moved+0x10c/0x120 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_cs_ioctl+0x130c/0x2090 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
Mar 29 03:54:27 gaming kernel:  drm_ioctl_kernel+0xb6/0x140 [drm]
Mar 29 03:54:27 gaming kernel:  drm_ioctl+0x22a/0x3e0 [drm]
Mar 29 03:54:27 gaming kernel:  ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
Mar 29 03:54:27 gaming kernel:  amdgpu_drm_ioctl+0x4d/0x80 [amdgpu]
Mar 29 03:54:27 gaming kernel:  __x64_sys_ioctl+0x8b/0xc0
Mar 29 03:54:27 gaming kernel:  do_syscall_64+0x3c/0x90
Mar 29 03:54:27 gaming kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Mar 29 03:54:27 gaming kernel: RIP: 0033:0x7f9d2b304321
Mar 29 03:54:27 gaming kernel: RSP: 002b:00007f9d217fe7d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 29 03:54:27 gaming kernel: RAX: ffffffffffffffda RBX: 00007f9d217fe890 RCX: 00007f9d2b304321
Mar 29 03:54:27 gaming kernel: RDX: 00007f9d217fe890 RSI: 00000000c0186444 RDI: 0000000000000013
Mar 29 03:54:27 gaming kernel: RBP: 00000000c0186444 R08: 00007f9d217fe9e0 R09: 00007f9d217fe988
Mar 29 03:54:27 gaming kernel: R10: 0000000000918890 R11: 0000000000000246 R12: 00007f9d217fe9b0
Mar 29 03:54:27 gaming kernel: R13: 0000000000000013 R14: 0000000000000003 R15: 00007f9d217fe988
Mar 29 03:54:27 gaming kernel:  </TASK>
Mar 29 03:54:27 gaming kernel: INFO: task X:3921621 blocked for more than 245 seconds.
Mar 29 03:54:27 gaming kernel:       Not tainted 6.2.0 #1-NixOS
Mar 29 03:54:27 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 03:54:27 gaming kernel: task:X               state:D stack:0     pid:3921621 ppid:1431   flags:0x00000002
Mar 29 03:54:27 gaming kernel: Call Trace:
Mar 29 03:54:27 gaming kernel:  <TASK>
Mar 29 03:54:27 gaming kernel:  __schedule+0x30d/0x1270
Mar 29 03:54:27 gaming kernel:  schedule+0x61/0xe0
Mar 29 03:54:27 gaming kernel:  schedule_preempt_disabled+0x18/0x30
Mar 29 03:54:27 gaming kernel:  __mutex_lock.constprop.0+0x38d/0x700
Mar 29 03:54:27 gaming kernel:  amdgpu_dm_atomic_commit_tail+0x5f9/0x2c30 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? __sbitmap_get_word+0x24/0x70
Mar 29 03:54:27 gaming kernel:  ? sbitmap_get+0x97/0x1d0
Mar 29 03:54:27 gaming kernel:  ? blk_mq_do_dispatch_sched+0xa1/0x3b0
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_trylock+0x17/0x60
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_unlock+0x19/0x40
Mar 29 03:54:27 gaming kernel:  ? get_page_from_freelist+0x1436/0x1580
Mar 29 03:54:27 gaming kernel:  ? pre_validate_dsc+0x75/0x430 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? drm_atomic_helper_check_planes+0x156/0x230 [drm_kms_helper]
Mar 29 03:54:27 gaming kernel:  ? amdgpu_dm_atomic_check+0x6d3/0x1180 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? preempt_count_add+0x74/0xa0
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_lock_irq+0x1d/0x50
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_unlock_irq+0x1f/0x40
Mar 29 03:54:27 gaming kernel:  ? __wait_for_common+0x1a7/0x1e0
Mar 29 03:54:27 gaming kernel:  ? __pfx_schedule_timeout+0x10/0x10
Mar 29 03:54:27 gaming kernel:  ? preempt_count_add+0x74/0xa0
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_lock_irqsave+0x28/0x60
Mar 29 03:54:27 gaming kernel:  ? complete_all+0x24/0x90
Mar 29 03:54:27 gaming kernel:  commit_tail+0x91/0x130 [drm_kms_helper]
Mar 29 03:54:27 gaming kernel:  drm_atomic_helper_commit+0x11f/0x150 [drm_kms_helper]
Mar 29 03:54:27 gaming kernel:  drm_atomic_commit+0x97/0xd0 [drm]
Mar 29 03:54:27 gaming kernel:  ? __pfx___drm_printfn_info+0x10/0x10 [drm]
Mar 29 03:54:27 gaming kernel:  drm_mode_gamma_set_ioctl+0x399/0x530 [drm]
Mar 29 03:54:27 gaming kernel:  ? __pfx_drm_mode_gamma_set_ioctl+0x10/0x10 [drm]
Mar 29 03:54:27 gaming kernel:  drm_ioctl_kernel+0xb6/0x140 [drm]
Mar 29 03:54:27 gaming kernel:  drm_ioctl+0x22a/0x3e0 [drm]
Mar 29 03:54:27 gaming kernel:  ? __pfx_drm_mode_gamma_set_ioctl+0x10/0x10 [drm]
Mar 29 03:54:27 gaming kernel:  ? ioctl_has_perm.constprop.0.isra.0+0xbd/0x120
Mar 29 03:54:27 gaming kernel:  amdgpu_drm_ioctl+0x4d/0x80 [amdgpu]
Mar 29 03:54:27 gaming kernel:  __x64_sys_ioctl+0x8b/0xc0
Mar 29 03:54:27 gaming kernel:  do_syscall_64+0x3c/0x90
Mar 29 03:54:27 gaming kernel:  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Mar 29 03:54:27 gaming kernel: RIP: 0033:0x7fcd3a504321
Mar 29 03:54:27 gaming kernel: RSP: 002b:00007fffec233be0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 29 03:54:27 gaming kernel: RAX: ffffffffffffffda RBX: 00007fffec233c70 RCX: 00007fcd3a504321
Mar 29 03:54:27 gaming kernel: RDX: 00007fffec233c70 RSI: 00000000c02064a5 RDI: 0000000000000010
Mar 29 03:54:27 gaming kernel: RBP: 00000000c02064a5 R08: 00000000019d31f0 R09: 00000000019d33f0
Mar 29 03:54:27 gaming kernel: R10: 0000000000000042 R11: 0000000000000246 R12: 00000000019d2b30
Mar 29 03:54:27 gaming kernel: R13: 0000000000000010 R14: 00000000019ce570 R15: 0000000000000000
Mar 29 03:54:27 gaming kernel:  </TASK>
Mar 29 03:54:27 gaming kernel: INFO: task kworker/u32:26:3921699 blocked for more than 245 seconds.
Mar 29 03:54:27 gaming kernel:       Not tainted 6.2.0 #1-NixOS
Mar 29 03:54:27 gaming kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 29 03:54:27 gaming kernel: task:kworker/u32:26  state:D stack:0     pid:3921699 ppid:2      flags:0x00004000
Mar 29 03:54:27 gaming kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Mar 29 03:54:27 gaming kernel: Call Trace:
Mar 29 03:54:27 gaming kernel:  <TASK>
Mar 29 03:54:27 gaming kernel:  __schedule+0x30d/0x1270
Mar 29 03:54:27 gaming kernel:  ? __wake_up_klogd.part.0+0x56/0x80
Mar 29 03:54:27 gaming kernel:  ? dev_vprintk_emit+0x175/0x19d
Mar 29 03:54:27 gaming kernel:  schedule+0x61/0xe0
Mar 29 03:54:27 gaming kernel:  schedule_preempt_disabled+0x18/0x30
Mar 29 03:54:27 gaming kernel:  __mutex_lock.constprop.0+0x38d/0x700
Mar 29 03:54:27 gaming kernel:  dm_suspend+0xcc/0x1e0 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_device_ip_suspend_phase1+0x72/0xe0 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_device_ip_suspend+0x20/0x70 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_device_pre_asic_reset+0xd5/0x290 [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_device_gpu_recover.cold+0x5f8/0xb4a [amdgpu]
Mar 29 03:54:27 gaming kernel:  amdgpu_job_timedout+0x18a/0x1c0 [amdgpu]
Mar 29 03:54:27 gaming kernel:  ? _raw_spin_unlock+0x19/0x40
Mar 29 03:54:27 gaming kernel:  drm_sched_job_timedout+0x77/0x110 [gpu_sched]
Mar 29 03:54:27 gaming kernel:  process_one_work+0x1e2/0x3b0
Mar 29 03:54:27 gaming kernel:  worker_thread+0x54/0x3a0
Mar 29 03:54:27 gaming kernel:  ? __pfx_worker_thread+0x10/0x10
Mar 29 03:54:27 gaming kernel:  kthread+0xe9/0x110
Mar 29 03:54:27 gaming kernel:  ? __pfx_kthread+0x10/0x10
Mar 29 03:54:27 gaming kernel:  ret_from_fork+0x29/0x50
Mar 29 03:54:27 gaming kernel:  </TASK>

full system log: amdgpu_crash.txt.zip

probably something upstream (amdgpu) has to fix

Atemu commented 1 year ago

Why does it say 6.2.0 in your log? Are you actually running 6.2.6? Please confirm with `uname -a`.

davidak commented 1 year ago

I am running 6.2.6 now, but before the reboot it was 6.2.0 (NixOS 22.11.2999.a7cc81913bb).

It could already be fixed in that version. I'll keep watching it.

lorenz commented 1 year ago

Had a look at the log, that's a known issue. You have too little free RAM to suspend: suspending means evacuating all VRAM into system memory after the kernel allocator is already in NOIO mode, so it cannot swap any further. I worked around this with https://git.dolansoft.org/lorenz/memreserver, which gets executed before sleep and forces the kernel to keep enough free memory around for suspend to succeed. Adjust the amount of memory to your card's VRAM + 1 GiB.

I should make this a NixOS module at some point.

lorenz commented 1 year ago

The crash happens because amdgpu aborts the suspend, which leads to the kernel attempting s2idle. That is not properly supported on this platform with an AMD GPU, so the SMU fails.
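To see which suspend variant a machine is configured for, the kernel lists the supported modes in /sys/power/mem_sleep and marks the active one with square brackets (for example `s2idle [deep]`). The following is only a small sketch for reading that file; the function name `parse_mem_sleep` is made up for illustration and is not part of any tool mentioned in this thread.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def parse_mem_sleep(text: str) -> Tuple[Optional[str], List[str]]:
    """Return (active_mode, all_modes) from the contents of /sys/power/mem_sleep."""
    active = None
    modes = []
    for token in text.split():
        # The currently selected mode is wrapped in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        modes.append(token)
    return active, modes


if __name__ == "__main__":
    path = Path("/sys/power/mem_sleep")
    if path.exists():
        active, modes = parse_mem_sleep(path.read_text())
        print(f"available: {modes}, active: {active}")
```

If the active mode is `deep`, a plain suspend targets S3; whether a fallback to s2idle then happens depends on the rest of the stack.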

davidak commented 1 year ago

> that's a known issue. You have too little free RAM to suspend as suspending means evacuating all VRAM into system memory

I have 32 GB RAM, 32 GB swap and 8 GB VRAM. I see how that can be an issue in this context when RAM and VRAM are filled.

I had reported a similar issue before and there were multiple fixes:

https://gitlab.freedesktop.org/drm/amd/-/issues/2223
https://github.com/torvalds/linux/commit/8d4de331f1b24a22d18e3c6116aa25228cf54854 (in 6.1)
https://github.com/systemd/systemd/issues/25151 is still open

lorenz commented 1 year ago

AMD did fix one part: merely attempting to go to S3 no longer results in a GPU reset after TTM fails to evacuate the GPU VRAM; instead, it aborts the S3 suspend attempt. But it still isn't able to suspend under memory pressure with a GPU with external VRAM because of some annoying design limitations in the PM subsystem (namely that there are no subsystem constraints and no phases; this has also bitten me on the storage/SCSI side). These limitations mean that it cannot perform writeback or swapping at the point where the VRAM eviction happens. Thus, even with a lot of swap, or with nominally free memory being used as cache, you end up with this issue. I'm running 64GiB RAM/96GiB swap and still had the same problem.

systemd just adds fuel to the fire by then attempting to go into s2idle which on AMD does a bunch of work which can result in issues on systems which aren't expected to go into s2idle, but on paper this behavior is acceptable. Crashing/hanging the GPU SMU by doing things not supported on the platform to it is technically on AMD.

I've been running my workaround for more than two years and never had any issues again. It essentially just installs a sleep hook that runs before the kernel actually suspends devices. The hook allocates a bit more memory than the GPU has VRAM, forces the kernel to actually back it with real RAM (by locking it and writing zeroes to it) and then terminates. If the kernel cannot find real free RAM to back this allocation, it has to clear caches, do writebacks or swap out memory pages at this point, while the system is still under normal operation. Once the hook terminates, it leaves behind a large amount of truly free RAM which the kernel can then immediately use to evacuate VRAM into.
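As a rough illustration of the mechanism described above (this is not the memreserver code, which is written in C), the allocate-touch-release step could be sketched in Python like this. The function name `reserve_and_release` is hypothetical, and the sketch is simplified: it only writes a zero to every page and does not lock the memory, so it is weaker than what the real tool does.

```python
import mmap


def reserve_and_release(size_bytes: int) -> int:
    """Allocate size_bytes, force the kernel to back it with pages, then free it.

    Returns the number of pages that were touched.
    Simplification: unlike the tool described above, nothing is mlock()ed here.
    """
    page = mmap.PAGESIZE
    # Round up to whole pages; -1 requests an anonymous mapping (no file behind it).
    size = max(page, (size_bytes + page - 1) // page * page)
    buf = mmap.mmap(-1, size)
    try:
        touched = 0
        for offset in range(0, size, page):
            # A write fault makes the kernel allocate a real page for this offset.
            buf[offset] = 0
            touched += 1
        return touched
    finally:
        # Unmapping returns the pages to the kernel as free memory.
        buf.close()


if __name__ == "__main__":
    # Small demo size; a real hook would use roughly VRAM size plus a margin.
    print(reserve_and_release(16 * 1024 * 1024), "pages touched")
```

Running something like this from a pre-sleep hook with a size of VRAM plus a margin would make the kernel do its reclaim work before devices are suspended, which is the point of the workaround.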

I'm thinking of spending some time to make this nice, i.e. a proper module which detects all GPUs with onboard VRAM (it's not needed for most APUs/notebook GPUs which share RAM), sums the amount, adds a fudge factor and then performs this without the need to manually configure anything.
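A minimal sketch of the "sum the VRAM and add a fudge factor" idea, assuming the amdgpu sysfs attribute `mem_info_vram_total` (VRAM size in bytes) is used as the data source. The function name `reservation_bytes`, the 1 GiB default margin and the sysfs approach are assumptions for illustration; the sketch also does not tell APUs apart from dedicated GPUs, since an APU's carve-out shows up in the same attribute.

```python
from pathlib import Path

GIB = 1024 ** 3


def reservation_bytes(sysfs_root: str = "/sys/class/drm", fudge: int = GIB) -> int:
    """Sum VRAM reported by amdgpu cards under sysfs_root and add a fudge factor.

    Returns 0 when no card reports VRAM, so the caller can skip the reservation.
    """
    total = 0
    seen = set()
    for info in sorted(Path(sysfs_root).glob("card*/device/mem_info_vram_total")):
        device = info.parent.resolve()
        if device in seen:
            # Several DRM nodes can point at the same PCI device; count it once.
            continue
        seen.add(device)
        total += int(info.read_text().strip())
    return total + fudge if total else 0


if __name__ == "__main__":
    print(f"would reserve {reservation_bytes() / GIB:.1f} GiB")
```

Taking the sysfs root as a parameter keeps the function easy to try against a fake directory tree without touching real hardware.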

peterhoeg commented 1 year ago

I have an ancient desktop with an AMD dGPU (8GB VRAM) and a laptop with a ryzen 2 CPU (512M VRAM). I have experienced the black screen problem a lot on the laptop but only once or twice on the desktop.

I tried your memreserver with this package def - very hopeful that it does the trick because it’s annoying AF with the laptop.

{ lib
, stdenv
, fetchFromGitLab
, gigaBytes ? 9
}:

stdenv.mkDerivation rec {
  pname = "memreserver";
  version = "0.0.0.20200414";

  src = fetchFromGitLab {
    domain = "git.dolansoft.org";
    owner = "lorenz";
    repo = pname;
    rev = "094963f0a90a6b059240ecc6fff9aeb8213e64cc";
    hash = "sha256-wLHnOR+lgWFy0IdbQBKKA6HcMLejZHpfScNT9KDfSlw=";
  };

  postPatch = ''
    substituteInPlace Makefile \
      --replace /usr/local $out \
      --replace /etc $out/lib

    substituteInPlace main.c \
      --replace 'amount = 5' 'amount = ${toString gigaBytes}' \
      --replace ' 5G' ' ${toString gigaBytes}G'

    substituteInPlace memreserver.service \
      --replace /usr/local $out \
      --replace ' 5G' ' ${toString gigaBytes}G'
  '';

  preInstall = ''
    mkdir -p $out/{bin,lib/systemd/system}
  '';

  meta = with lib; {
    description = "Reserve memory for AMDGPU VRAM";
  };
}

lorenz commented 1 year ago

@peterhoeg AFAIK all Ryzen 2000-series mobile CPUs have an integrated GPU. Unless you also have a separate dedicated GPU on your notebook your issues have a different cause and will not be fixed by memreserver. Integrated GPUs do not have dedicated VRAM but rely on a slice of RAM shared with the CPU which is kept in self-refresh during S3 so it does not need to be evacuated, thus the problem cannot occur there.

lorenz commented 1 year ago

So I've done some work on https://git.dolansoft.org/lorenz/memreserver: it now uses libdrm to dynamically determine the amount of RAM to be reserved, and it skips the process if no GPU is found which requires this. It still only works for AMD GPUs, as these are the only ones I've personally experienced the problem on; I own no Intel dGPUs, which are probably also affected.

Please test this improved version and report any issues. If it works out well, I need to rename it to something better (open to suggestions, I haven't found anything good yet) and make a NixOS module for it. Maybe we could even enable it automatically if amdgpu is in initrd.kernelModules or initrd.availableKernelModules, otherwise people need to know to turn the right knob to not get suspend failures.

EDIT: Here's a draft module, still without default enabling: https://github.com/lorenz/nixpkgs/commit/ff28634eb779b3b73b96812e8e477e7ed1d4a6ad

utkarshgupta137 commented 1 year ago

I have a similar issue, which might be related: My laptop with Ryzen 6800H & RX 6850M XT fails to suspend. After I run systemctl suspend or sudo systemctl suspend, my laptop screen goes blank after 0.5 seconds, then it wakes up automatically after 5 seconds & then it goes blank again after 5 seconds. But it still doesn't enter suspend (the power button & other LEDs remain solid instead of blinking). Curiously, sudo pm-suspend works without any issues. Even more curiously, PM_DEBUG=true sudo pm-suspend has the same behavior as systemctl suspend.

Here is the dmesg output for `systemctl suspend`:

```
[ 3531.806232] PM: suspend entry (deep)
[ 3531.822353] Filesystems sync: 0.016 seconds
[ 3532.087298] Freezing user space processes
[ 3532.088759] Freezing user space processes completed (elapsed 0.001 seconds)
[ 3532.088762] OOM killer disabled.
[ 3532.088763] Freezing remaining freezable tasks
[ 3532.089739] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[ 3532.089755] printk: Suspending console(s) (use no_console_suspend to debug)
[ 3532.108526] queueing ieee80211 work while going to suspend
[ 3532.109646] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate
[ 3532.109668] ------------[ cut here ]------------
[ 3532.109669] WARNING: CPU: 14 PID: 33555 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2489 dm_suspend+0x1a6/0x1c0 [amdgpu]
[ 3532.110186] Modules linked in: ccm af_packet uinput cmac algif_hash algif_skcipher af_alg bnep amdgpu msr mt7921e mt7921_common mt76_connac_lib snd_hda_codec_realtek mt76 snd_hda_codec_generic mousedev mac80211 ledtrig_audio snd_hda_codec_hdmi iommu_v2 gpu_sched snd_hda_intel drm_buddy drm_ttm_helper snd_intel_dspcfg ttm btusb snd_intel_sdw_acpi edac_mce_amd btrtl snd_hda_codec drm_display_helper edac_core intel_rapl_msr btbcm intel_rapl_common btintel crc32_pclmul btmtk polyval_clmulni hid_multitouch drm_kms_helper r8169 snd_hda_core polyval_generic gf128mul nls_iso8859_1 ghash_clmulni_intel wmi_bmof bluetooth cfg80211 agpgart sha512_ssse3 ideapad_laptop realtek snd_hwdep sha512_generic sparse_keymap nls_cp437 snd_pcm aesni_intel i2c_algo_bit ucsi_acpi wdat_wdt syscopyarea platform_profile typec_ucsi sp5100_tco vfat ecdh_generic crypto_simd mdio_devres fat snd_timer rfkill cryptd snd ecc typec rapl sysfillrect watchdog tpm_crb video k10temp i2c_piix4 libphy libaes cdc_acm sysimgblt
[ 3532.110260] soundcore libarc4 thermal roles battery wmi i2c_hid_acpi ip6_tables tpm_tis i2c_hid tpm_tis_core joydev xt_conntrack tiny_power_button nf_conntrack i2c_designware_platform acpi_cpufreq evdev i2c_designware_core acpi_tad nf_defrag_ipv6 input_leds button ac led_class nf_defrag_ipv4 mac_hid serio_raw ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables sch_fq_codel libcrc32c nfnetlink ctr loop cpufreq_ondemand tun tap macvlan bridge stp llc kvm_amd ccp kvm drm irqbypass fuse backlight deflate i2c_core efi_pstore configfs efivarfs tpm rng_core dmi_sysfs ip_tables x_tables autofs4 hid_generic ext4 usbhid hid crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas xhci_hcd nvme thunderbolt nvme_core atkbd usbcore libps2 vivaldi_fmap t10_pi crc32c_intel crc64_rocksoft crc64 crc_t10dif i8042 usb_common crct10dif_generic crct10dif_pclmul crct10dif_common rtc_cmos serio dm_mod dax
[ 3532.110334] CPU: 14 PID: 33555 Comm: kworker/u32:16 Tainted: G W 6.2.10 #1-NixOS
[ 3532.110338] Hardware name: LENOVO 82UH/LNVNB161216, BIOS K9CN34WW 07/22/2022
[ 3532.110341] Workqueue: events_unbound async_run_entry_fn
[ 3532.110349] RIP: 0010:dm_suspend+0x1a6/0x1c0 [amdgpu]
[ 3532.110561] Code: 4c 89 e7 e8 fc eb 1f 00 48 89 ef e8 34 db 00 00 4c 89 f7 e8 0c bb ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 ac 15 20 00 eb d6 <0f> 0b e9 9b fe ff ff e8 9e 25 23 e6 66 66 2e 0f 1f 84 00 00 00 00
[ 3532.110563] RSP: 0018:ffffa8c94a847d10 EFLAGS: 00010286
[ 3532.110565] RAX: 0000000000000000 RBX: ffff8b198d218980 RCX: 0000000000000000
[ 3532.110566] RDX: 0000000000000000 RSI: ffffffffa816606a RDI: ffff8b198d200000
[ 3532.110567] RBP: ffff8b198d200000 R08: ffffffffa8857120 R09: 00000000a91f06f6
[ 3532.110567] R10: ffffffffffffffff R11: ffffffffa890d0d8 R12: ffff8b198d200000
[ 3532.110568] R13: 0000000000000001 R14: ffff8b198d2169f8 R15: ffff8b198e5b7808
[ 3532.110569] FS: 0000000000000000(0000) GS:ffff8b1cae980000(0000) knlGS:0000000000000000
[ 3532.110570] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3532.110571] CR2: 00007f666628d81c CR3: 000000019a410000 CR4: 0000000000750ee0
[ 3532.110572] PKRU: 55555554
[ 3532.110572] Call Trace:
[ 3532.110575]
[ 3532.110580] amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu]
[ 3532.110710] amdgpu_device_suspend+0xc7/0x180 [amdgpu]
[ 3532.110838] pci_pm_suspend+0x77/0x160
[ 3532.110842] ? __pfx_pci_pm_suspend+0x10/0x10
[ 3532.110844] dpm_run_callback+0x8c/0x1e0
[ 3532.110847] __device_suspend+0xf1/0x4f0
[ 3532.110849] async_suspend+0x1e/0x70
[ 3532.110851] async_run_entry_fn+0x34/0x130
[ 3532.110853] process_one_work+0x1c8/0x3c0
[ 3532.110857] worker_thread+0x51/0x390
[ 3532.110859] ? __pfx_worker_thread+0x10/0x10
[ 3532.110861] kthread+0xed/0x120
[ 3532.110863] ? __pfx_kthread+0x10/0x10
[ 3532.110865] ret_from_fork+0x2c/0x50
[ 3532.110869]
[ 3532.110870] ---[ end trace 0000000000000000 ]---
[ 3532.405191] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 3532.405338] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
[ 3538.036639] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000000 SMN_C2PMSG_82:0x00000000
[ 3538.036642] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[ 3538.036645] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 3538.036646] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -62
[ 3540.036736] [drm] psp gfx command UNKNOWN CMD(0xFFFFFFFF) failed and response status is (0xFFFFFFFF)
[ 3540.036740] [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate tmr
[ 3540.036903] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -22
[ 3540.039748] ACPI: EC: interrupt blocked
[ 3540.068622] amdgpu 0000:03:00.0: amdgpu: MODE1 reset
[ 3540.068631] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
[ 3540.165412] nvme 0000:06:00.0: VC buffer not found in pci_save_vc_state
[ 3540.166554] amdgpu 0000:03:00.0: amdgpu: GPU psp mode1 reset
[ 3540.421912] [drm] psp is not working correctly before mode1 reset!
[ 3540.421913] amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
[ 3540.421963] amdgpu 0000:03:00.0: PM: pci_pm_suspend_noirq(): amdgpu_pmops_suspend_noirq+0x0/0x40 [amdgpu] returns -22
[ 3540.422095] amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_suspend_noirq+0x0/0x2a0 returns -22
[ 3540.422100] amdgpu 0000:03:00.0: PM: failed to suspend async: error -22
[ 3540.422630] ACPI: EC: interrupt unblocked
[ 3540.647408] PM: noirq suspend of devices failed
[ 3540.649004] [drm] PCIE GART of 512M enabled (table at 0x00000082FEB00000).
[ 3540.649029] [drm] PSP is resuming...
[ 3540.649207] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
[ 3540.649238] [drm] PSP is resuming...
[ 3540.666039] nvme nvme0: Shutdown timeout set to 10 seconds
[ 3540.669318] nvme nvme0: 16/0/0 default/read/poll queues
[ 3540.671128] [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR
[ 3541.441333] amdgpu 0000:36:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 3541.451720] amdgpu 0000:36:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 3541.451722] amdgpu 0000:36:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 3541.451724] amdgpu 0000:36:00.0: amdgpu: SMU is resuming...
[ 3541.452093] amdgpu 0000:36:00.0: amdgpu: SMU is resumed successfully!
[ 3541.453532] [drm] DMUB hardware initialized: version=0x0400002E
[ 3541.659457] [drm] kiq ring mec 2 pipe 1 q 0
[ 3541.663593] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 3541.664169] [drm] JPEG decode initialized successfully.
[ 3541.664181] amdgpu 0000:36:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 [ 3541.664183] amdgpu 0000:36:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 3541.664185] amdgpu 0000:36:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 3541.664185] amdgpu 0000:36:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ 3541.664186] amdgpu 0000:36:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ 3541.664187] amdgpu 0000:36:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ 3541.664188] amdgpu 0000:36:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ 3541.664188] amdgpu 0000:36:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ 3541.664189] amdgpu 0000:36:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ 3541.664190] amdgpu 0000:36:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 [ 3541.664191] amdgpu 0000:36:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 [ 3541.664192] amdgpu 0000:36:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1 [ 3541.664193] amdgpu 0000:36:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1 [ 3541.664193] amdgpu 0000:36:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1 [ 3541.664194] amdgpu 0000:36:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1 [ 3549.278279] [drm:psp_v11_0_memory_training [amdgpu]] *ERROR* send training msg failed. [ 3549.278474] [drm:psp_resume [amdgpu]] *ERROR* Failed to process memory training! [ 3549.278630] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block failed -62 [ 3549.278765] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). [ 3549.278766] amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62 [ 3549.278772] amdgpu 0000:03:00.0: PM: failed to resume async: error -62 [ 3549.281863] OOM killer enabled. [ 3549.281865] Restarting tasks ... done. 
[ 3549.287585] random: crng reseeded on system resumption [ 3549.600078] PM: suspend exit [ 3549.600179] PM: suspend entry (s2idle) [ 3549.617504] Filesystems sync: 0.017 seconds [ 3549.619985] Freezing user space processes [ 3549.621478] Freezing user space processes completed (elapsed 0.001 seconds) [ 3549.621480] OOM killer disabled. [ 3549.621481] Freezing remaining freezable tasks [ 3549.622600] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 3549.622606] printk: Suspending console(s) (use no_console_suspend to debug) [ 3549.641515] queueing ieee80211 work while going to suspend [ 3549.642135] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate [ 3549.642146] ------------[ cut here ]------------ [ 3549.642147] WARNING: CPU: 6 PID: 33609 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2489 dm_suspend+0x1a6/0x1c0 [amdgpu] [ 3549.642384] Modules linked in: ccm af_packet uinput cmac algif_hash algif_skcipher af_alg bnep amdgpu msr mt7921e mt7921_common mt76_connac_lib snd_hda_codec_realtek mt76 snd_hda_codec_generic mousedev mac80211 ledtrig_audio snd_hda_codec_hdmi iommu_v2 gpu_sched snd_hda_intel drm_buddy drm_ttm_helper snd_intel_dspcfg ttm btusb snd_intel_sdw_acpi edac_mce_amd btrtl snd_hda_codec drm_display_helper edac_core intel_rapl_msr btbcm intel_rapl_common btintel crc32_pclmul btmtk polyval_clmulni hid_multitouch drm_kms_helper r8169 snd_hda_core polyval_generic gf128mul nls_iso8859_1 ghash_clmulni_intel wmi_bmof bluetooth cfg80211 agpgart sha512_ssse3 ideapad_laptop realtek snd_hwdep sha512_generic sparse_keymap nls_cp437 snd_pcm aesni_intel i2c_algo_bit ucsi_acpi wdat_wdt syscopyarea platform_profile typec_ucsi sp5100_tco vfat ecdh_generic crypto_simd mdio_devres fat snd_timer rfkill cryptd snd ecc typec rapl sysfillrect watchdog tpm_crb video k10temp i2c_piix4 libphy libaes cdc_acm sysimgblt [ 3549.642426] soundcore libarc4 thermal roles battery wmi i2c_hid_acpi ip6_tables tpm_tis i2c_hid tpm_tis_core 
joydev xt_conntrack tiny_power_button nf_conntrack i2c_designware_platform acpi_cpufreq evdev i2c_designware_core acpi_tad nf_defrag_ipv6 input_leds button ac led_class nf_defrag_ipv4 mac_hid serio_raw ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables sch_fq_codel libcrc32c nfnetlink ctr loop cpufreq_ondemand tun tap macvlan bridge stp llc kvm_amd ccp kvm drm irqbypass fuse backlight deflate i2c_core efi_pstore configfs efivarfs tpm rng_core dmi_sysfs ip_tables x_tables autofs4 hid_generic ext4 usbhid hid crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas xhci_hcd nvme thunderbolt nvme_core atkbd usbcore libps2 vivaldi_fmap t10_pi crc32c_intel crc64_rocksoft crc64 crc_t10dif i8042 usb_common crct10dif_generic crct10dif_pclmul crct10dif_common rtc_cmos serio dm_mod dax [ 3549.642476] CPU: 6 PID: 33609 Comm: kworker/u32:70 Tainted: G W 6.2.10 #1-NixOS [ 3549.642478] Hardware name: LENOVO 82UH/LNVNB161216, BIOS K9CN34WW 07/22/2022 [ 3549.642480] Workqueue: events_unbound async_run_entry_fn [ 3549.642484] RIP: 0010:dm_suspend+0x1a6/0x1c0 [amdgpu] [ 3549.642677] Code: 4c 89 e7 e8 fc eb 1f 00 48 89 ef e8 34 db 00 00 4c 89 f7 e8 0c bb ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 ac 15 20 00 eb d6 <0f> 0b e9 9b fe ff ff e8 9e 25 23 e6 66 66 2e 0f 1f 84 00 00 00 00 [ 3549.642679] RSP: 0018:ffffa8c9433f7d10 EFLAGS: 00010282 [ 3549.642680] RAX: 0000000000000000 RBX: ffff8b198d218980 RCX: 0000000000000000 [ 3549.642681] RDX: 0000000000000000 RSI: ffffffffa816606a RDI: ffff8b198d200000 [ 3549.642682] RBP: ffff8b198d200000 R08: ffffffffa8857120 R09: 00000000a91f2b96 [ 3549.642683] R10: ffffffffffffffff R11: ffffffffa890dbe8 R12: ffff8b198d200000 [ 3549.642684] R13: 0000000000000001 R14: ffff8b198d2169f8 R15: ffff8b198e5b70e8 [ 3549.642685] FS: 0000000000000000(0000) GS:ffff8b1cae780000(0000) knlGS:0000000000000000 [ 3549.642686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3549.642686] CR2: 00007f666627edd4 CR3: 
00000001e086c000 CR4: 0000000000750ee0 [ 3549.642687] PKRU: 55555554 [ 3549.642688] Call Trace: [ 3549.642691] [ 3549.642695] amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu] [ 3549.642828] amdgpu_device_suspend+0xc7/0x180 [amdgpu] [ 3549.642951] pci_pm_suspend+0x77/0x160 [ 3549.642954] ? __pfx_pci_pm_suspend+0x10/0x10 [ 3549.642956] dpm_run_callback+0x8c/0x1e0 [ 3549.642959] __device_suspend+0xf1/0x4f0 [ 3549.642962] async_suspend+0x1e/0x70 [ 3549.642964] async_run_entry_fn+0x34/0x130 [ 3549.642966] process_one_work+0x1c8/0x3c0 [ 3549.642970] worker_thread+0x51/0x390 [ 3549.642972] ? __pfx_worker_thread+0x10/0x10 [ 3549.642974] kthread+0xed/0x120 [ 3549.642976] ? __pfx_kthread+0x10/0x10 [ 3549.642977] ret_from_fork+0x2c/0x50 [ 3549.642981] [ 3549.642982] ---[ end trace 0000000000000000 ]--- [ 3549.939679] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) [ 3549.939825] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed [ 3555.600745] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000000 SMN_C2PMSG_82:0x00000000 [ 3555.600747] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features. [ 3555.600750] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! [ 3555.600751] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -62 [ 3557.600740] [drm] psp gfx command UNKNOWN CMD(0xFFFFFFFF) failed and response status is (0xFFFFFFFF) [ 3557.600743] [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate tmr [ 3557.601080] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -22 [ 3557.604179] ACPI: EC: interrupt blocked [ 3575.436015] ACPI: EC: interrupt unblocked [ 3576.337058] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000). [ 3576.337082] [drm] PSP is resuming... [ 3576.343656] [drm] PCIE GART of 512M enabled (table at 0x00000082FEB00000). [ 3576.343677] [drm] PSP is resuming... 
[ 3576.359216] [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR [ 3577.132397] amdgpu 0000:36:00.0: amdgpu: RAS: optional ras ta ucode is not available [ 3577.142704] amdgpu 0000:36:00.0: amdgpu: RAP: optional rap ta ucode is not available [ 3577.142706] amdgpu 0000:36:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available [ 3577.142708] amdgpu 0000:36:00.0: amdgpu: SMU is resuming... [ 3577.143699] amdgpu 0000:36:00.0: amdgpu: SMU is resumed successfully! [ 3577.145070] [drm] DMUB hardware initialized: version=0x0400002E [ 3577.352164] [drm] kiq ring mec 2 pipe 1 q 0 [ 3577.356467] [drm] VCN decode and encode initialized successfully(under DPG Mode). [ 3577.357056] [drm] JPEG decode initialized successfully. [ 3577.357064] amdgpu 0000:36:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 [ 3577.357066] amdgpu 0000:36:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 3577.357067] amdgpu 0000:36:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 3577.357068] amdgpu 0000:36:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ 3577.357068] amdgpu 0000:36:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ 3577.357069] amdgpu 0000:36:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ 3577.357070] amdgpu 0000:36:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ 3577.357070] amdgpu 0000:36:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ 3577.357071] amdgpu 0000:36:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ 3577.357072] amdgpu 0000:36:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 [ 3577.357073] amdgpu 0000:36:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 [ 3577.357073] amdgpu 0000:36:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1 [ 3577.357074] amdgpu 0000:36:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1 [ 3577.357075] amdgpu 0000:36:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1 [ 3577.357076] amdgpu 0000:36:00.0: amdgpu: 
ring jpeg_dec uses VM inv eng 5 on hub 1 [ 3579.943317] [drm:psp_hw_start [amdgpu]] *ERROR* PSP load kdb failed! [ 3579.943490] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [ 3579.943640] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block failed -62 [ 3579.943783] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). [ 3579.943784] amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62 [ 3579.943795] amdgpu 0000:03:00.0: PM: failed to resume async: error -62 [ 3579.947575] OOM killer enabled. [ 3579.947579] Restarting tasks ... done. [ 3579.950001] random: crng reseeded on system resumption [ 3580.262595] PM: suspend exit ```
Here is the dmesg output for `sudo pm-suspend` ``` 4858.351615] [ 4858.351615] ---[ end trace 0000000000000000 ]--- [ 4858.581783] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) [ 4858.581956] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed [ 4858.803973] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx [ 4858.804214] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:7 param:0x00000000 message:DisableAllSmuFeatures? [ 4858.804217] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features. [ 4858.804220] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! [ 4858.804220] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -121 [ 4858.804366] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:46 param:0x00000000 message:PrepareMp1ForUnload? [ 4858.804367] amdgpu 0000:03:00.0: amdgpu: [PrepareMp1] Failed! [ 4858.804368] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* SMC failed to set mp1 state 2, -121 [ 4866.069225] [drm] PSP is resuming... [ 4866.300264] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed! [ 4866.300404] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [ 4866.300503] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block failed -62 [ 4866.300592] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). [ 4909.124991] PM: suspend entry (deep) [ 4909.126770] Filesystems sync: 0.001 seconds [ 4909.393629] Freezing user space processes [ 4909.427025] Freezing user space processes completed (elapsed 0.033 seconds) [ 4909.427038] OOM killer disabled. 
[ 4909.427040] Freezing remaining freezable tasks [ 4909.428163] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 4909.428241] printk: Suspending console(s) (use no_console_suspend to debug) [ 4909.428713] wlp4s0: deauthenticating from 30:cc:21:e3:6c:62 by local choice (Reason: 3=DEAUTH_LEAVING) [ 4909.657909] r8169 0000:05:00.0 enp5s0: Link is Down [ 4910.038318] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate [ 4910.038329] ------------[ cut here ]------------ [ 4910.038330] WARNING: CPU: 12 PID: 49361 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:2489 dm_suspend+0x1a6/0x1c0 [amdgpu] [ 4910.038554] Modules linked in: snd_seq_dummy snd_seq snd_seq_device ccm af_packet uinput cmac algif_hash algif_skcipher af_alg bnep amdgpu msr mt7921e mt7921_common mt76_connac_lib snd_hda_codec_realtek mt76 snd_hda_codec_generic mousedev mac80211 ledtrig_audio snd_hda_codec_hdmi iommu_v2 gpu_sched snd_hda_intel drm_buddy drm_ttm_helper snd_intel_dspcfg ttm btusb snd_intel_sdw_acpi edac_mce_amd btrtl snd_hda_codec drm_display_helper edac_core intel_rapl_msr btbcm intel_rapl_common btintel crc32_pclmul btmtk polyval_clmulni hid_multitouch drm_kms_helper r8169 snd_hda_core polyval_generic gf128mul nls_iso8859_1 ghash_clmulni_intel wmi_bmof bluetooth cfg80211 agpgart sha512_ssse3 ideapad_laptop realtek snd_hwdep sha512_generic sparse_keymap nls_cp437 snd_pcm aesni_intel i2c_algo_bit ucsi_acpi wdat_wdt syscopyarea platform_profile typec_ucsi sp5100_tco vfat ecdh_generic crypto_simd mdio_devres fat snd_timer rfkill cryptd snd ecc typec rapl sysfillrect watchdog tpm_crb video k10temp [ 4910.038592] i2c_piix4 libphy libaes cdc_acm sysimgblt soundcore libarc4 thermal roles battery wmi i2c_hid_acpi ip6_tables tpm_tis i2c_hid tpm_tis_core joydev xt_conntrack tiny_power_button nf_conntrack i2c_designware_platform acpi_cpufreq evdev i2c_designware_core acpi_tad nf_defrag_ipv6 input_leds button ac led_class nf_defrag_ipv4 mac_hid serio_raw 
ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables sch_fq_codel libcrc32c nfnetlink ctr loop cpufreq_ondemand tun tap macvlan bridge stp llc kvm_amd ccp kvm drm irqbypass fuse backlight deflate i2c_core efi_pstore configfs efivarfs tpm rng_core dmi_sysfs ip_tables x_tables autofs4 hid_generic ext4 usbhid hid crc32c_generic crc16 mbcache jbd2 xhci_pci xhci_pci_renesas xhci_hcd nvme thunderbolt nvme_core atkbd usbcore libps2 vivaldi_fmap t10_pi crc32c_intel crc64_rocksoft crc64 crc_t10dif i8042 usb_common crct10dif_generic crct10dif_pclmul crct10dif_common rtc_cmos serio dm_mod dax [ 4910.038636] CPU: 12 PID: 49361 Comm: kworker/u32:31 Tainted: G W 6.2.10 #1-NixOS [ 4910.038638] Hardware name: LENOVO 82UH/LNVNB161216, BIOS K9CN34WW 07/22/2022 [ 4910.038640] Workqueue: events_unbound async_run_entry_fn [ 4910.038644] RIP: 0010:dm_suspend+0x1a6/0x1c0 [amdgpu] [ 4910.038844] Code: 4c 89 e7 e8 fc eb 1f 00 48 89 ef e8 34 db 00 00 4c 89 f7 e8 0c bb ff ff e9 f0 fe ff ff 4c 89 e6 4c 89 ef e8 ac 15 20 00 eb d6 <0f> 0b e9 9b fe ff ff e8 9e 25 23 e6 66 66 2e 0f 1f 84 00 00 00 00 [ 4910.038845] RSP: 0018:ffffa8c94b3dfd10 EFLAGS: 00010282 [ 4910.038847] RAX: 0000000000000000 RBX: ffff8b198d218980 RCX: 0000000000000000 [ 4910.038848] RDX: 0000000000000000 RSI: ffffffffa816606a RDI: ffff8b198d200000 [ 4910.038848] RBP: ffff8b198d200000 R08: ffffffffa8857120 R09: 00000000a91c4786 [ 4910.038849] R10: ffffffffffffffff R11: ffffffffa8913138 R12: ffff8b198d200000 [ 4910.038850] R13: 0000000000000001 R14: ffff8b198d2169f8 R15: ffff8b198bdd00e8 [ 4910.038851] FS: 0000000000000000(0000) GS:ffff8b1cae900000(0000) knlGS:0000000000000000 [ 4910.038852] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4910.038853] CR2: 00007f666627238c CR3: 000000019a410000 CR4: 0000000000750ee0 [ 4910.038854] PKRU: 55555554 [ 4910.038854] Call Trace: [ 4910.038856] [ 4910.038861] amdgpu_device_ip_suspend_phase1+0x71/0xe0 [amdgpu] [ 4910.038991] 
amdgpu_device_suspend+0xc7/0x180 [amdgpu] [ 4910.039120] pci_pm_suspend+0x77/0x160 [ 4910.039124] ? __pfx_pci_pm_suspend+0x10/0x10 [ 4910.039125] dpm_run_callback+0x8c/0x1e0 [ 4910.039129] __device_suspend+0xf1/0x4f0 [ 4910.039131] async_suspend+0x1e/0x70 [ 4910.039132] async_run_entry_fn+0x34/0x130 [ 4910.039135] process_one_work+0x1c8/0x3c0 [ 4910.039138] worker_thread+0x51/0x390 [ 4910.039141] ? __pfx_worker_thread+0x10/0x10 [ 4910.039143] kthread+0xed/0x120 [ 4910.039145] ? __pfx_kthread+0x10/0x10 [ 4910.039147] ret_from_fork+0x2c/0x50 [ 4910.039151] [ 4910.039151] ---[ end trace 0000000000000000 ]--- [ 4910.354499] amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110) [ 4910.354636] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed [ 4910.628795] [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx [ 4910.629023] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:7 param:0x00000000 message:DisableAllSmuFeatures? [ 4910.629026] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features. [ 4910.629029] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features! 
[ 4910.629030] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -121 [ 4910.629267] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 000000004f562829; ring_buffer_end = 00000000ce6a8326; write_frame = 0000000037103cdd [ 4910.629412] [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds [ 4910.629553] [drm:psp_suspend [amdgpu]] *ERROR* Failed to terminate tmr [ 4910.629693] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block failed -22 [ 4911.502922] ACPI: EC: interrupt blocked [ 4911.628687] nvme 0000:06:00.0: VC buffer not found in pci_save_vc_state [ 4918.043441] ACPI: PM: Preparing to enter system sleep state S3 [ 4918.150834] ACPI: EC: event blocked [ 4918.150837] ACPI: EC: EC stopped [ 4918.150838] ACPI: PM: Saving platform NVS memory [ 4918.152724] Disabling non-boot CPUs ... [ 4918.155456] smpboot: CPU 1 is now offline [ 4918.158258] smpboot: CPU 2 is now offline [ 4918.160781] smpboot: CPU 3 is now offline [ 4918.162921] smpboot: CPU 4 is now offline [ 4918.165054] smpboot: CPU 5 is now offline [ 4918.167262] smpboot: CPU 6 is now offline [ 4918.169317] smpboot: CPU 7 is now offline [ 4918.171267] smpboot: CPU 8 is now offline [ 4918.173377] smpboot: CPU 9 is now offline [ 4918.175415] smpboot: CPU 10 is now offline [ 4918.177343] smpboot: CPU 11 is now offline [ 4918.179173] smpboot: CPU 12 is now offline [ 4918.180938] smpboot: CPU 13 is now offline [ 4918.182786] smpboot: CPU 14 is now offline [ 4918.183156] Spectre V2 : Update user space SMT mitigation: STIBP off [ 4918.184414] smpboot: CPU 15 is now offline [ 4918.185152] ACPI: PM: Low-level resume complete [ 4918.185180] ACPI: EC: EC started [ 4918.185181] ACPI: PM: Restoring platform NVS memory [ 4918.186189] AMD-Vi: Virtual APIC enabled [ 4918.186293] AMD-Vi: Virtual APIC enabled [ 4918.186323] LVT offset 0 assigned for vector 0x400 [ 4918.186824] Enabling non-boot CPUs ... 
[ 4918.186860] x86: Booting SMP configuration: [ 4918.186861] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 4918.189459] ACPI: \_SB_.PLTF.C001: Found 3 idle states [ 4918.189688] Spectre V2 : Update user space SMT mitigation: STIBP always-on [ 4918.189695] CPU1 is up [ 4918.189717] smpboot: Booting Node 0 Processor 2 APIC 0x2 [ 4918.192034] ACPI: \_SB_.PLTF.C002: Found 3 idle states [ 4918.192261] CPU2 is up [ 4918.192277] smpboot: Booting Node 0 Processor 3 APIC 0x3 [ 4918.194641] ACPI: \_SB_.PLTF.C003: Found 3 idle states [ 4918.195010] CPU3 is up [ 4918.195025] smpboot: Booting Node 0 Processor 4 APIC 0x4 [ 4918.197823] ACPI: \_SB_.PLTF.C004: Found 3 idle states [ 4918.198133] CPU4 is up [ 4918.198147] smpboot: Booting Node 0 Processor 5 APIC 0x5 [ 4918.200430] ACPI: \_SB_.PLTF.C005: Found 3 idle states [ 4918.200880] CPU5 is up [ 4918.200897] smpboot: Booting Node 0 Processor 6 APIC 0x6 [ 4918.203245] ACPI: \_SB_.PLTF.C006: Found 3 idle states [ 4918.203618] CPU6 is up [ 4918.203631] smpboot: Booting Node 0 Processor 7 APIC 0x7 [ 4918.205931] ACPI: \_SB_.PLTF.C007: Found 3 idle states [ 4918.206480] CPU7 is up [ 4918.206492] smpboot: Booting Node 0 Processor 8 APIC 0x8 [ 4918.208849] ACPI: \_SB_.PLTF.C008: Found 3 idle states [ 4918.209310] CPU8 is up [ 4918.209324] smpboot: Booting Node 0 Processor 9 APIC 0x9 [ 4918.211627] ACPI: \_SB_.PLTF.C009: Found 3 idle states [ 4918.212275] CPU9 is up [ 4918.212288] smpboot: Booting Node 0 Processor 10 APIC 0xa [ 4918.214647] ACPI: \_SB_.PLTF.C00A: Found 3 idle states [ 4918.215171] CPU10 is up [ 4918.215186] smpboot: Booting Node 0 Processor 11 APIC 0xb [ 4918.217487] ACPI: \_SB_.PLTF.C00B: Found 3 idle states [ 4918.218262] CPU11 is up [ 4918.218275] smpboot: Booting Node 0 Processor 12 APIC 0xc [ 4918.220641] ACPI: \_SB_.PLTF.C00C: Found 3 idle states [ 4918.221244] CPU12 is up [ 4918.221257] smpboot: Booting Node 0 Processor 13 APIC 0xd [ 4918.223568] ACPI: \_SB_.PLTF.C00D: Found 3 idle states [ 4918.224434] CPU13 
is up [ 4918.224447] smpboot: Booting Node 0 Processor 14 APIC 0xe [ 4918.226829] ACPI: \_SB_.PLTF.C00E: Found 3 idle states [ 4918.227526] CPU14 is up [ 4918.227539] smpboot: Booting Node 0 Processor 15 APIC 0xf [ 4918.229872] ACPI: \_SB_.PLTF.C00F: Found 3 idle states [ 4918.230820] CPU15 is up [ 4918.233478] ACPI: PM: Waking up from system sleep state S3 [ 4918.235838] ACPI: EC: interrupt unblocked [ 4918.568715] ACPI: EC: event unblocked [ 4918.569313] [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000). [ 4918.569336] [drm] PSP is resuming... [ 4918.572722] [drm] PCIE GART of 512M enabled (table at 0x00000082FEB00000). [ 4918.572744] [drm] PSP is resuming... [ 4918.586203] nvme nvme0: Shutdown timeout set to 10 seconds [ 4918.589687] nvme nvme0: 16/0/0 default/read/poll queues [ 4918.591528] [drm] reserve 0xa00000 from 0xf41e000000 for PSP TMR [ 4918.736378] r8169 0000:05:00.0 enp5s0: Link is Down [ 4918.813928] [drm:psp_hw_start [amdgpu]] *ERROR* PSP create ring failed! [ 4918.814095] [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed [ 4918.814234] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block failed -62 [ 4918.814356] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). [ 4918.814358] amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62 [ 4918.814364] amdgpu 0000:03:00.0: PM: failed to resume async: error -62 [ 4918.841629] amdgpu 0000:36:00.0: amdgpu: RAS: optional ras ta ucode is not available [ 4918.852220] amdgpu 0000:36:00.0: amdgpu: RAP: optional rap ta ucode is not available [ 4918.852222] amdgpu 0000:36:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available [ 4918.852225] amdgpu 0000:36:00.0: amdgpu: SMU is resuming... [ 4918.855407] amdgpu 0000:36:00.0: amdgpu: SMU is resumed successfully! 
[ 4918.857102] [drm] DMUB hardware initialized: version=0x0400002E [ 4918.919248] snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535 [ 4919.723574] [drm] kiq ring mec 2 pipe 1 q 0 [ 4919.727297] [drm] VCN decode and encode initialized successfully(under DPG Mode). [ 4919.727803] [drm] JPEG decode initialized successfully. [ 4919.727813] amdgpu 0000:36:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0 [ 4919.727816] amdgpu 0000:36:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0 [ 4919.727818] amdgpu 0000:36:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0 [ 4919.727819] amdgpu 0000:36:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0 [ 4919.727820] amdgpu 0000:36:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0 [ 4919.727821] amdgpu 0000:36:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0 [ 4919.727821] amdgpu 0000:36:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0 [ 4919.727822] amdgpu 0000:36:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0 [ 4919.727823] amdgpu 0000:36:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0 [ 4919.727824] amdgpu 0000:36:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0 [ 4919.727825] amdgpu 0000:36:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0 [ 4919.727825] amdgpu 0000:36:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1 [ 4919.727826] amdgpu 0000:36:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1 [ 4919.727827] amdgpu 0000:36:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1 [ 4919.727828] amdgpu 0000:36:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1 [ 4919.739549] OOM killer enabled. [ 4919.739552] Restarting tasks ... done. [ 4919.741941] random: crng reseeded on system resumption [ 4919.890245] PM: suspend exit ```
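For anyone skimming dumps like the two above: the negative numbers are Linux errno values (-22 is EINVAL, -62 is ETIME, -110 is ETIMEDOUT, -121 is EREMOTEIO). Below is a rough sketch for tallying them from a saved dmesg dump. The regex and the `tally_errnos` helper are illustrative assumptions, not part of any tool, and they only cover the message shapes that appear in these logs ("failed -62", "failed (-110)", "returns -22", "error -62").

```python
import re

# Linux errno values seen in the logs above. They are hard-coded so the
# sketch does not depend on the errno table of the machine running it.
LINUX_ERRNO = {22: "EINVAL", 62: "ETIME", 110: "ETIMEDOUT", 121: "EREMOTEIO"}

# Matches "failed -62", "failed (-110)", "returns -22" and "error -62".
ERRNO_RE = re.compile(r"(?:failed|returns|error)[ :(]*-(\d+)")


def tally_errnos(dmesg_text):
    """Count errno names in a dmesg dump (hypothetical helper)."""
    counts = {}
    for code in ERRNO_RE.findall(dmesg_text):
        name = LINUX_ERRNO.get(int(code), "errno " + code)
        counts[name] = counts.get(name, 0) + 1
    return counts
```

Since it scans the whole text rather than line by line, it should also cope with dumps where the line breaks were lost.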
lorenz commented 1 year ago

@utkarshgupta137 That's an unrelated kernel issue, please report it to https://gitlab.freedesktop.org/drm/amd

Atemu commented 1 year ago

I'm inclined to close this issue since there is nothing we can really do about upstream issues. If it turns out @lorenz' workaround is effective at mitigating this issue, you can create a feature request for implementing it.

utkarshgupta137 commented 1 year ago

> I have a similar issue, which might be related: My laptop with Ryzen 6800H & RX 6850M XT fails to suspend. After I run `systemctl suspend` or `sudo systemctl suspend`, my laptop screen goes blank after 0.5 seconds, then it wakes up automatically after 5 seconds, and then it goes blank again after 5 seconds. But it still doesn't enter suspend (the power button & other LEDs remain solid instead of blinking). Curiously, `sudo pm-suspend` works without any issues. Even more curiously, `PM_DEBUG=true sudo pm-suspend` has the same behavior as `systemctl suspend`.
>
> Here is the dmesg output for `systemctl suspend`
>
> Here is the dmesg output for `sudo pm-suspend`

My issue was related to deep sleep in `/sys/power/mem_sleep`. My laptop didn't have a deep sleep option by default, but I was able to enable it using UMAF. Disabling it again solved the problem.
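To check which suspend mode a machine is using, read `/sys/power/mem_sleep`: the kernel lists the available modes and puts the active one in square brackets, for example `s2idle [deep]`. This matches the `PM: suspend entry (deep)` and `PM: suspend entry (s2idle)` lines in the logs. A minimal sketch for reading it, where the helper names `parse_mem_sleep` and `read_mem_sleep` are made up for illustration:

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Return (active_mode, available_modes) from mem_sleep contents."""
    tokens = text.split()
    # The active mode is the token wrapped in square brackets.
    active = next((t.strip("[]") for t in tokens if t.startswith("[")), None)
    return active, [t.strip("[]") for t in tokens]


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the sysfs file (hypothetical convenience wrapper)."""
    return parse_mem_sleep(Path(path).read_text())
```

If `deep` shows up as the active mode on an affected machine, writing `s2idle` to the same file as root switches back until the next reboot.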

davidak commented 1 year ago

I had the issue again today on NixOS 22.11.4479.d4a9ff82fc1 with Linux 6.3.5 and created a bug report upstream: https://gitlab.freedesktop.org/drm/amd/-/issues/2635

davidak commented 6 months ago

Still happens with Linux 6.7.6. New upstream issue: https://gitlab.freedesktop.org/drm/amd/-/issues/3208

I have configured memreserver and will see how well it works.

infinisil commented 3 weeks ago

@lorenz Thank you for your workaround, it's working very well for me so far!

lorenz commented 3 weeks ago

@infinisil I'm glad it works well for you too, I should probably take some time to clean up/finish https://github.com/NixOS/nixpkgs/pull/225819 so it can be used out-of-the-box :)