GPUOpen-Drivers / AMDVLK

AMD Open Source Driver For Vulkan
MIT License

Random "illegal register access" GPU crash while playing Satisfactory: #280

Open RenaKunisaki opened 2 years ago

RenaKunisaki commented 2 years ago

I get this crash randomly while playing Satisfactory:

Jul  3 11:58:08 guilmon kernel: [drm:gfx_v8_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
Jul  3 11:58:08 guilmon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=387961, emitted seq=387963
Jul  3 11:58:08 guilmon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process FactoryGame-Win pid 4312 thread FactoryGame-Win pid 4312
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: GPU reset begin!
Jul  3 11:58:08 guilmon kernel: clocksource: timekeeping watchdog on CPU1: hpet wd-wd read-back delay of 121244ns
Jul  3 11:58:08 guilmon kernel: clocksource: wd-tsc-wd read-back delay of 121733ns, clock-skew test skipped!
Jul  3 11:58:08 guilmon kernel: perf: interrupt took too long (4629 > 4483), lowering kernel.perf_event_max_sample_rate to 43200
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: snd_hda_intel 0000:00:14.2: IRQ timing workaround is activated for card #0. Suggest a bigger bdl_pos_adj.
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: \x0alast message was failed ret is 65535
Jul  3 11:58:08 guilmon kernel: amdgpu: Failed to force to switch arbf0!
Jul  3 11:58:08 guilmon kernel: amdgpu: [disable_dpm_tasks] Failed to disable DPM!
Jul  3 11:58:08 guilmon kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -22
Jul  3 11:58:08 guilmon kernel: amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul  3 11:58:08 guilmon kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jul  3 11:58:09 guilmon kernel: amdgpu: cp is busy, skip halt cp
Jul  3 11:58:09 guilmon kernel: amdgpu: rlc is busy, skip halt rlc
Jul  3 11:58:09 guilmon kernel: CPU: 7 PID: 120 Comm: kworker/u16:6 Tainted: G           OE     5.18.5-artix1-1 #1 d915910a51fd3468d4148b188d07d15cfc34c35a
Jul  3 11:58:09 guilmon kernel: Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A99FX PRO R2.0, BIOS 2501 04/07/2014
Jul  3 11:58:09 guilmon kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
Jul  3 11:58:09 guilmon kernel: Call Trace:
Jul  3 11:58:09 guilmon kernel:  <TASK>
Jul  3 11:58:09 guilmon kernel:  dump_stack_lvl+0x48/0x5d
Jul  3 11:58:09 guilmon kernel:  amdgpu_do_asic_reset+0x2a/0x470 [amdgpu 44b7b22559c0805dcd17429d634a6646a1917733]
Jul  3 11:58:09 guilmon kernel:  amdgpu_device_gpu_recover_imp.cold+0x537/0x8cc [amdgpu 44b7b22559c0805dcd17429d634a6646a1917733]
Jul  3 11:58:09 guilmon kernel:  amdgpu_job_timedout+0x18c/0x1c0 [amdgpu 44b7b22559c0805dcd17429d634a6646a1917733]
Jul  3 11:58:09 guilmon kernel:  drm_sched_job_timedout+0x76/0x100 [gpu_sched ac0b937704053fac20da4213593e0fdd3b2a6d2a]
Jul  3 11:58:09 guilmon kernel:  process_one_work+0x1c7/0x380
Jul  3 11:58:09 guilmon kernel:  worker_thread+0x51/0x380
Jul  3 11:58:09 guilmon kernel:  ? rescuer_thread+0x3a0/0x3a0
Jul  3 11:58:09 guilmon kernel:  kthread+0xde/0x110
Jul  3 11:58:09 guilmon kernel:  ? kthread_complete_and_exit+0x20/0x20
Jul  3 11:58:09 guilmon kernel:  ret_from_fork+0x22/0x30
Jul  3 11:58:09 guilmon kernel:  </TASK>
Jul  3 11:58:09 guilmon kernel: amdgpu 0000:01:00.0: amdgpu: BACO reset

All screens go blank. The GPU recovery attempt hangs; the only way out is a magic SysRq reboot.
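For anyone comparing similar logs, here is a minimal Python sketch that pulls the ring name and the two sequence numbers out of the `ring gfx timeout` line above. The function name `parse_ring_timeout` and the `pending` field are hypothetical helpers for illustration, not part of amdgpu or any tool. It assumes the message format is exactly the one shown in this log, and it reads the difference between the two numbers as the count of submissions emitted but not yet signaled as complete.

```python
import re

# Matches the amdgpu ring-timeout message in the format seen in this log.
# Assumption: other kernel versions may word this line differently.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if there is no match.

    `pending` is emitted minus signaled (hypothetical field name).
    """
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    return match.group("ring"), signaled, emitted, emitted - signaled


# The timeout line copied from the log in this report.
SAMPLE = (
    "Jul  3 11:58:08 guilmon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx timeout, signaled seq=387961, emitted seq=387963"
)

if __name__ == "__main__":
    print(parse_ring_timeout(SAMPLE))  # ('gfx', 387961, 387963, 2)
```

Run against a saved journal or dmesg dump, this makes it easy to compare how many submissions were outstanding across several crashes.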

Before installing amdvlk, there was no stability problem (but the game had to use DX11 emulation that performs badly).

OS: Artix AMD64
GPU: RX590
RAM: 32GB (about 8 used when gaming)
CPU: AMD FX-8320E

The crash is most likely right after starting the game, but it also happens at random while playing. Sometimes the game runs for 6+ hours with no problem; other times it crashes 10 minutes in.

jinjianrong commented 1 month ago

Reproduced with the amdvlk-pro driver, but not with the amdvlk open source driver.