M-Bab / linux-kernel-amdgpu-binaries

Kernel binaries (amd64) of amd-staging with DAL and latest security patches

HSA exception: Queue create failed at hsaKmtCreateQueue #106

Closed WsqRichards closed 1 year ago

WsqRichards commented 1 year ago

ENV:

Ubuntu 20.04.3, AMD Radeon RX 7600

Demo:

run HIP-Examples/vectorAdd/

dmesg:

[420556.034467] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420556.034587] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420556.162589] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420556.162709] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420556.290666] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420556.290785] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420557.467386] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420557.467607] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420557.595802] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420557.595921] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420559.766915] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420559.767138] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420559.895337] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420559.895456] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420604.310149] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420604.310351] [drm:amdgpu_mes_set_shader_debugger [amdgpu]] ERROR failed to set_shader_debugger
[420604.446861] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420604.446998] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420604.575715] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=14
[420604.575838] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] ERROR failed to reg_write_reg_wait
[420604.704500] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] ERROR MES failed to response msg=2
[420604.704625] amdgpu: failed to add hardware queue to MES, doorbell=0x1000
[420604.704627] amdgpu: MES might be in unrecoverable state, issue a GPU reset
[420604.704651] amdgpu: Pasid 0x8003 DQM create queue type 0 failed. ret -110
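For anyone triaging a similar log, here is a minimal sketch of how the dmesg output above could be summarized. It is not part of any amdgpu or ROCm tooling; the `summarize` helper and the regular expressions are hypothetical and only assume the message shapes seen in this log. The one outside fact used is that errno 110 is ETIMEDOUT in the generic Linux numbering, which suggests the final queue-creation failure was a timeout waiting on MES.

```python
import re
from collections import Counter

# Matches "[timestamp] message" pairs, even when several entries are
# run together on a single line (as in a pasted dmesg dump).
ENTRY_RE = re.compile(r"\[(\d+\.\d+)\]\s+(.*?)(?=\s*\[\d+\.\d+\]|\s*$)", re.S)
# Extracts the reporting function from "[drm:func [amdgpu]] ERROR ..." messages.
FUNC_RE = re.compile(r"\[drm:(\S+) \[amdgpu\]\]")
# Extracts a negative return code such as "ret -110".
RET_RE = re.compile(r"\bret (-\d+)")

# Generic Linux errno numbering; only the value seen in this log is listed.
ERRNO_NAMES = {110: "ETIMEDOUT"}


def summarize(log):
    """Count errors per reporting function and decode any 'ret -N' codes."""
    by_func = Counter()
    ret_codes = []
    for _timestamp, message in ENTRY_RE.findall(log):
        match = FUNC_RE.search(message)
        if match:
            by_func[match.group(1)] += 1
        for code in RET_RE.findall(message):
            number = -int(code)
            ret_codes.append(ERRNO_NAMES.get(number, str(number)))
    return by_func, ret_codes


# Three entries copied from the log above, deliberately left on one line.
sample = (
    "[420604.575715] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] "
    "ERROR MES failed to response msg=14 "
    "[420604.575838] [drm:amdgpu_mes_reg_write_reg_wait [amdgpu]] "
    "ERROR failed to reg_write_reg_wait "
    "[420604.704651] amdgpu: Pasid 0x8003 DQM create queue type 0 failed. ret -110"
)

by_func, ret_codes = summarize(sample)
for func, count in by_func.most_common():
    print(count, func)
print("return codes:", ret_codes)
```

Feeding the full dmesg text to `summarize` instead of `sample` gives a per-function error count, which makes it easier to see that nearly all entries come from the same two MES call paths.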

Mr-Precise commented 1 year ago

As you can see, there have been no updates here for 2 years. But you can build the kernel yourself from here: https://gitlab.freedesktop.org/agd5f/linux/

Or you can use my ready-made kernel builds for AMD video cards on Ubuntu 20.04: Mr-Precise/linux-kernel-with-amdgpu-bin

This is an experiment and I don't guarantee anything.

M-Bab commented 1 year ago

Yeah, I stopped building and supporting these kernels, see: https://github.com/M-Bab/linux-kernel-amdgpu-binaries/issues/23#issuecomment-1336514067

I am sorry. There were too many merge conflicts trying to build these.