clbr / radeontop

GNU General Public License v3.0

RX680M full system crash #149

Closed: madushan1000 closed this issue 1 year ago

madushan1000 commented 1 year ago

I'm not sure if this is a radeontop bug or a kernel bug, but when I try to run radeontop on the Ryzen 6900HS iGPU (RX680M), the system crashes completely and reboots every time. No logs survive the crash. The crashes started after this commit: https://github.com/clbr/radeontop/commit/e3bbf06eaed49746f2838a60eb01e7edfc185da5

The laptop is an Asus G14 running Arch Linux kernel 6.0.12, but I could reproduce the crash on earlier kernels too.

B83C commented 1 year ago

Reverting the commit fixed the crash for me on 6.1.0+ with a Ryzen 6850U.

danielzgtg commented 1 year ago

That is my commit. It shouldn't have enabled itself on the 6900HS, as that's RDNA2.

Could you answer the following?

  1. What is the family name in the header at the top while running?
  2. What is your GPU's line from lspci -nn?
  3. What is the output of vainfo?
  4. Does it still crash if you completely delete the part that enables it?

https://github.com/clbr/radeontop/blob/e3bbf06eaed49746f2838a60eb01e7edfc185da5/detect.c#L374-L379

madushan1000 commented 1 year ago
  1. If I revert the commit, the family name is YELLOW_CARP. Otherwise it crashes.
  2. 07:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev c7)
  3. > vainfo
    Trying display: wayland
    vainfo: VA-API version: 1.17 (libva 2.17.1)
    vainfo: Driver version: Mesa Gallium driver 22.3.5 for AMD Radeon Graphics (rembrandt, LLVM 15.0.7, DRM 3.49, 6.1.12-arch1-1)
    vainfo: Supported profile and entrypoints
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileAV1Profile0            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
madushan1000 commented 1 year ago
  4. Yes, it crashes even if I comment out the section you mentioned.
danielzgtg commented 1 year ago

The common denominator between you and the other affected users seems to be laptop APUs. I don't have any laptops with AMD APUs to test on. Their memory regions and registers might differ from those of the desktop GPUs I tested on, which could cause the crashes.

It may be worth disabling my video encode/decode detection feature on all laptop APUs until we confirm the correct registers.

danielzgtg commented 1 year ago

Never mind my idea about laptop APUs. The actual problem was more serious: I put the if statement around the display code but forgot to add the same checks around the code that actually reads the memory. I fixed that in #152. Does that PR fix your crashes?

madushan1000 commented 1 year ago

Yes, this commit indeed fixed my crash, thank you for fixing it :)

madushan1000 commented 1 year ago

I think there is something weird going on with the AMD GPU firmware, though. I don't see how it's possible to crash the whole system from user space like this without any kernel logs or anything.