M-Bab / linux-kernel-amdgpu-binaries

Kernel binaries (amd64) of amd-staging with DAL and latest security patches

AMD FirePro W5130M Crashes with kernel driver module amdgpu under full performance mode #87

Closed Davidian1024 closed 4 years ago

Davidian1024 commented 4 years ago

Hello,

I have a laptop with an AMD FirePro W5130M GPU in it that suffers from system/application crashes/freezes when this GPU is placed into full performance mode using the amdgpu kernel driver module.

I think this GPU isn't recognized correctly by the amdgpu driver, and the driver tries to run it with the wrong settings. I have only ever been able to run it in full performance mode using the Dell-customized Ubuntu OS that it shipped with. But that OS is based on Ubuntu 14.04 and was configured with fglrx.

Support for my type of GPU seems to be only experimental with amdgpu, but I think my only chance of getting it to run stable with a modern Linux OS is with amdgpu.

Thanks, Dave

mezcalbert commented 4 years ago

It's a GCN 1.0/Southern Islands/CapeVerde/Tropo LE card. https://www.techpowerup.com/gpu-specs/firepro-w5130m.c2769

Have you activated the amdgpu experimental support of GCN 1.0/Southern Islands GPU?

If not, you need to append the following to your kernel parameters in /etc/default/grub, then run update-grub.

radeon.si_support=0 amdgpu.si_support=1

https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter https://wiki.archlinux.org/index.php/AMDGPU
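As a minimal sketch of the edit described above, assuming a stock Ubuntu-style /etc/default/grub with a quoted GRUB_CMDLINE_LINUX_DEFAULT line and GNU sed: the helper name add_si_params is hypothetical, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: append the two si_support parameters to the
# GRUB_CMDLINE_LINUX_DEFAULT line of the file given as $1.
add_si_params() {
    file="$1"
    params="radeon.si_support=0 amdgpu.si_support=1"
    # Do nothing if the amdgpu parameter is already present.
    if grep -qF 'amdgpu.si_support=1' "$file"; then
        return 0
    fi
    # Insert the parameters just before the closing quote of the variable.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $params\"/" "$file"
}

# Possible use, as root, keeping a backup first:
#   cp /etc/default/grub /etc/default/grub.bak
#   add_si_params /etc/default/grub
#   update-grub
```

After rebooting, `cat /proc/cmdline` should show whether the parameters actually reached the kernel.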

Don't forget to remove these kernel parameters if you want to use the radeon driver again.

Be aware that in any case the experimental GCN 1.0 support in amdgpu might be dropped. https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-Might-Drop-GCN-1.0

Since GCN 1.0 is not officially supported by amdgpu, the radeon driver should be the default. I guess you have already tried that. Is it not working either?

I'm sorry if you have already tried all of these. It's good to start with the basics, you never know.

Davidian1024 commented 4 years ago

> Have you activated the amdgpu experimental support of GCN 1.0/Southern Islands GPU?

I have. Here's the most recent set of options I tried when loading the amdgpu kernel driver module:

sudo modprobe -v amdgpu si_support=1 dpm=1 vm_fault_stop=2 vm_debug=1 gpu_recovery=1 ppfeaturemask=0xffffffff exp_hw_support=1

Now, I admit some of these options are complete guesses. Over the time I've been trying to get this card to run using the amdgpu driver I've lost track of every combination I've gone through.
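When it is unclear which options a given build accepts or which values took effect, one way to check is sketched below. It assumes the module exposes its parameters under /sys/module/amdgpu/parameters once loaded; show_module_params is a hypothetical helper name, and the directory argument exists only so the function can be pointed elsewhere.

```shell
# Hypothetical helper: print name=value for every parameter file in a
# module's sysfs parameters directory (amdgpu by default).
show_module_params() {
    dir="${1:-/sys/module/amdgpu/parameters}"
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (or an unmatched glob).
        [ -f "$f" ] || continue
        # Some parameters may be unreadable without root; print them empty.
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f" 2>/dev/null)"
    done
}

# Before loading: list the parameters the installed module knows about.
#   modinfo -p amdgpu
# After loading: show the values currently in effect.
#   show_module_params
```

Comparing the two outputs helps spot options that the running kernel's amdgpu build does not recognize at all.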

I typically run this laptop with both the amdgpu and radeon modules blacklisted in the kernel command line. Then when I'm in the mood to try to fight with this problem I'll load the module with a modprobe command like the one above. One nice thing about doing it this way is I can start dmesg -w and watch the output while the module loads and when I start a 3D application, etc.
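The workflow described here could look roughly like the sketch below. It assumes the blacklisting is done with the `modprobe.blacklist=` kernel command line parameter, which is a guess since the exact parameter is not stated; is_blacklisted is a hypothetical helper that only reads a command line file.

```shell
# Hypothetical helper: succeed if module $1 appears in a
# modprobe.blacklist= entry of the kernel command line.
# $2 defaults to /proc/cmdline.
is_blacklisted() {
    module="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # One word per line, keep only the blacklist values, split on commas,
    # then look for an exact match of the module name.
    tr ' ' '\n' < "$cmdline_file" \
        | sed -n 's/^modprobe\.blacklist=//p' \
        | tr ',' '\n' \
        | grep -qxF "$module"
}

# Terminal 1: follow kernel messages as they arrive.
#   sudo dmesg -w
# Terminal 2: confirm the module is held back, then load it by hand.
#   is_blacklisted amdgpu && sudo modprobe -v amdgpu si_support=1
```

Keeping the log follower in its own terminal means the last messages before a freeze stay visible even if the 3D application hangs.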

I did see mentions that GCN 1.0 is experimental with the amdgpu driver. I'm sort of hoping that maybe I can help to get it improved, at least for this one GPU.

And yes, I've also tried the radeon driver and I get similar results. The biggest difference is the way the messages appear in the dmesg output when the freeze is limited to the 3D application but I can still see that terminal.

No worries on making suggestions I may have already tried. This makes me feel like I've at least been trying things that make some sense.

M-Bab commented 4 years ago

In this special kernel the radeon module is actually disabled. This increases the chance that the amdgpu module is used - so I am not sure if the kernel parameters are needed.

Davidian1024 commented 4 years ago

Any suggestions as to what I ought to do? I can add some additional info. It seems to me that the card is being treated as if it is a different but similar card. I've wondered if the driver is simply attempting to run it with incorrect settings.

The reason I sometimes think this is the way the card is reported by lspci:

01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde PRO / Venus LE / Tropo PRO-L [Radeon HD 8830M / R7 250 / R7 M465X] (rev 87)

This has caused me a great deal of confusion for a long time in troubleshooting this issue. And I'm not sure I've sorted this bit out.
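The name lspci prints comes from a lookup of the PCI vendor and device ID, so the numeric ID and the bound driver are often more useful than the marketing name. A small sketch, assuming the slot 01:00.0 from the output above; driver_in_use is a hypothetical helper that filters `lspci -k` output read from standard input.

```shell
# Hypothetical helper: extract the driver name from `lspci -k` output
# supplied on stdin. Prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Show numeric IDs (-nn) and kernel driver details (-k) for one slot (-s):
#   lspci -nnk -s 01:00.0
# Print only the driver currently bound to that slot:
#   lspci -k -s 01:00.0 | driver_in_use
```

If this prints nothing while both modules are blacklisted, that is expected; after a manual modprobe it should name the driver that claimed the card.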

mezcalbert commented 4 years ago

Some hope for you:

https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-SI-DC-Display-V4

Have you tried installing a more recent kernel (not the one from here, sorry*) with only these two parameters instead of the many you have?

radeon.si_support=0 amdgpu.si_support=1

You might also want to check out when the patches from the article land and try again afterwards to see if you get some improvements.

* I used this kernel around 4.10-4.15 with the GCN 1.1 part of my Godavari APU (with the above kernel parameters for cik rather than si), then with the RX 560 Polaris, so it served me well before kernels 4.17, 4.18 and 4.19 got the support into a state good enough to stick with mainline kernels. Thanks again for the work you did at the time.

Davidian1024 commented 4 years ago

That is interesting and does make me hopeful. Thank you.

Is this saying they are just now trying to get support for my generation of card (GCN 1.0) through the amdgpu driver into the kernel?

I did try enabling display core with the dc=1 option first while still running on the 5.6.19-20.07.09.amdgpu.ubuntu kernel.

I get messages like this in the dmesg output:

[ 905.820724] [drm] Display Core has been requested via kernel parameter but isn't supported by ASIC, ignoring

So I did try installing a couple of new kernels from kernel.org: 5.7.9-050709-lowlatency and 5.8.0-050800rc5-lowlatency. I picked the low latency ones because that sounded good. I think I would prefer low latency.

So when I load the module I still see those "not supported" messages. I'm wondering if this means I'll have to wait and hope that the proof-of-concept DC support for GCN 1.0 cards makes its way into the kernel.

And unfortunately I still get the same problems when I try to enable high performance. Whenever the problems occur, it's somewhat unpredictable what will happen. At one point everything froze and I had to kill the power. Another time it only froze the 3D application, but I was able to kill it and reboot to get back to a stable state.

I wonder if it would make sense for me to reach out to this Mauro Rossi who submitted the mentioned patches.

M-Bab commented 4 years ago

Worth another try with the new kernel because I could enable a new kernel option CONFIG_DRM_AMD_DC_SI. I am not too optimistic though because the FirePro W5130M is a professional card.
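To confirm that an installed kernel was actually built with that option, something like the sketch below may help. It assumes the distribution ships the build configuration as /boot/config-<release>, as Ubuntu kernels usually do; kernel_has_option is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the kernel config file enables option $1
# as built-in (=y) or as a module (=m).
# $2 defaults to the config file of the running kernel.
kernel_has_option() {
    option="$1"
    config="${2:-/boot/config-$(uname -r)}"
    grep -qx "${option}=[ym]" "$config"
}

# Possible use:
#   kernel_has_option CONFIG_DRM_AMD_DC_SI && echo "DC for SI is enabled in this build"
# Then load the module with dc=1 and look for Display Core messages:
#   sudo dmesg | grep -i 'display core'
```

If the option is enabled but dmesg still reports that Display Core isn't supported by the ASIC, the limitation is in the driver code rather than in the kernel configuration.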