FreeBSDDesktop / kms-drm

the DRM part of the linuxkpi-based KMS

panic loading amdgpu #255

Open rkitover opened 3 years ago

rkitover commented 3 years ago

I get this in /var/log/messages:

Dec 24 01:14:03 epycfbsd kernel: drmn1: failed to link firmware kernel module with mapped name: amdgpu_navi10_gpu_info_bin
Dec 24 01:14:03 epycfbsd kernel: amdgpu/navi10_gpu_info.bin: could not load firmware image, error 2
Dec 24 01:14:03 epycfbsd syslogd: last message repeated 1 times
Dec 24 01:14:03 epycfbsd kernel: drmn1: failed to load firmware with name: amdgpu/navi10_gpu_info.bin
Dec 24 01:14:03 epycfbsd kernel: drmn1: Failed to load gpu_info firmware "amdgpu/navi10_gpu_info.bin"
Dec 24 01:14:03 epycfbsd kernel: drmn1: Fatal error during GPU init
Dec 24 01:14:03 epycfbsd kernel: [drm] amdgpu: finishing device.
Dec 24 01:14:03 epycfbsd kernel: Warning: can't remove non-dynamic nodes (dri)!

I am running the latest FreeBSD-CURRENT from git, with the latest drm-devel-kmod and gpu-firmware-kmod built from the ports git tree.

My card is a 5700xt.

Any help much appreciated.
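For readers following the log, the "mapped name" in the first message looks like the firmware path with `/` and `.` turned into `_`. A minimal sketch of that mapping, assuming the rule is exactly what the two names in the log imply (the helper `fw_to_kmod` is hypothetical and not taken from the driver source):

```sh
# Hypothetical helper: derive the firmware kernel module name from a
# firmware path by replacing '/' and '.' with '_'.
# The rule is inferred from the log lines above, not from the driver code.
fw_to_kmod() {
    printf '%s\n' "$1" | tr '/.' '__'
}

fw_to_kmod amdgpu/navi10_gpu_info.bin   # prints amdgpu_navi10_gpu_info_bin
```

The resulting name is the one the driver tried to link, and also the name that can be passed to kldload by hand.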

valpackett commented 3 years ago

Well do you have the firmware installed? It's packaged as gpu-firmware-kmod.

rkitover commented 3 years ago

Yes I have the latest drm-devel-kmod and gpu-firmware-kmod installed from the ports git.

valpackett commented 3 years ago

Weird. Can you manually kldload amdgpu_navi10_gpu_info_bin?
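A rough sketch of that manual check: load the firmware module, then confirm it shows up in kldstat. The helper `has_module` is hypothetical; it only greps kldstat output and assumes the module is listed with a `.ko` suffix in the last column.

```sh
# Hypothetical helper: succeed if the kldstat output on stdin lists NAME.ko.
has_module() {
    grep -q -- "[[:space:]]$1\.ko\$"
}

# On the FreeBSD machine, as root:
#   kldload amdgpu_navi10_gpu_info_bin
#   kldstat | has_module amdgpu_navi10_gpu_info_bin && echo "firmware module loaded"
```

If kldload itself fails, the error it prints is usually more specific than the "error 2" reported by the driver.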

rkitover commented 3 years ago

I will try right now.

rkitover commented 3 years ago

So that loads fine, but it seems that amdgpu dies with a backtrace. To capture it I need to boot with hw.syscons.disable=1, which might be tricky.

valpackett commented 3 years ago

Drop the syscons.disable and just make sure the loader is not in the highest/"native" resolution (check with the gop command at the loader prompt). Set efi_max_resolution="1080p" or efi_max_resolution="720p", or whatever would make it smaller, in loader.conf. There should be no framebuffer problems in that case.
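As a sketch, the loader.conf entry from the suggestion above would look something like this. The value "1080p" is just one of the examples given; pick whatever is below the display's native mode.

```sh
# /boot/loader.conf fragment
# Cap the EFI framebuffer resolution below the display's native mode.
efi_max_resolution="1080p"
```

The current mode can be checked with the gop command at the loader prompt, as mentioned above.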

rkitover commented 3 years ago

Nice, thank you, I will try that.

rkitover commented 3 years ago

I set the resolution to 1440p and the framebuffer started at 1600x1200.

I ran:

kldload amdgpu_navi10_gpu_info_bin
kldload amdgpu

I got this backtrace:

backtrace

Here is my system information:

Motherboard: Supermicro H11DSi
CPUs: 2x 32 core first gen AMD epyc
RAM: 128gb ECC
GPUs: 2x 5700xt
FreeBSD: latest current git
Packages: latest from pkg
Ports: latest git versions of drm-devel-kmod and gpu-firmware-kmod

If you would like to debug this, I'll be happy to do whatever is needed.

valpackett commented 3 years ago

2x 5700xt

huh. well the panic is that the driver is trying to create /dev/dri/renderD128 twice (error code 17 is EEXIST). Looks like the first GPU failed to attach for some actual, more serious reason (drmn0 attach returned 2 at the very top line), and we don't clean everything up in that case, so the second GPU fails with that.

Can you scroll up (with Scroll Lock) in the console to see what's up with the first GPU?
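For reference, the two numeric error codes seen so far in this thread decode as follows. This is a tiny illustrative lookup (the function `errno_name` is hypothetical); on FreeBSD the full list is in /usr/include/sys/errno.h.

```sh
# Hypothetical helper: name the two errno values that appear in this thread.
errno_name() {
    case "$1" in
        2)  echo ENOENT ;;          # No such file or directory
        17) echo EEXIST ;;          # File exists
        *)  echo "unknown($1)" ;;
    esac
}

errno_name 2    # prints ENOENT (firmware image not found; drmn0 attach returned 2)
errno_name 17   # prints EEXIST (/dev/dri/renderD128 created twice)
```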

rkitover commented 3 years ago

This time I got somewhat different behavior, I ran:

kldload amdgpu

and it initialized the first card and loaded the firmware; I could see the firmware modules in kldstat. However, it failed to initialize the second card.

Here is the first card being successfully initialized:

log

Here is the second card failing to initialize:

log

I then tried running xorg with amdgpu to see if it would start on the first card, but I got a panic:

panic

valpackett commented 3 years ago

hm, same trace as https://github.com/freebsd/drm-kmod/issues/36 with the vm_page_busy_acquire

Try building the newer driver from https://github.com/freebsd/drm-kmod/pull/40 (https://github.com/myfreeweb/drm-kmod/tree/5.5-wip-amd-pr)
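A possible way to build and install that branch, sketched under some assumptions: kernel sources live in /usr/src/sys, modules should go to /boot/modules, and the tree builds with the usual kmod variables SYSDIR and KMODDIR. The `run` and `build_drm_kmod` helpers are hypothetical; `run` only exists so the steps can be previewed with DRY_RUN=1 before anything is executed.

```sh
# Print each command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Clone the branch linked above, build it, and install the modules.
# SYSDIR and KMODDIR values are assumptions; adjust to the local setup.
build_drm_kmod() {
    run git clone --branch 5.5-wip-amd-pr https://github.com/myfreeweb/drm-kmod &&
    run make -C drm-kmod SYSDIR=/usr/src/sys &&
    run make -C drm-kmod install SYSDIR=/usr/src/sys KMODDIR=/boot/modules
}

# Preview the steps without touching the system:
DRY_RUN=1 build_drm_kmod
```

Running `build_drm_kmod` without DRY_RUN set executes the steps for real; the install step needs root.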

rkitover commented 3 years ago

I built and installed that branch, and now I get this backtrace on kldload amdgpu. This is the first page:

page1 of backtrace

and this is the second page:

page 2 of backtrace

valpackett commented 3 years ago

welcome to the navi FPU kernel context issues suffering club :D (https://github.com/freebsd/drm-kmod/issues/42 etc)

Please try again (git pull to get the latest commit https://github.com/freebsd/drm-kmod/pull/40/commits/c4cc8385313833aeea34b702a719c2a1f819d40a)

rkitover commented 3 years ago

Thank you, that seems to have gotten further, but still panics:

page 1 of crashdump

page 2 of crashdump

Looks to be dying in the same function.

valpackett commented 3 years ago

Oh, right. *facepalm* Try again https://github.com/freebsd/drm-kmod/pull/40/commits/7693e3a492da031171f33cd2d239392a6ae861f1

rkitover commented 3 years ago

Just tried this: the module loads and initializes the first GPU, fails to initialize the second GPU, then locks up hard when I start xorg.

console log page 1

console log page 2

valpackett commented 3 years ago

hm. Does it work with only one GPU installed?

rkitover commented 3 years ago

Will try today!

rkitover commented 3 years ago

Miraculously, it works. I am typing this in KDE on amdgpu now!

Some problems with KDE, but I'll try to work on that.

So what are the next steps here?

I can play with the code a bit if you tell me where to look and what to look at.

novolaska commented 3 years ago

Oh, right. facepalm Try again freebsd/drm-kmod@7693e3a

Works for Renoir 4750G.

dmesg.log Xorg.0.log

Update: drm-kmod source, https://github.com/unrelentingtech/drm-kmod/commits/5.5-wip-amd-pr (7693e3a492da031171f33cd2d239392a6ae861f1)

FreeBSD current, https://github.com/freebsd/freebsd-src, before commit 50180d2b52cc16ecb6a6617fdc53f5d83c71a8b4 (included), and patched with commit 9f47eeffa3cfdcb512e2011fb00fc23c7c1a7d75 for this issue.

rkitover commented 3 years ago

As a temporary measure, since I do want to put my second GPU back in, is there some way, perhaps in loader.conf, to disable my second GPU so that amdgpu does not try to initialize it?

valpackett commented 3 years ago

Maybe in /boot/device.hints https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/device-hints.html

Something like hint.drmn.1.disabled=1 or hint.drm.1.disabled=1 could work? (not sure what exactly the driver name would be)
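A sketch of how such a hint could be generated and checked, assuming the driver name is drmn (as the `drmn1:` prefix in the kernel log earlier in the thread suggests). The helper `drm_disable_hint` is hypothetical; it only prints the line.

```sh
# Hypothetical helper: print a device hint that disables drmn unit N.
drm_disable_hint() {
    printf 'hint.drmn.%s.disabled="1"\n' "$1"
}

drm_disable_hint 1   # prints hint.drmn.1.disabled="1"

# To apply (as root), append the line to /boot/device.hints or /boot/loader.conf:
#   drm_disable_hint 1 >> /boot/loader.conf
# After a reboot, check that the kernel environment picked it up:
#   kenv hint.drmn.1.disabled
```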

rkitover commented 3 years ago

I will try, thank you very much.

rkitover commented 3 years ago

Before I put the second GPU back in, I determined that the correct loader.conf invocation is as you said:

hint.drmn.1.disabled=1

With the second GPU back in, it initializes the first GPU but panics when xorg is being started:

amdgpu panic

valpackett commented 3 years ago

Huh. So reproducibly, always when the second GPU is present (but not even initialized, no dmesg lines for it), there are vm_fault panics, but they don't happen without the second GPU? I guess something in our memory code doesn't handle multiple GPUs :/

rkitover commented 3 years ago

Well, I could try playing with the code to at least get more information about this. Do you have any tips for working with this codebase, debugging, etc., and any specific places I should look at to start?

Also I wanted to say that I really appreciate all your help on this, can I buy you a case of beer?

valpackett commented 3 years ago

tips for working with this codebase and debugging etc

https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

any specific places I should look at

Well the functions in the backtrace here I guess..

(Really, first just confirm that this is reproducible, i.e. every time you have a second GPU this crash happens and every time you don't have it, it doesn't.)
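To get full backtraces instead of photographing console pages, one option is to enable kernel crash dumps and open them in kgdb, roughly along the lines of the handbook chapter linked above. A minimal sketch, assuming a swap device large enough to hold the dump and that kgdb is installed:

```sh
# /etc/rc.conf fragment
# Use the first suitable swap device for kernel crash dumps.
dumpdev="AUTO"
# Directory where savecore(8) stores the dump on the next boot.
dumpdir="/var/crash"

# After a panic and reboot, inspect the most recent dump (as root):
#   kgdb /boot/kernel/kernel /var/crash/vmcore.last
```

Inside kgdb, `bt` shows the backtrace of the panicking thread, which gives the same function list as the console output in a form that can be pasted as text.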

can I buy you a case of beer?

I don't drink :P

rkitover commented 3 years ago

Sure that's easy enough to do, I can just unplug the power cable.

rkitover commented 3 years ago

I don't drink :P

I meant like, do you have a PayPal or Patreon or whatever link for sponsoring FreeBSD development, using your beverage of choice?

rkitover commented 3 years ago

@myfreeweb I have once again verified that this is the case. If I unplug the pcie power from my second GPU then I can start xorg on amdgpu.

Also I realized that I can just do this for now: unplug the PCIe power when I want to boot FreeBSD.

This is a huge improvement over my previous situation where my only choices were an NVIDIA GPU or scfb, thank you very much.

KDE does not seem to be working very well for me here. I'll do the necessary follow-up work on that, but in the meantime, if I can't fix it, I'll just install xfce so I have a working desktop and can start playing with FreeBSD as a daily driver.

Once I do that, I will look at this backtrace and see if I can do anything.