FreeBSDDesktop / DEPRECATED-freebsd-base-graphics

Fork of FreeBSD's base repository to work on graphics-stack-related projects

amdgpu module crashes the kernel on R9 285 Tonga. #128

Closed: weabot closed this issue 7 years ago

weabot commented 7 years ago

Hello, I have an R9 285 (Tonga family), and loading the amdgpu.ko kernel module with kldload seems to crash the kernel.

To give some context, I installed the latest FreeBSD 12.0-CURRENT snapshot, then built and installed the GENERIC_DRM-NODEBUG kernel from a clone of the drm-next branch. I didn't modify the configuration. I loaded the drm module, then the amdgpu module. My GPU fan throttled for a second or so, then stopped completely. The screen went black.

Here are the logs. My guess is that the brief fan throttle came from powerplay starting, and that it then got stuck in the loop where it fails to send messages to the card. On my end the screen is black and the kernel seems unresponsive. Loading the module at boot from the loader results in a page fault (supervisor read data, page not present).

Thank you for your work! :)

nomadlogic commented 7 years ago

Hey there - I am not 100% certain about the state of support for the Tonga GPU on drm-next, but I know there is definitely active work happening to support AMD GPUs better.

It sounds like you are just copying the kernel config into the stock 12-CURRENT tree you have locally. This won't work, since quite a lot of code has been added to the drm-next branch that you'll need. You can follow the instructions here to get your system built properly:

https://github.com/FreeBSDDesktop/freebsd-base-graphics/wiki#building-kernel-from-scratch

Basically, you'll need to check out the drm-next branch, then build a new world and kernel from that repository. We have also added a change that requires you to have llvm40 installed. This will speed up your build quite a bit, since you will not need to build llvm40 during "buildworld".
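As a rough sketch of the procedure described above, here is a Python dry run that prints the command sequence. The clone URL is inferred from the wiki link, `GENERIC_DRM-NODEBUG` comes from the original report, `-j4` is arbitrary, and the `build_plan`/`run` helpers are hypothetical; the wiki page remains the authoritative reference. The step order follows the usual FreeBSD convention of installing the kernel, rebooting, and then installing world.

```python
import shlex
import subprocess

# Assumptions (not taken from the wiki): the clone URL is inferred from the
# wiki link, and the job count is arbitrary. BRANCH and KERNCONF come from the thread.
REPO_URL = "https://github.com/FreeBSDDesktop/freebsd-base-graphics.git"
BRANCH = "drm-next"
KERNCONF = "GENERIC_DRM-NODEBUG"


def build_plan(src_dir="/usr/src", jobs=4):
    """Return the build steps as argv lists (hypothetical helper)."""
    j = f"-j{jobs}"
    return [
        # The thread says the branch expects llvm40 to be installed already.
        ["pkg", "install", "-y", "llvm40"],
        ["git", "clone", "--branch", BRANCH, REPO_URL, src_dir],
        # World and kernel must both come from the same drm-next checkout.
        ["make", "-C", src_dir, j, "buildworld"],
        ["make", "-C", src_dir, j, "buildkernel", f"KERNCONF={KERNCONF}"],
        ["make", "-C", src_dir, "installkernel", f"KERNCONF={KERNCONF}"],
        # Conventionally you reboot into the new kernel before this step;
        # this sketch does NOT reboot for you.
        ["make", "-C", src_dir, "installworld"],
    ]


def run(plan, dry_run=True):
    """Print each step, and execute it only when dry_run is False."""
    printed = []
    for argv in plan:
        line = shlex.join(argv)
        print(line)
        printed.append(line)
        if not dry_run:
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    run(build_plan("/usr/drm"))
```

In principle, the checkout location should not matter as long as every `make` step points at the same tree. That is why `src_dir` is a parameter rather than being hard-coded to `/usr/src`.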

weabot commented 7 years ago

I cloned the tree directly from GitHub... I put it in /usr/drm instead of /usr/src. Surely that can't be the cause?

iotamudelta commented 7 years ago

This seems to be a genuine startup problem; I still have a few of these myself with discrete AMD cards. @markjdb is probably the right contact to tell you what data is needed to debug it.

weabot commented 7 years ago

However, I didn't run buildworld/installworld because I thought the kernel alone would be enough. Let me try that and report back.

markjdb commented 7 years ago

Is this a regression?

iotamudelta commented 7 years ago

I doubt it. There were still lingering problems for a few discrete cards (including the Fury Nanos and S9150s I have access to). This is likely one of these cases.

weabot commented 7 years ago

No change with the new world. I didn't expect one, since this is definitely not a userspace issue, but I'm just covering my bases.

You guys seem to have an idea what's wrong though.

nomadlogic commented 7 years ago

Looking at the log file in your original message, this looks potentially interesting:

Mar 15 12:34:03 TJULP kernel: [drm:gfx_v8_0_ring_test_ring] amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
Mar 15 12:34:03 TJULP kernel: [drm:amdgpu_init] hw_init of IP block <gfx_v8_0> failed -22
Mar 15 12:34:03 TJULP kernel: drmn0: amdgpu_init failed

I'll defer to others actually hacking on the code though...
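For context on why those lines matter, consider the Linux amdgpu code this driver is ported from. Its ring test writes 0xCAFEDEAD to a scratch register, submits a packet that should overwrite it with 0xDEADBEEF, and then polls. If the register still reads 0xCAFEDEAD, the GFX ring never executed the packet. The -22 that follows is -EINVAL. The helper below is a hypothetical log decoder built on that reading. Its regular expressions only cover the message formats quoted above.

```python
import errno
import os
import re

# Patterns for the two message formats quoted in this thread (assumed stable).
RING_TEST = re.compile(
    r"ring (\d+) test failed \(scratch\(0x([0-9A-Fa-f]+)\)=0x([0-9A-Fa-f]+)\)"
)
HW_INIT = re.compile(r"hw_init of IP block <(\w+)> failed (-?\d+)")

SCRATCH_INIT = 0xCAFEDEAD  # written by the driver before submitting the test packet
SCRATCH_DONE = 0xDEADBEEF  # value the test packet should write if the ring runs


def diagnose(log_text):
    """Return human-readable findings for ring-test and hw_init failures."""
    findings = []
    for line in log_text.splitlines():
        m = RING_TEST.search(line)
        if m:
            ring = int(m.group(1))
            reg, val = int(m.group(2), 16), int(m.group(3), 16)
            if val == SCRATCH_INIT:
                findings.append(
                    f"ring {ring}: scratch 0x{reg:04X} unchanged; "
                    "GPU never executed the test packet"
                )
            else:
                findings.append(
                    f"ring {ring}: scratch 0x{reg:04X} = 0x{val:08X}, "
                    f"expected 0x{SCRATCH_DONE:08X}"
                )
        m = HW_INIT.search(line)
        if m:
            block, code = m.group(1), int(m.group(2))
            # Kernel code returns negative errno values.
            name = errno.errorcode.get(-code, "unknown")
            findings.append(
                f"{block}: hw_init returned {code} ({name}: {os.strerror(-code)})"
            )
    return findings
```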

freebsd-nils-level1 commented 7 years ago

Unfortunately, the same happens here most of the time: amdgpu.polaris.crash.txt

Using a RX460.

Once, after several tries, I got "amdgpu.ko" loaded and running (amdgpu.polaris.loaded.txt), but so far I have not been able to repeat it...

BTW: 3D didn't work because the PCI bus addresses somehow got mixed up. The DRI driver "radeonsi.so" tried to bind to "hostb9@pci0:0:24:2", although the RX460 is located at "vgapci0@pci0:36:0:0". For some reason, Xorg itself identified the PCI bus addresses correctly...
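On FreeBSD, pciconf(8) labels each device as `<driver><unit>@pci<domain>:<bus>:<slot>:<function>`. The two addresses above therefore point at entirely different functions: bus 0, slot 24, function 2 (a host bridge) versus bus 36, slot 0, function 0 (the VGA device). A small hypothetical parser makes the comparison explicit. It treats every field as decimal and accepts the optional trailing colon printed by `pciconf -l`; both are assumptions worth checking against your own output.

```python
import re
from typing import NamedTuple


class PciSelector(NamedTuple):
    device: str    # driver name plus unit, e.g. "vgapci0"
    domain: int
    bus: int
    slot: int
    function: int


# Assumed format: name@pci<domain>:<bus>:<slot>:<function>, optional trailing colon.
SELECTOR = re.compile(r"^(\w+?)@pci(\d+):(\d+):(\d+):(\d+):?$")


def parse_selector(text):
    """Parse a pciconf-style selector into its numeric components."""
    m = SELECTOR.match(text.strip())
    if m is None:
        raise ValueError(f"not a pciconf selector: {text!r}")
    name, *nums = m.groups()
    return PciSelector(name, *(int(n) for n in nums))


def same_location(a, b):
    """True if both selectors name the same domain/bus/slot/function."""
    return a[1:] == b[1:]
```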

freebsd-nils-level1 commented 7 years ago

Don't ask me how, but I've managed to get the module running again. It failed four times; on the fifth attempt I switched to another VTY and ran "kldload amdgpu". As soon as I got my prompt back, I ran "service startkde4 onestart".

The problem with the wrong PCI bus addresses is probably related to a potential "libdevq" bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=217886

Before I forget; thanks for all the hard work you guys are doing here. Big kudos for that...

weabot commented 7 years ago

What the hell, this worked... I guess since Linux doesn't have tty0, it didn't know what to do? Something like that? Damn. I actually didn't expect this to work.

gjs278 commented 7 years ago

Hi guys, I was considering getting an RX460 and running it on FreeBSD. Does it work for you without issues currently? @nbe-renzel-net

iotamudelta commented 7 years ago

@gjs278 I think we had somebody running the RX460. It will certainly have the same issues that all amdgpu cards share at the moment: 3D is not usable yet because the kernel leaks memory when it is used. This is one of the reasons I have not kept up with the latest Mesa updates, and I don't know whether they even manage to load the required libraries properly. So if 2D acceleration is enough for you, the card will likely work. 3D: not yet.

weabot commented 7 years ago

This is a duplicate of #158: the workarounds proposed here and in that thread both work, and they are different solutions to the same issue.

markjdb commented 7 years ago

@gjs278 Sorry for the late reply, but I've been using an RX460 without issues for the past month or so.