Closed brainpower closed 6 years ago
@brainpower how would you feel about taking over the AUR package in case @corngood doesn't return?
I'm still watching this, I'm just not able to do any testing. If you guys test it and give me the thumbs up, I'll release it on AUR. I'm also happy to hand the package over to a new maintainer, but preferably it would be someone like @brainpower who has provided working MRs.
This PR doesn't even install because packages like libffi-dev do not even exist in Arch.
@svenstaro The "libffi-dev" is a Debian / Ubuntu convention for the header files for "libffi". If "libffi" exists on the Arch system the headers should be there as well; Arch packages include the header files.
I have an AMD GPU - I can test this as long as it doesn't break my OpenCL. I'll fire up an Arch virtual machine and see what happens. ;-)
@znmeb Yes indeed, but the problem is that this package requires Debian-ism packages and not those found in Arch.
I pushed a commit dealing with libffi-dev and libtinfo-dev, please report if there are any other problems. I'm currently not at a machine with an AMD graphics card, so I can't really test.
About maintaining: I probably could do it, but I'd rather not, because the time I can spend on this is rather limited and I couldn't promise to react to problems or updates in as timely a manner as I'd like to. Not being able to squeeze in some testing of the packages for several weeks proves that point.
So I'll say again, please test this PR and report any issues. Most of those are easily fixed in a few minutes, which I can squeeze in more easily than a few hours of testing.
@brainpower thanks for the change but libedit2 is also not available in Arch.
Also libpci3.
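The renames discussed so far can be collected in one place. This is a minimal sketch: `deb_to_arch` is a hypothetical helper, and the Arch names on the right-hand side are assumptions that should be checked with `pacman -Ss` before they go into the PKGBUILD.

```shell
#!/bin/sh
# Map a Debian-style dependency name to its likely Arch counterpart.
# The Arch names below are assumptions for illustration, not verified
# against the repos at the time of this thread.
deb_to_arch() {
    case "$1" in
        libffi-dev)   echo "libffi" ;;    # Arch ships headers in the main package
        libtinfo-dev) echo "ncurses" ;;   # assumed: terminfo headers come with ncurses
        libedit2)     echo "libedit" ;;   # Debian appends the soname version
        libpci3)      echo "pciutils" ;;  # assumed: libpci is part of pciutils
        *)            echo "$1" ;;        # unknown name: pass through unchanged
    esac
}

# Example: deb_to_arch libedit2
```

Unknown names are passed through unchanged, so the helper can be run over a whole dependency list without dropping entries.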
I wonder if we can update the travis test to install the packages and catch this sort of thing. @svenstaro could you paste the pacman command you're using to install?
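One way a CI job could catch Debian-only names is to ask pacman about each dependency before building. The sketch below assumes the dependency names arrive one per line on stdin; `missing_deps` and the `DEP_CHECK` override are hypothetical names introduced for illustration. It relies on `pacman -Si` exiting non-zero for a package that is not in the sync repositories.

```shell
#!/bin/sh
# Print every dependency name read from stdin that the checker command
# does not recognise. The checker defaults to `pacman -Si`.
# DEP_CHECK is a hypothetical hook so the check can be stubbed out on
# machines without pacman.
missing_deps() {
    while read -r dep; do
        [ -n "$dep" ] || continue
        # Strip version constraints such as "foo>=1.2".
        name=${dep%%[<>=]*}
        ${DEP_CHECK:-pacman -Si} "$name" >/dev/null 2>&1 || echo "$name"
    done
}

# Example: printf '%s\n' libffi libffi-dev | missing_deps
```

Dependencies that only exist in the AUR would also be reported as missing, so the output is a list to review rather than a hard failure.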
I have an AMD GPU - let me know when it can build and I'll test it! I want this rather badly since the "supported" AMD code for Ubuntu doesn't work with 16.04.3 LTS! http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-Compatibility-Advisory-with-Ubuntu-16.04.2-and-16.04.3.aspx
Using pacman -U amdgpu*.pkg.tar.xz
and getting:
looking for conflicting packages...
:: amdgpu-pro-libdrm and libdrm are in conflict. Remove libdrm? [y/N] y
:: amdgpu-pro-libgl and libglvnd are in conflict (libgl). Remove libglvnd? [y/N] y
error: failed to prepare transaction (could not satisfy dependencies)
:: lib32-libglvnd: removing libglvnd breaks dependency 'libglvnd'
:: mesa: removing libglvnd breaks dependency 'libglvnd'
@corngood : Well, adding "-i" to the makepkg call should cause the packages to be installed... But we'd need to install any required packages from AUR beforehand, because pacman will fail to install those...
@svenstaro you need mesa-noglvnd, I think, the amdgpu-pro libgl is not glvnd compatible.
If I'm reading the AMD page correctly, it also requires X <= 1.18 and a kernel <= 4.9. Although I tried the Ubuntu package with Ubuntu 16.04.2 and it didn't work - black-screened.
error: failed to commit transaction (conflicting files)
/etc/amd/amdrc exists in both 'amdgpu-pro-libgl' and 'lib32-amdgpu-pro-libgl'
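File conflicts like this one can be found before installing by comparing the file lists of two built packages. A minimal sketch, assuming each list was produced with something like `bsdtar -tf <package> > list.txt`; `pkg_files` and `shared_files` are hypothetical helper names.

```shell
#!/bin/sh
# Print the regular-file entries of a package file list, sorted and unique.
# Directory entries (trailing "/") and package metadata such as .PKGINFO
# or .MTREE are skipped, since every package contains those.
pkg_files() {
    grep -v -e '/$' -e '^\.[A-Z]' "$1" | sort -u
}

# Print paths that appear in both file lists.
shared_files() {
    { pkg_files "$1"; pkg_files "$2"; } | sort | uniq -d
}

# Example: shared_files libgl.txt lib32-libgl.txt
```

Any path printed would have to be dropped from one of the two packages, or moved into a package that both depend on.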
Alright, trying to run this with linux-lts:
[ 2.291647] Error: fail to get symbol drm_gem_prime_dmabuf_ops
[ 2.292189] ------------[ cut here ]------------
[ 2.292708] kernel BUG at /var/lib/dkms/amdgpu-17.40/492261/build/amd/amdkcl/kcl_common.h:34!
[ 2.293245] invalid opcode: 0000 [#1] SMP
[ 2.293791] Modules linked in: amdkcl(O+) snd_hda_intel(+) drm_kms_helper snd_hda_codec evdev input_leds joydev drm snd_ctxfi(+) snd_hda_core led_class mousedev mac_hid snd_hwdep snd_pcm syscopyarea snd_timer i2c_i801(+) sysfillrect snd r8169 sysimgblt i2c_smbus fb_sys_fops i2c_algo_bit soundcore mii mei_me(+) mei shpchp fan(+) thermal wmi hci_uart btbcm btqca btintel bluetooth parport_pc(+) parport battery rfkill video i2c_hid intel_lpss_acpi intel_lpss pcc_cpufreq(-) acpi_als tpm_infineon kfifo_buf fjes acpi_pad tpm_tis tpm_tis_core industrialio button tpm sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache hid_generic usbhid hid crc32c_intel ahci libahci xhci_pci nvme xhci_hcd nvme_core libata usbcore scsi_mod usb_common i8042 serio
[ 2.293813] CPU: 0 PID: 221 Comm: systemd-udevd Tainted: G O 4.9.66-1-lts #1
[ 2.293813] Hardware name: Gigabyte Technology Co., Ltd. B150M-HD3/B150M-HD3-CF, BIOS F22a 07/04/2017
[ 2.293814] task: ffff8808174aac40 task.stack: ffffc900039bc000
[ 2.293815] RIP: 0010:[<ffffffffa065b690>]
[ 2.293819] [<ffffffffa065b690>] amdkcl_drm_init+0x2c0/0x2e0 [amdkcl]
[ 2.293820] RSP: 0018:ffffc900039bfc70 EFLAGS: 00010282
[ 2.293820] RAX: 0000000000000032 RBX: 0000000000000000 RCX: 0000000000000000
[ 2.293821] RDX: 0000000000000000 RSI: ffff88083ec0dc48 RDI: ffff88083ec0dc48
[ 2.293821] RBP: ffffc900039bfc70 R08: 00000000000002cf R09: 0000000000000000
[ 2.293822] R10: ffffffff81909920 R11: 0000000000000001 R12: ffffffffa0661000
[ 2.293822] R13: ffff880819bbe720 R14: ffffffffa065e550 R15: ffff88081732f420
[ 2.293823] FS: 00007fbc3a9bd0c0(0000) GS:ffff88083ec00000(0000) knlGS:0000000000000000
[ 2.293824] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.293825] CR2: 0000558b7dc3d000 CR3: 0000000817784000 CR4: 00000000003406f0
[ 2.293825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2.293826] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 2.293826] Stack:
[ 2.293827] ffffc900039bfc80
[ 2.293827] ffffffffa066100e ffffc900039bfcf8 ffffffff81002190
[ 2.293828] ffff880817a9cc80
[ 2.293829] ffff880817a9cc80 ffffffff811c3dd1 ffffffffa065e550
[ 2.293830] ffff88081732f420
[ 2.293830] ffffc900039bfce0 ffffffff811e332b 0000000000000018
[ 2.293831] Call Trace:
[ 2.293834] [<ffffffffa066100e>] init_module+0xe/0x21 [amdkcl]
[ 2.293836] [<ffffffff81002190>] do_one_initcall+0x50/0x170
[ 2.293838] [<ffffffff811c3dd1>] ? __vunmap+0x81/0xd0
[ 2.293839] [<ffffffff811e332b>] ? kfree+0x14b/0x160
[ 2.293841] [<ffffffff81178683>] do_init_module+0x5f/0x1ec
[ 2.293843] [<ffffffff81107d17>] load_module+0x2507/0x28f0
[ 2.293844] [<ffffffff81104aa0>] ? symbol_put_addr+0x40/0x40
[ 2.293846] [<ffffffff8120856b>] ? vfs_read+0x11b/0x130
[ 2.293848] [<ffffffff811083ab>] SyS_finit_module+0xfb/0x120
[ 2.293849] [<ffffffff81003b04>] do_syscall_64+0x54/0xc0
[ 2.293850] [<ffffffff815fc96b>] entry_SYSCALL64_slow_path+0x25/0x25
[ 2.293851] Code:
[ 2.293851] d6 65 a0 48 c7 c7 a8 d2 65 a0 c6 05 a3 2c 00 00 01 e8 a7 cb b1 e0 48 c7 c0 60 b1 65 a0 e9 7f fe ff ff 80 3d 8a 2c 00 00 00 74 02 <0f> 0b 48 c7 c6 6f d6 65 a0 48 c7 c7 18 d3 65 a0 c6 05 71 2c 00
[ 2.293869] RIP
[ 2.293872] [<ffffffffa065b690>] amdkcl_drm_init+0x2c0/0x2e0 [amdkcl]
[ 2.293872] RSP <ffffc900039bfc70>
[ 2.293873] ---[ end trace 2e6cec99c0353bb3 ]---
I believe I had the same problem on NixOS. I needed to build a kernel with KALLSYMS_ALL enabled. Could you check your /proc/config.gz for that config?
[root@moria ~]# zcat /proc/config.gz | grep KALL
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y
@svenstaro damn, I'm pretty sure you'll need to build a kernel with that enabled. If that's the problem, we'll need to add that to the prerequisites. It's because the dkms module depends on that one private symbol...
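The check above can be wrapped so that a script can test for the option before attempting the dkms build. A minimal sketch: `config_enabled` is a hypothetical helper that reads a kernel config from stdin, and it only treats `=y` as enabled.

```shell
#!/bin/sh
# Succeed if the kernel config on stdin sets CONFIG_<name>=y.
# Lines like "# CONFIG_FOO is not set" do not match, because the whole
# line must equal "CONFIG_<name>=y".
config_enabled() {
    grep -qx "CONFIG_$1=y"
}

# Example:
#   zcat /proc/config.gz | config_enabled KALLSYMS_ALL || echo "KALLSYMS_ALL missing"
```

This depends on the kernel exposing /proc/config.gz, which the running kernel in this thread does.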
Ok, recompiling linux-lts with that flag enabled.
Alright, that worked, thanks. I can now insert the module without problems. However, I can't seem to get OpenCL or Vulkan to work (can't test OpenGL right now). Relevant logs attached. clinfo.log dmesg.log pacman.log vulkaninfo.log
Ok, running sudo clinfo works for some reason.
clpeak is also a good test - it flat out doesn't work on the Mesa "Clover" OpenCL implementation.
So anyway, this seems to work now, except for the inconvenience that Vulkan and OpenCL applications have to be run as root.
That sounds like a permissions or groups problem - clinfo and clpeak and all my OpenCL code runs as non-root. I have the open source Vulkan though.
@mirh That's what I have running on my AMD card (Bonaire - "Sea Islands"). I haven't been able to get the ROCm stuff to work though. I can build hcc but I can't get the runtime going.
@ vulkan & opencl problems: Does it help if you add yourself to the video group?
Also in the vulkan log it complains about a missing "api_version" in the icd json. Does it help if you add the option? You can use /usr/share/vulkan/icd.d/radeon_icd.x86_64.json for an example...
Also: Is X working?
Adding myself to video did the trick. I will test the vulkan version thing and X later. However, I think this driver already works much better than the previous ones and it's probably sensible to merge the current state.
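Group membership can be checked before reaching for sudo. A minimal sketch: `has_group` is a hypothetical helper that takes the group name and a space-separated group list such as the output of `id -nG`.

```shell
#!/bin/sh
# Succeed if group $1 appears as a whole word in the space-separated list $2.
has_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   has_group video "$(id -nG)" || echo "not in video; try: sudo gpasswd -a $USER video"
```

A newly added group only takes effect in new login sessions, so a re-login is needed after the `gpasswd` call.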
Ok, adding api_version got rid of the warning.
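Whether an ICD manifest already carries the key can be checked with a plain-text match. A minimal sketch: `icd_has_api_version` is a hypothetical helper, and it is a grep, not a JSON parser. The path of the amdgpu-pro manifest is not named in this thread, so the example uses the Mesa one mentioned above.

```shell
#!/bin/sh
# Succeed if the Vulkan ICD manifest $1 contains an "api_version" key.
icd_has_api_version() {
    grep -q '"api_version"[[:space:]]*:' "$1"
}

# Example:
#   icd_has_api_version /usr/share/vulkan/icd.d/radeon_icd.x86_64.json && echo present
```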
PSA: if you are trying this with GCN 1.0 or 1.1 GPUs, make sure you have a kernel with the right options and parameters.
X works as well. I think this is good to go.
When will this be in the repo? I'll test it on my Bonaire!!
I'm not against releasing this, but we should make sure the requirements are really clear.
Did I miss anything? Should we make an AUR package for the required kernel?
@corngood Definitely make a kernel package - it's pretty much useless without a compatible kernel. It'll cause unbootable machines and people will likely have to hard-reset and risk data loss.
I'll make a kernel package tonight. linux-allsyms-lts or something like that. Should be easy since only that one config needs to be changed. Should amdgpu-pro-dkms depend on that specific kernel and enforce it that way? Or do we just tell people?
@brainpower report a feature request against the official packages as it would be vastly more convenient to have that in there. Link this issue there. Make sure it gets assigned to tpowa, heftig and foutrelis.
I'll make a kernel package tonight. linux-allsyms-lts or something like that.
Also make sure CONFIG_DRM_AMDGPU_SI=Y and CONFIG_DRM_AMDGPU_CIK=Y are set.
@mirh You also need to blacklist the radeon module
Yeah, but that's not something to do at compile time.
@mirh I'm doing the AMDGPU_CIK=Y at boot time and the amdgpu module is in the initrd image, not compiled into the kernel.
/etc/mkinitcpio.conf
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run. Advanced users may wish to specify all system modules
# in this array. For instance:
# MODULES="piix ide_disk reiserfs"
MODULES="amdgpu"
/etc/default/grub:
GRUB_CMDLINE_LINUX="resume=/dev/sda1 video=1360x768 amdgpu.cik_support=1 radeon.cik_support=0"
Even though the radeon module is blacklisted it turns out I needed the radeon.cik_support=0. This is all with the stock kernel - linux-4.14.3-1.
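After a reboot, the kernel command line can be compared against the parameters from the GRUB_CMDLINE_LINUX line above. A minimal sketch: `missing_params` is a hypothetical helper that takes the command line as its first argument, followed by the parameters that are expected to be present.

```shell
#!/bin/sh
# Print each expected kernel parameter that does not appear as a whole
# word on the given command line.
# Usage: missing_params "<cmdline>" param1 [param2 ...]
missing_params() {
    cmdline=$1
    shift
    for p in "$@"; do
        case " $cmdline " in
            *" $p "*) ;;            # present, nothing to report
            *)        echo "$p" ;;  # absent
        esac
    done
}

# Example:
#   missing_params "$(cat /proc/cmdline)" amdgpu.cik_support=1 radeon.cik_support=0
```

If a parameter is reported as missing even though it is in /etc/default/grub, the grub configuration has probably not been regenerated (for example with `grub-mkconfig -o /boot/grub/grub.cfg`).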
CONFIG_DRM_AMDGPU_SI=Y and CONFIG_DRM_AMDGPU_CIK=Y are (already) set in the config of the linux-lts package, probably linux too. I'll be basing the package on linux-lts, so they'll be set.
@svenstaro A kernel package will probably have to be made anyway as soon as linux-lts moves on to 4.14 kernel, which should happen soonish. Probably shortly after 4.15 is released. If that wasn't the case, getting it set in the official package would be the way I'd try to go.
Here's the kernel package: https://aur.archlinux.org/packages/linux-lts49-kallsyms/
Let me know if there are any problems with it.
@brainpower dkms compiles with your patched kernel on my machine
And I also got OpenCL and OpenGL working. I had to link /opt/amdgpu-pro/lib/x86_64-linux-gnu/libGL.so to /usr/lib/libGL.so (don't know if it's intended).
I also had to remove lib32-amdgpu-pro-gst-omx from the PKGBUILD as it created an error with amdgpu-pro-gst-omx when installing packages using pacman. The error was:
/etc/xdg/gstomx.conf exists in both 'amdgpu-pro-gst-omx' and 'lib32-amdgpu-pro-gst-omx'
I also had to change lib32-binfmt-support to binfmt-support in package_lib32-amdgpu-pro because, to my knowledge, a 32-bit version of binfmt-support does not exist (I looked in both the AUR and the Debian repos; maybe it's elsewhere).
I should also mention that I have the latest (from archlinux repo) xorg-server (1.19.5-1)
Hope this is useful :)
@cgurps I'm surprised you had to link libGL. I thought the libgl package would install itself in ld.so.conf.d and/or provide a link in /usr/lib/libGL.so.
@corngood yeah that was pretty weird. I installed and removed several drivers for my card (through pacman) for testing, and maybe some configuration got lost somewhere and for some reason libgl thought it was already linked ...
Getting back to you, I made the exact same installation on another machine and I can't make it work.
More precisely, my xorg-server loads perfectly, but glxinfo returns:
name of display: :1
Error: couldn't find RGB GLX visual or fbconfig
I noticed an error in the xorg log, which is:
[ 3478.916] (EE) AIGLX: reverting to software rendering
[ 3478.938] (EE) AIGLX error: amdgpu does not export required DRI extension
[ 3478.939] (EE) GLX: could not load software renderer
and of course I cannot compile any applications using OpenGL (as GLX is not properly started).
If you have any ideas, let me know.
PS: if you want the full xorg log, tell me
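When a full log is too long to paste, the error lines alone are often enough to start with. A minimal sketch: `xorg_errors` is a hypothetical helper that reads an Xorg log on stdin and keeps only timestamped `(EE)` lines, so the marker legend in the log header is not matched. The log path in the example is an assumption and depends on the display number and on whether X runs as root.

```shell
#!/bin/sh
# Print only timestamped error lines, marked "(EE)", from an Xorg log on stdin.
xorg_errors() {
    grep -E '^\[[ 0-9.]*\] \(EE\)'
}

# Example (path is an assumption):
#   xorg_errors < /var/log/Xorg.0.log
```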
https://support.amd.com/en-us/kb-articles/Pages/Radeon-Software-for-Linux-Release-Notes.aspx 17.50 is out in the meantime. And... I don't know, it seems to have changed a lot of stuff.
Yeah, bundled mesa suggests there will be some work needed to figure out how to not break stuff. But let's focus on getting a working 17.40 first, then look at 17.50.
I created an issue for 17.50. Let's keep all 17.50 stuff there: https://github.com/corngood/archlinux-amdgpu/issues/55
This PR has the changes from #52 with the changes mentioned by @heavysink added and then updated to 17.40 . makepkg runs fine, but I haven't installed or tested the packages otherwise (yet) except for the dkms package.
I didn't really touch the dkms package, but since 17.40 it seems kernel 4.9.x is supported, since the dkms build
dkms install amdgpu-17.40/492261 -k 4.9.60-1-lts
did run successfully on my machine. So you can use linux-lts from [core] now for amdgpu-pro, if you need the dkms module.
Edit: this needs a custom kernel after all, see: https://github.com/corngood/archlinux-amdgpu/pull/54#issuecomment-350299871
The build with 4.13.11-1-ARCH still failed though.
X 1.19.x should be supported now, since it apparently has been since 17.30: https://github.com/corngood/archlinux-amdgpu/issues/51#issuecomment-336732087
But keep in mind that you'll probably need mesa-noglvnd or mesa-noglvnd-nogbm, as mentioned here: https://github.com/corngood/archlinux-amdgpu/pull/52#issuecomment-336610863
And I left out the 20-amdgpu.conf intentionally, to see if this screen problem persists with 17.40 or if it was fixed.
Please test and tell me if something does not work, I'll try to fix it.
Things to test:
video group