Ashark / archlinux-amdgpu-pro

Radeon Software for Linux (AMDGPU PRO) PKGBUILD generator for ArchLinux AUR
https://aur.archlinux.org/pkgbase/amdgpu-pro-installer/

Update to 17.40 #54

Closed brainpower closed 6 years ago

brainpower commented 7 years ago

This PR contains the changes from #52, plus the changes mentioned by @heavysink, updated to 17.40. makepkg runs fine, but I haven't installed or otherwise tested the packages yet, except for the dkms package.

I didn't really touch the dkms package, but as of 17.40 kernel 4.9.x seems to be supported, since the dkms build (dkms install amdgpu-17.40/492261 -k 4.9.60-1-lts) ran successfully on my machine. So you can use linux-lts from [core] now for amdgpu-pro, if you need the dkms module. Edit: this needs a custom kernel, see: https://github.com/corngood/archlinux-amdgpu/pull/54#issuecomment-350299871 The build with 4.13.11-1-ARCH still failed, though.

X 1.19.x should be supported now, since it apparently has been since 17.30: https://github.com/corngood/archlinux-amdgpu/issues/51#issuecomment-336732087 But keep in mind that you'll probably need mesa-noglvnd or mesa-noglvnd-nogbm, as mentioned here: https://github.com/corngood/archlinux-amdgpu/pull/52#issuecomment-336610863 I left out the 20-amdgpu.conf intentionally, to see whether this screen problem persists with 17.40 or has been fixed.

Please test and tell me if something does not work, I'll try to fix it.

Things to test:

svenstaro commented 6 years ago

@brainpower how would you feel about taking over the AUR package in case @corngood doesn't return?

corngood commented 6 years ago

I'm still watching this, I'm just not able to do any testing. If you guys test it and give me the thumbs up, I'll release it on AUR. I'm also happy to hand the package over to a new maintainer, but preferably it would be someone like @brainpower who has provided working MRs.

svenstaro commented 6 years ago

This PR doesn't even install because packages like libffi-dev do not even exist in Arch.

znmeb commented 6 years ago

@svenstaro The "libffi-dev" is a Debian / Ubuntu convention for the header files for "libffi". If "libffi" exists on the Arch system the headers should be there as well; Arch packages include the header files.

I have an AMD GPU - I can test this as long as it doesn't break my OpenCL. I'll fire up an Arch virtual machine and see what happens. ;-)

svenstaro commented 6 years ago

@znmeb Yes indeed, but the problem is that this package requires Debian-ism packages and not those found in Arch.

brainpower commented 6 years ago

I pushed a commit dealing with libffi-dev and libtinfo-dev, please report if there are any other problems. I'm currently not at a machine with an AMD graphics card, so I can't really test.

About maintaining: I probably could do it, but I'd rather not, because the time I can spend on this is rather limited and I couldn't promise to react to problems or updates in as timely a manner as I'd like. Not being able to squeeze in some testing of the packages for several weeks proves that point.

So I'll say again, please test this PR and report any issues. Most of those are easily fixed in a few minutes, which I can squeeze in more easily than a few hours of testing.

svenstaro commented 6 years ago

@brainpower thanks for the change but libedit2 is also not available in Arch.

svenstaro commented 6 years ago

Also libpci3.
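The missing dependencies above are all Debian package names. A minimal sketch of how a PKGBUILD generator could translate them is shown here. Only the libffi-dev to libffi row comes from this discussion; the function name deb_to_arch and the other two rows are assumptions that should be checked with pacman -Ss before use.

```shell
# Translate a Debian-style dependency name into an Arch package name.
# deb_to_arch is a hypothetical helper, not part of the existing generator.
deb_to_arch() {
    case "$1" in
        libffi-dev) echo "libffi" ;;     # headers ship in the main Arch package
        libedit2)   echo "libedit" ;;    # assumption, verify with pacman -Ss
        libpci3)    echo "pciutils" ;;   # assumption, verify with pacman -Ss
        *)          echo "$1" ;;         # unknown names pass through unchanged
    esac
}

# Show the mapping for the names reported in this thread.
for dep in libffi-dev libedit2 libpci3 mesa; do
    printf '%s -> %s\n' "$dep" "$(deb_to_arch "$dep")"
done
```

Passing unknown names through unchanged keeps Arch-native dependencies intact, at the cost of silently letting untranslated Debian names slip by.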

corngood commented 6 years ago

I wonder if we can update the travis test to install the packages and catch this sort of thing. @svenstaro could you paste the pacman command you're using to install?

znmeb commented 6 years ago

I have an AMD GPU - let me know when it can build and I'll test it! I want this rather badly since the "supported" AMD code for Ubuntu doesn't work with 16.04.3 LTS! http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-Compatibility-Advisory-with-Ubuntu-16.04.2-and-16.04.3.aspx

svenstaro commented 6 years ago

Using pacman -U amdgpu*.pkg.tar.xz and getting:

looking for conflicting packages...
:: amdgpu-pro-libdrm and libdrm are in conflict. Remove libdrm? [y/N] y
:: amdgpu-pro-libgl and libglvnd are in conflict (libgl). Remove libglvnd? [y/N] y
error: failed to prepare transaction (could not satisfy dependencies)
:: lib32-libglvnd: removing libglvnd breaks dependency 'libglvnd'
:: mesa: removing libglvnd breaks dependency 'libglvnd'

brainpower commented 6 years ago

@corngood : Well, adding "-i" to the makepkg call should cause the packages to be installed... But we'd need to install any required packages from AUR beforehand, because pacman will fail to install those...

@svenstaro you need mesa-noglvnd, I think, the amdgpu-pro libgl is not glvnd compatible.

znmeb commented 6 years ago

If I'm reading the AMD page correctly, it also requires X <= 1.18 and a kernel <= 4.9. Although I tried the Ubuntu package with Ubuntu 16.04.2 and it didn't work - black-screened.

svenstaro commented 6 years ago

error: failed to commit transaction (conflicting files)
/etc/amd/amdrc exists in both 'amdgpu-pro-libgl' and 'lib32-amdgpu-pro-libgl'

svenstaro commented 6 years ago

Alright, trying to run this with linux-lts:

[    2.291647] Error: fail to get symbol drm_gem_prime_dmabuf_ops
[    2.292189] ------------[ cut here ]------------
[    2.292708] kernel BUG at /var/lib/dkms/amdgpu-17.40/492261/build/amd/amdkcl/kcl_common.h:34!
[    2.293245] invalid opcode: 0000 [#1] SMP
[    2.293791] Modules linked in: amdkcl(O+) snd_hda_intel(+) drm_kms_helper snd_hda_codec evdev input_leds joydev drm snd_ctxfi(+) snd_hda_core led_class mousedev mac_hid snd_hwdep snd_pcm syscopyarea snd_timer i2c_i801(+) sysfillrect snd r8169 sysimgblt i2c_smbus fb_sys_fops i2c_algo_bit soundcore mii mei_me(+) mei shpchp fan(+) thermal wmi hci_uart btbcm btqca btintel bluetooth parport_pc(+) parport battery rfkill video i2c_hid intel_lpss_acpi intel_lpss pcc_cpufreq(-) acpi_als tpm_infineon kfifo_buf fjes acpi_pad tpm_tis tpm_tis_core industrialio button tpm sch_fq_codel ip_tables x_tables ext4 crc16 jbd2 fscrypto mbcache hid_generic usbhid hid crc32c_intel ahci libahci xhci_pci nvme xhci_hcd nvme_core libata usbcore scsi_mod usb_common i8042 serio
[    2.293813] CPU: 0 PID: 221 Comm: systemd-udevd Tainted: G           O    4.9.66-1-lts #1
[    2.293813] Hardware name: Gigabyte Technology Co., Ltd. B150M-HD3/B150M-HD3-CF, BIOS F22a 07/04/2017
[    2.293814] task: ffff8808174aac40 task.stack: ffffc900039bc000
[    2.293815] RIP: 0010:[<ffffffffa065b690>] 
[    2.293819]  [<ffffffffa065b690>] amdkcl_drm_init+0x2c0/0x2e0 [amdkcl]
[    2.293820] RSP: 0018:ffffc900039bfc70  EFLAGS: 00010282
[    2.293820] RAX: 0000000000000032 RBX: 0000000000000000 RCX: 0000000000000000
[    2.293821] RDX: 0000000000000000 RSI: ffff88083ec0dc48 RDI: ffff88083ec0dc48
[    2.293821] RBP: ffffc900039bfc70 R08: 00000000000002cf R09: 0000000000000000
[    2.293822] R10: ffffffff81909920 R11: 0000000000000001 R12: ffffffffa0661000
[    2.293822] R13: ffff880819bbe720 R14: ffffffffa065e550 R15: ffff88081732f420
[    2.293823] FS:  00007fbc3a9bd0c0(0000) GS:ffff88083ec00000(0000) knlGS:0000000000000000
[    2.293824] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.293825] CR2: 0000558b7dc3d000 CR3: 0000000817784000 CR4: 00000000003406f0
[    2.293825] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    2.293826] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    2.293826] Stack:
[    2.293827]  ffffc900039bfc80
[    2.293827]  ffffffffa066100e ffffc900039bfcf8 ffffffff81002190
[    2.293828]  ffff880817a9cc80
[    2.293829]  ffff880817a9cc80 ffffffff811c3dd1 ffffffffa065e550
[    2.293830]  ffff88081732f420
[    2.293830]  ffffc900039bfce0 ffffffff811e332b 0000000000000018
[    2.293831] Call Trace:
[    2.293834]  [<ffffffffa066100e>] init_module+0xe/0x21 [amdkcl]
[    2.293836]  [<ffffffff81002190>] do_one_initcall+0x50/0x170
[    2.293838]  [<ffffffff811c3dd1>] ? __vunmap+0x81/0xd0
[    2.293839]  [<ffffffff811e332b>] ? kfree+0x14b/0x160
[    2.293841]  [<ffffffff81178683>] do_init_module+0x5f/0x1ec
[    2.293843]  [<ffffffff81107d17>] load_module+0x2507/0x28f0
[    2.293844]  [<ffffffff81104aa0>] ? symbol_put_addr+0x40/0x40
[    2.293846]  [<ffffffff8120856b>] ? vfs_read+0x11b/0x130
[    2.293848]  [<ffffffff811083ab>] SyS_finit_module+0xfb/0x120
[    2.293849]  [<ffffffff81003b04>] do_syscall_64+0x54/0xc0
[    2.293850]  [<ffffffff815fc96b>] entry_SYSCALL64_slow_path+0x25/0x25
[    2.293851] Code: 
[    2.293851] d6 65 a0 48 c7 c7 a8 d2 65 a0 c6 05 a3 2c 00 00 01 e8 a7 cb b1 e0 48 c7 c0 60 b1 65 a0 e9 7f fe ff ff 80 3d 8a 2c 00 00 00 74 02 <0f> 0b 48 c7 c6 6f d6 65 a0 48 c7 c7 18 d3 65 a0 c6 05 71 2c 00 
[    2.293869] RIP 
[    2.293872]  [<ffffffffa065b690>] amdkcl_drm_init+0x2c0/0x2e0 [amdkcl]
[    2.293872]  RSP <ffffc900039bfc70>
[    2.293873] ---[ end trace 2e6cec99c0353bb3 ]---

corngood commented 6 years ago

I believe I had the same problem on NixOS. I needed to build a kernel with KALLSYMS_ALL enabled. Could you check your /proc/config.gz for that config?

svenstaro commented 6 years ago

[root@moria ~]# zcat /proc/config.gz  | grep KALL
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_ABSOLUTE_PERCPU=y
CONFIG_KALLSYMS_BASE_RELATIVE=y

corngood commented 6 years ago

@svenstaro damn, I'm pretty sure you'll need to build a kernel with that enabled. If that's the problem, we'll need to add that to the prerequisites. It's because the dkms module depends on that one private symbol...
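If this ends up in the prerequisites, the check could be scripted along these lines. This is only a sketch: the function name has_kallsyms_all is made up for illustration, and it assumes the running kernel exposes its configuration at /proc/config.gz, as in the output above.

```shell
# Succeeds only if the kernel config read from stdin enables CONFIG_KALLSYMS_ALL.
# has_kallsyms_all is a hypothetical helper name.
has_kallsyms_all() {
    grep -q '^CONFIG_KALLSYMS_ALL=y$'
}

# Check the running kernel, if it exposes its config.
if [ -r /proc/config.gz ]; then
    if zcat /proc/config.gz | has_kallsyms_all; then
        echo "CONFIG_KALLSYMS_ALL is enabled"
    else
        echo "CONFIG_KALLSYMS_ALL is not enabled; the amdkcl module may fail to load"
    fi
fi
```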

svenstaro commented 6 years ago

Ok, recompiling linux-lts with that flag enabled.

svenstaro commented 6 years ago

Alright, that worked, thanks. I can now insert the module without problems. However, I can't seem to get OpenCL or Vulkan to work (can't test OpenGL right now). Relevant logs attached. clinfo.log dmesg.log pacman.log vulkaninfo.log

svenstaro commented 6 years ago

Ok, running sudo clinfo works for some reason.

znmeb commented 6 years ago

clpeak is also a good test - it flat out doesn't work on the Mesa "Clover" OpenCL implementation.

svenstaro commented 6 years ago

So anyway, this seems to work now, except for the inconvenience that Vulkan and OpenCL applications have to be run as root.

znmeb commented 6 years ago

That sounds like a permissions or groups problem - clinfo and clpeak and all my OpenCL code runs as non-root. I have the open source Vulkan though.

mirh commented 6 years ago

As for opencl, you can 'compare' with choices made here

znmeb commented 6 years ago

@mirh That's what I have running on my AMD card (Bonaire - "Sea Islands"). I haven't been able to get the ROCm stuff to work though. I can build hcc but I can't get the runtime going.

mirh commented 6 years ago

See comments. ROCm is only for newest-est cards.

brainpower commented 6 years ago

@ vulkan & opencl problems: Does it help if you add yourself to the video group?

Also in the vulkan log it complains about a missing "api_version" in the icd json. Does it help if you add the option? You can use /usr/share/vulkan/icd.d/radeon_icd.x86_64.json for an example...

Also: Is X working?

svenstaro commented 6 years ago

Adding myself to video did the trick. I will test the vulkan version thing and X later. However, I think this driver already works much better than the previous ones and it's probably sensible to merge the current state.
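For anyone hitting the same root-only behaviour, a quick membership check might look like this. It is a sketch under assumptions: in_group is a hypothetical helper, and the suggested gpasswd call assumes sudo is available. A new login is needed before the group change takes effect.

```shell
# in_group GROUP "space separated group list"
# Succeeds if GROUP appears as a whole word in the list.
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Check the current user against the video group.
if in_group video "$(id -nG)"; then
    echo "current user is in the video group"
else
    echo 'not in video; try: sudo gpasswd -a "$USER" video, then log in again'
fi
```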

svenstaro commented 6 years ago

Ok, adding api_version got rid of the warning.
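To spot ICD manifests that lack the field, something like the following could be used. It is a rough sketch: icd_has_api_version is a hypothetical helper, it only greps for the key rather than parsing JSON, and the two directories are the usual Vulkan loader search paths rather than anything confirmed for this package.

```shell
# Succeeds if the ICD manifest file contains an "api_version" key.
# This is a plain text match, not a real JSON parse.
icd_has_api_version() {
    grep -q '"api_version"[[:space:]]*:' "$1"
}

# Report every installed manifest that lacks the key.
for f in /usr/share/vulkan/icd.d/*.json /etc/vulkan/icd.d/*.json; do
    [ -f "$f" ] || continue
    icd_has_api_version "$f" || echo "missing api_version: $f"
done
```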

mirh commented 6 years ago

PSA: if you are trying this with GCN 1.0 or 1.1 gpus, make sure you have a kernel with right options and parameters

svenstaro commented 6 years ago

X works as well. I think this is good to go.

znmeb commented 6 years ago

When will this be in the repo? I'll test it on my Bonaire!!

corngood commented 6 years ago

I'm not against releasing this, but we should make sure the requirements are really clear.

Did I miss anything? Should we make an AUR package for the required kernel?

znmeb commented 6 years ago

@corngood Definitely make a kernel package - it's pretty much useless without a compatible kernel. It'll cause unbootable machines and people will likely have to hard-reset and risk data loss.

brainpower commented 6 years ago

I'll make a kernel package tonight. linux-allsyms-lts or something like that. Should be easy since only that one config needs to be changed. Should amdgpu-pro-dkms depend on that specific kernel and enforce it that way? Or do we just tell people?
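The one-line config change could be scripted roughly as follows. This is a sketch, not the actual packaging: enable_kallsyms_all is a hypothetical helper, and the file it is pointed at (for example the kernel config shipped next to the linux-lts PKGBUILD) is an assumption.

```shell
# Enable CONFIG_KALLSYMS_ALL in the kernel config file given as $1.
# enable_kallsyms_all is a hypothetical helper name.
enable_kallsyms_all() {
    cfg="$1"
    # Flip the "is not set" marker if it is present.
    sed 's/^# CONFIG_KALLSYMS_ALL is not set$/CONFIG_KALLSYMS_ALL=y/' "$cfg" > "$cfg.new" || return 1
    # Append the option if the config did not mention it at all.
    grep -q '^CONFIG_KALLSYMS_ALL=y$' "$cfg.new" || echo 'CONFIG_KALLSYMS_ALL=y' >> "$cfg.new"
    mv "$cfg.new" "$cfg"
}

# Usage (path is an assumption):
#   enable_kallsyms_all ./config
```

Writing to a temporary file and renaming it avoids depending on the non-portable in-place option of sed.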

svenstaro commented 6 years ago

@brainpower report a feature request against the official packages as it would be vastly more convenient to have that in there. Link this issue there. Make sure it gets assigned to tpowa, heftig and foutrelis.

mirh commented 6 years ago

I'll make a kernel package tonight. linux-allsyms-lts or something like that.

Also make sure CONFIG_DRM_AMDGPU_SI=Y and CONFIG_DRM_AMDGPU_CIK=Y are set.

znmeb commented 6 years ago

@mirh You also need to blacklist the radeon module

mirh commented 6 years ago

Yeah, but that's not something to do at compile time.

znmeb commented 6 years ago

@mirh I'm doing the AMDGPU_CIK=Y at boot time and the amdgpu module is in the initrd image, not compiled into the kernel.

/etc/mkinitcpio.conf:

# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES="piix ide_disk reiserfs"
MODULES="amdgpu"

/etc/default/grub:

GRUB_CMDLINE_LINUX="resume=/dev/sda1 video=1360x768 amdgpu.cik_support=1 radeon.cik_support=0"

Even though the radeon module is blacklisted it turns out I needed the radeon.cik_support=0. This is all with the stock kernel - linux-4.14.3-1.
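To confirm that the parameters actually reached the running kernel after regenerating the GRUB config, a check like the following may help. It is a sketch: cmdline_has is a hypothetical helper, and the two parameters are simply the ones from the GRUB line above.

```shell
# cmdline_has PARAM "kernel command line"
# Succeeds if PARAM appears as a whole word on the command line.
cmdline_has() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Report which of the two parameters the running kernel was booted with.
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    for p in amdgpu.cik_support=1 radeon.cik_support=0; do
        if cmdline_has "$p" "$cmdline"; then
            echo "set: $p"
        else
            echo "missing: $p"
        fi
    done
fi
```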

brainpower commented 6 years ago

CONFIG_DRM_AMDGPU_SI=Y and CONFIG_DRM_AMDGPU_CIK=Y are (already) set in the config of the linux-lts package, probably linux too. I'll be basing the package on linux-lts, so they'll be set.

@svenstaro A kernel package will probably have to be made anyway as soon as linux-lts moves on to the 4.14 kernel, which should happen soonish, probably shortly after 4.15 is released. If that weren't the case, I'd try to get it set in the official package.

brainpower commented 6 years ago

Here's the kernel package: https://aur.archlinux.org/packages/linux-lts49-kallsyms/

Let me know if there are any problems with it.

cgurps commented 6 years ago

@brainpower dkms compiles with your patched kernel on my machine

cgurps commented 6 years ago

And I also got OpenCL and OpenGL working. I had to link /opt/amdgpu-pro/lib/x86_64-linux-gnu/libGL.so to /usr/lib/libGL.so (I don't know if that's intended).

I also had to remove lib32-amdgpu-pro-gst-omx from the PKGBUILD, as it conflicted with amdgpu-pro-gst-omx when installing the packages with pacman. The error was:

/etc/xdg/gstomx.conf exists in both 'amdgpu-pro-gst-omx' and 'lib32-amdgpu-pro-gst-omx'

I also had to change lib32-binfmt-support to binfmt-support in package_lib32-amdgpu-pro since, to my knowledge, a 32-bit version of binfmt-support does not exist (I looked in both the AUR and the Debian repos; maybe it's elsewhere).
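File conflicts like the gstomx.conf one (and the earlier /etc/amd/amdrc one) could be caught before installing by comparing the file lists of two built packages. The sketch below assumes each list was saved with pacman -Qlpq; common_paths is a hypothetical helper and the package file names in the usage comment are placeholders.

```shell
# common_paths LIST_A LIST_B
# Prints every file path that appears in both lists, skipping directories
# (lines ending in "/"), which packages are allowed to share.
common_paths() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d | sed '/\/$/d'
}

# Usage (file names are placeholders):
#   pacman -Qlpq amdgpu-pro-gst-omx-*.pkg.tar.xz       > a.list
#   pacman -Qlpq lib32-amdgpu-pro-gst-omx-*.pkg.tar.xz > b.list
#   common_paths a.list b.list
```

Deduplicating each list first means a path printed by uniq -d really comes from both packages, not from a repeated line in one of them.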

I should also mention that I have the latest (from archlinux repo) xorg-server (1.19.5-1)

Hope this is useful :)

corngood commented 6 years ago

@cgurps I'm surprised you had to link libGL. I thought the libgl package would install itself in ld.so.conf.d and/or provide a link in /usr/lib/libGL.so.

cgurps commented 6 years ago

@corngood yeah, that was pretty weird. I installed and removed several drivers for my card (through pacman) for testing, and maybe some configuration got lost somewhere and for some reason libgl thought it was already linked ...

cgurps commented 6 years ago

Getting back to you, I made the exact same installation on another machine and I can't make it work.

More precisely, my xorg-server loads perfectly, but glxinfo returns:

name of display: :1
Error: couldn't find RGB GLX visual or fbconfig

I noticed an error in the Xorg log, which is:

 [  3478.916] (EE) AIGLX: reverting to software rendering
 [  3478.938] (EE) AIGLX error: amdgpu does not export required DRI extension
 [  3478.939] (EE) GLX: could not load software renderer

and of course I cannot compile any applications using OpenGL (as GLX is not properly started).

Let me know if you have any ideas.

PS: if you want the full xorg log, tell me

mirh commented 6 years ago

https://support.amd.com/en-us/kb-articles/Pages/Radeon-Software-for-Linux-Release-Notes.aspx 17.50 is out in the meantime. And.. Idk, it seems to have changed a lot of stuff.

brainpower commented 6 years ago

Yeah, the bundled mesa suggests there will be some work needed to figure out how to not break stuff. But let's focus on getting a working 17.40 first, then look at 17.50.

I created an issue for 17.50. Let's keep all 17.50 stuff there: https://github.com/corngood/archlinux-amdgpu/issues/55