Closed: konstmonst closed this issue 8 years ago
Could you tell me more about the fence problem? Is the Vulkan driver broken with stock linux-4.7? Is it GPU dependent?
I usually test the driver with the tri/cube demos from the vulkan sdk, but I guess that might not be sufficient (I don't think either uses fences). I just tried the smoketest from the sdk, and there's definitely something wrong with it. I'm still investigating that...
Ok, so:
demos git:(97e3b67) ./smoketest -v
Swapchain: vkCreateSwapchainKHR() called with a non-supported pCreateInfo->imageArrayLayers (i.e. 1). Minimum value is 1, maximum value is 0.
Swapchain: vkCreateSwapchainKHR() called with a non-supported pCreateInfo->imageArrayLayers (i.e. 1). Minimum value is 1, maximum value is 0.
ParameterValidation: vkWaitForFences: returned VK_ERROR_INITIALIZATION_FAILED, indicating that initialization of an object has failed
terminate called after throwing an instance of 'std::runtime_error'
what(): VkResult -3 returned
[1] 6475 abort (core dumped) ./smoketest -v
(The imageArrayLayers error is a bug in the driver, and has been acknowledged by AMD.)
I assume the vkWaitForFences error is what you were talking about? I was really hoping to sidestep the dkms driver with linux-4.7, but I guess it's not all upstream yet.
Try compiling and running vkQuake (https://github.com/Novum/vkQuake), for example. Also try amdgpu_test from libdrm-amdgpu-pro-tools. Here is another bug report about this problem with the open source kernel driver: https://github.com/ValveSoftware/Dota-2-Vulkan/issues/154
Ok, thanks for that. Sounds like the same problem I was seeing. I guess we'll have to get amdgpu-pro-dkms working.
If you can do a PR, let me know, otherwise I'll start with what you posted above.
There's also amdgpu-pro-firmware to worry about. I just checked, and the conflicting firmware binaries don't match what's in linux-firmware. Did you test with amdgpu-pro-firmware or just linux-firmware?
I guess I'll take a look at other firmware packages for ideas.
Just use my notes; I am not sure how to integrate the rest of the info into the PKGBUILD file. Maybe some of the patches need to be Arch-specific. Now I have another problem: the compiled amdgpu.ko freezes my system when I try to start Xorg. I am using an RX 480, and I'm not sure whether the problem would occur with other cards.
Did you install amdgpu-pro-firmware?
I had a look at the firmware loading docs: https://github.com/torvalds/linux/blob/master/Documentation/firmware_class/README
We should be able to install the firmware files in /lib/firmware/updates to avoid collisions with linux-firmware. There's also the firmware_class.path parameter, but you can only set one custom path that way, so it's probably not a good idea.
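To check which copy of a firmware file would win, here is a rough sketch that walks the kernel's default search order (updates/<release>, updates, <release>, then the root). It assumes no custom firmware_class.path is set; the function name find_firmware and the overridable root argument are mine, not anything from the packages.

```shell
# Print the first matching firmware file, mimicking the kernel's default
# search order when no custom firmware_class.path is set.
# $1 = firmware name relative to the firmware root
# $2 = kernel release (defaults to the running kernel)
# $3 = firmware root (defaults to /lib/firmware; overridable for testing)
find_firmware() (
    name=$1
    release=${2:-$(uname -r)}
    root=${3:-/lib/firmware}
    for dir in "$root/updates/$release" "$root/updates" "$root/$release" "$root"; do
        if [ -f "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            exit 0
        fi
    done
    # Nothing found in any of the search directories.
    exit 1
)
```

Called as find_firmware amdgpu/<file>.bin, it should print the path under /lib/firmware/updates once the amdgpu-pro-firmware files are installed there, and the linux-firmware path otherwise.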
Yep, I tried with amdgpu-pro-firmware, and also with the stock linux-firmware package and with linux-firmware-git from the AUR.
Also, after installing amdgpu-pro-dkms, dkms builds amdgpu.ko, not amdgpu.ko.gz, but the old amdgpu.ko.gz is loaded by default. I had to manually delete it.
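For anyone hitting the same thing, a quick way to spot a stale copy is to list every amdgpu module file under the kernel's module tree. This is only a sketch; the function name list_amdgpu_modules is made up for illustration.

```shell
# List every amdgpu module file (compressed or not) under a modules tree.
# $1 = modules directory (defaults to the running kernel's tree)
list_amdgpu_modules() (
    moddir=${1:-/lib/modules/$(uname -r)}
    find "$moddir" -type f -name 'amdgpu.ko*' | sort
)
```

If it prints both an amdgpu.ko.gz and a DKMS-built amdgpu.ko, modinfo -n amdgpu shows which file modprobe would actually pick. After removing the stale one, running depmod -a refreshes the module index.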
So are you still getting hangs on Xorg startup? If so, is it the whole system or is Xorg just failing to start? Anything interesting in logs?
I made the changes needed to put the firmware in /updates/ and get the dkms module building in #14.
Now I'm getting a kernel crash when starting X:
Aug 22 00:04:07 office-arch sddm[911]: Initializing...
Aug 22 00:04:07 office-arch sddm[911]: Starting...
Aug 22 00:04:07 office-arch sddm[911]: Adding new display on vt 1 ...
Aug 22 00:04:07 office-arch sddm[911]: Display server starting...
Aug 22 00:04:07 office-arch sddm[911]: Running: /usr/bin/X -nolisten tcp -auth /var/run/sddm/{2de6b707-a65c-490d-9df3-02fbdc7cabf9} -background none -noreset -displayfd 18 vt1
Aug 22 00:04:07 office-arch kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000250
Aug 22 00:04:07 office-arch kernel: IP: [<ffffffff815daee8>] __ww_mutex_lock+0x18/0x90
Aug 22 00:04:07 office-arch kernel: PGD 428c5f067 PUD 428c5e067 PMD 0
Aug 22 00:04:07 office-arch kernel: Oops: 0002 [#1] PREEMPT SMP
Aug 22 00:04:07 office-arch kernel: Modules linked in: arc4 md4 hmac nls_utf8 cifs dns_resolver fscache kvm_amd amdkfd amd_iommu_v2 nls_iso8859_1 kvm irqbypass crct10dif_pclmul amdgpu(O) crc32_pclmul ghash_clmulni_intel nls_cp437 aesni_intel aes_x86_64 ttm lrw drm_kms_helper gf128mul glue_helper xpad drm vfat fat psmouse serio_raw ablk_helper ff_memless cryptd syscopyarea r8169 sp5100_tco sysfillrect mii fam15h_power input_leds nuvoton_cir sysimgblt pcspkr fjes snd_hda_codec_realtek led_class snd_hda_codec_generic snd_hda_codec_hdmi evdev mousedev joydev i2c_piix4 snd_hda_intel rc_core mac_hid fb_sys_fops i2c_algo_bit snd_hda_codec acpi_cpufreq k10temp button tpm_tis tpm snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore shpchp sch_fq_codel ip_tables x_tables btrfs xor raid6_pq hid_generic usbhid hid sd_mod ata_generic
Aug 22 00:04:07 office-arch kernel: pata_acpi ohci_pci atkbd libps2 crc32c_intel ahci xhci_pci ohci_hcd ehci_pci pata_atiixp libahci xhci_hcd ehci_hcd libata usbcore scsi_mod usb_common i8042 serio
Aug 22 00:04:07 office-arch kernel: CPU: 2 PID: 919 Comm: Xorg.wrap Tainted: G O 4.7.0-1-cik #1
Aug 22 00:04:07 office-arch kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./970 Extreme3 R2.0, BIOS P2.20 08/05/2015
Aug 22 00:04:07 office-arch kernel: task: ffff88041f8ad580 ti: ffff880422028000 task.ti: ffff880422028000
Aug 22 00:04:07 office-arch kernel: RIP: 0010:[<ffffffff815daee8>] [<ffffffff815daee8>] __ww_mutex_lock+0x18/0x90
Aug 22 00:04:07 office-arch kernel: RSP: 0018:ffff88042202bc10 EFLAGS: 00010246
Aug 22 00:04:07 office-arch kernel: RAX: 00000000ffffffff RBX: 0000000000000250 RCX: ffff8804246fb2c0
Aug 22 00:04:07 office-arch kernel: RDX: 0000000000000000 RSI: ffff88042b1efb00 RDI: 0000000000000250
Aug 22 00:04:07 office-arch kernel: RBP: ffff88042202bc30 R08: 0000000000000438 R09: 0000000000000898
Aug 22 00:04:07 office-arch kernel: R10: 0000000000000780 R11: 0000000000000898 R12: 0000000000000250
Aug 22 00:04:07 office-arch kernel: R13: ffff8800bd9ab000 R14: ffff8804220303c0 R15: ffff88042b4f8000
Aug 22 00:04:07 office-arch kernel: FS: 00007fde6d569440(0000) GS:ffff88043ec80000(0000) knlGS:0000000000000000
Aug 22 00:04:07 office-arch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 22 00:04:07 office-arch kernel: CR2: 0000000000000250 CR3: 000000042a676000 CR4: 00000000000406e0
Aug 22 00:04:07 office-arch kernel: Stack:
Aug 22 00:04:07 office-arch kernel: ffff88042b1efb00 0000000000000250 ffff8800bd9ab000 ffff8804220303c0
Aug 22 00:04:07 office-arch kernel: ffff88042202bc60 ffffffffa048a7c5 ffffffffa048a7c5 0000000000000000
Aug 22 00:04:07 office-arch kernel: ffff8804220303c0 ffff8800bd9ab000 ffff88042202bc98 ffffffffa048b614
Aug 22 00:04:07 office-arch kernel: Call Trace:
Aug 22 00:04:07 office-arch kernel: [<ffffffffa048a7c5>] drm_modeset_lock+0x35/0xe0 [drm]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa048a7c5>] ? drm_modeset_lock+0x35/0xe0 [drm]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa048b614>] drm_atomic_get_connector_state+0x34/0x1c0 [drm]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa0526dc0>] __drm_atomic_helper_set_config+0x2a0/0x360 [drm_kms_helper]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa0526ef1>] drm_atomic_helper_set_config+0x71/0xb0 [drm_kms_helper]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa0479e85>] drm_mode_set_config_internal+0x65/0x110 [drm]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa0527e90>] restore_fbdev_mode+0xb0/0x260 [drm_kms_helper]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa052a6b4>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa06b6e9a>] amdgpu_fbdev_restore_mode+0x1a/0x40 [amdgpu]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa06a4ca2>] amdgpu_driver_lastclose_kms+0x12/0x20 [amdgpu]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa046efae>] drm_lastclose+0x2e/0x120 [drm]
Aug 22 00:04:07 office-arch kernel: [<ffffffffa046f39a>] drm_release+0x2fa/0x4d0 [drm]
Aug 22 00:04:07 office-arch kernel: [<ffffffff811fc13f>] __fput+0x9f/0x1e0
Aug 22 00:04:07 office-arch kernel: [<ffffffff811fc2be>] ____fput+0xe/0x10
Aug 22 00:04:07 office-arch kernel: [<ffffffff810977f3>] task_work_run+0x83/0xb0
Aug 22 00:04:07 office-arch kernel: [<ffffffff8100366a>] exit_to_usermode_loop+0xba/0xc0
Aug 22 00:04:07 office-arch kernel: [<ffffffff81003b9e>] syscall_return_slowpath+0x4e/0x60
Aug 22 00:04:07 office-arch kernel: [<ffffffff815dd7ba>] entry_SYSCALL_64_fastpath+0xa2/0xa4
Aug 22 00:04:07 office-arch kernel: Code: ff e8 fd f1 a9 ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 b8 ff ff ff ff 48 89 e5 41 56 41 55 41 54 53 48 89 fb <f0> 0f c1 07 83 e8 01 78 29 83 46 10 01 48 89 77 28 0f ae f0 8b
Aug 22 00:04:07 office-arch kernel: RIP [<ffffffff815daee8>] __ww_mutex_lock+0x18/0x90
Aug 22 00:04:07 office-arch kernel: RSP <ffff88042202bc10>
Aug 22 00:04:07 office-arch kernel: CR2: 0000000000000250
Aug 22 00:04:07 office-arch kernel: ---[ end trace ebe28006d3e69582 ]---
Does this look like yours?
I'm done for the day, but my next steps will be to see if the Gentoo guys (or anyone else) have seen this, and whether they've actually got it working on 4.7. Then I'll try to debug it directly.
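To make two oopses easier to compare, the addresses and offsets can be stripped so only the symbol names remain. A minimal sketch, assuming the [<address>] symbol+0xoffset line format shown in the log above; the name trace_symbols is hypothetical.

```shell
# Read a kernel log on stdin and print the symbol name from every line of
# the form "[<address>] symbol+0xoffset". This covers the call trace and
# also the IP/RIP lines, since they use the same format.
trace_symbols() {
    sed -n 's/.*\[<[0-9a-f]*>] \(? \)\{0,1\}\([A-Za-z0-9_.]*\)+0x.*/\2/p'
}
```

Something like journalctl -k -b -1 | trace_symbols > mine.txt on each machine would give two files that can be compared with diff.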
Couldn't get the logs; I am still working on fixing the journald configuration. I didn't get any output when Xorg crashed, it just stopped reacting to the keyboard. After I loaded the amdgpu-pro amdgpu.ko, I got a crash dump while trying to run amdgpu_test. I think the output was different; I will try again later today when I return from work.
Good news. I have a working Xorg after setting amdgpu.dal=0. The Vulkan smoketest also passes now.
Could you let me know if that works for you?
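Besides the kernel command line, the same setting can be made persistent with a modprobe.d snippet. This is a sketch that assumes the module parameter is named dal, as the amdgpu.dal=0 form suggests; the function name write_dal_option and the file name amdgpu-dal.conf are arbitrary choices of mine.

```shell
# Write a modprobe.d snippet that passes dal=0 to the amdgpu module.
# $1 = target directory (defaults to /etc/modprobe.d, which needs root)
# The file name amdgpu-dal.conf is a hypothetical choice.
write_dal_option() (
    dir=${1:-/etc/modprobe.d}
    printf 'options amdgpu dal=0\n' > "$dir/amdgpu-dal.conf"
)
```

If amdgpu is loaded from the initramfs, the image probably needs to be regenerated before the option takes effect.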
Have you tried using the Gentoo patches to build the dkms module? Vulkan doesn't work without it, so there's no point installing it without a working dkms driver either (there are fence errors without it). Here is my version:
package_amdgpu-pro-dkms () {
    pkgdesc="amdgpu-pro driver in DKMS format."
    depends=('dkms>=1.95')
    arch=('any')
}
I had to edit /var/lib/dkms/amdgpu-pro-16.30.3/315407/source/amd/dal/Makefile to remove the line with -Werror, and patch /var/lib/dkms/amdgpu-pro-16.30.3/315407/source/amd/amdgpu/amdgpu_atpx_handler.c to add 0 as a second parameter to vga_switcheroo_register_handler(). I also had to add the link System.map to System.map-4.7.1-1-ARCH. After that the installation was successful. I will post again after I check whether it works.
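The two source edits could be scripted roughly like this so they can go into the PKGBUILD later. It is a sketch under a few assumptions: GNU sed (for -i), a -Werror flag that sits on a line of its own, and a register call that takes a single argument before patching. The function name patch_amdgpu_pro_source is made up. The System.map link is left out because its location isn't stated above.

```shell
# Apply the two manual source fixes to an amdgpu-pro DKMS tree.
# $1 = DKMS source directory, e.g.
#      /var/lib/dkms/amdgpu-pro-16.30.3/315407/source
patch_amdgpu_pro_source() (
    src=$1
    # Drop every line mentioning -Werror from the DAL Makefile
    # (this removes the whole line, including any other flags on it).
    sed -i '/-Werror/d' "$src/amd/dal/Makefile" &&
    # Append 0 as a second argument, but only to single-argument calls,
    # so running the function twice does not add a third argument.
    sed -i 's/vga_switcheroo_register_handler(\([^,)]*\))/vga_switcheroo_register_handler(\1, 0)/' \
        "$src/amd/amdgpu/amdgpu_atpx_handler.c"
)
```

Running it once against the DKMS source directory and then rebuilding the module should match what was done by hand.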