KhronosGroup / Vulkan-Loader

Vulkan Loader
https://vulkan.lunarg.com/doc/sdk/latest/linux/LoaderInterfaceArchitecture.html

BLOCKER: Wrong ELF class (Gentoo, Argent, Debian, Arch) #108

Closed Kreyren closed 5 years ago

Kreyren commented 5 years ago

DISCLAIMER: The ELF class errors were confirmed to be safe to ignore, but the issue is still present. The title will be updated based on results.

I have a problem with vulkan-loader (tried 1.1.82.0, 1.1.92.1 and 9999) on kernel 4.18.5-argent using an HD 7870 GHz Edition GPU, which supports Vulkan.

More info here: https://bugs.gentoo.org/667686 (ignore the title; the Gentoo devs seem confused about it).

Output of vulkaninfo:

==========
VULKANINFO
==========

Vulkan Instance Version: 1.1.82

ERROR: [Loader Message] Code 0 : /usr/lib32/libvulkan_intel.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib32/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
/var/tmp/portage/dev-util/vulkan-tools-1.1.82.0/work/Vulkan-Tools-2cfddd146d666efe0ed06ef1d2bc5565821df144/vulkaninfo/vulkaninfo.c:3339: failed with VK_ERROR_INITIALIZATION_FAILED

Output of strace vulkaninfo: https://paste.pound-python.org/show/T1tlyPRHjR4MLngiPzfN/

Vulkan has worked on this GPU on other distros, though also with issues: https://askubuntu.com/questions/1041394/vk-error-incompatible-driver-error-with-vulkan-on-ati-sapphire-7870-running-xu/1042801#1042801

Output of VK_LOADER_DEBUG=all vulkaninfo: https://paste.pound-python.org/show/iLAKU3ZUxmwnhIRmLdwL/

TRIED SOLUTIONS:

REFERENCE

RarogCmex commented 5 years ago

Is this related to https://bugs.gentoo.org/667686 ?

Kreyren commented 5 years ago

Is this related to https://bugs.gentoo.org/667686 ?

https://bugs.gentoo.org/show_bug.cgi?id=667686 is relevant, but the Gentoo devs renamed the title to blame udev, which is not correct.

StefanCristian commented 5 years ago

@RarogCmex I support @Kreyren's claims on this. It's not eudev / udev related; the Gentoo folks are confusing the issue.

Does Vulkan rely on any of the standard udev rules to compile?

It would seem extremely silly to do that, since compilation generally runs in a sandboxed or chrooted instance when creating packages.

IMHO, from what I read, it's a linking problem against the Mesa libs, be it multilib or x86_64 only.

lenny-lunarg commented 5 years ago

The first thing here is that the 32-bit drivers are not breaking anything. These errors happen when a system installs both the 32-bit and 64-bit drivers. The loader will report an error on the drivers that don't match the loader's architecture. In your case, the log you provided clearly shows that while you are failing to load two 32-bit drivers, you are also succeeding in loading two 64-bit drivers (/usr/lib64/libvulkan_intel.so and /usr/lib64/libvulkan_radeon.so).
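The architecture mismatch described above can be checked by hand. The following is a minimal sketch, not the loader's own code: it reads the EI_CLASS byte (offset 4) of the ELF header, where 1 means ELFCLASS32 and 2 means ELFCLASS64. The helper names `elf_class` and `host_class` are hypothetical, introduced only for illustration.

```python
import struct

ELF_MAGIC = b"\x7fELF"
# EI_CLASS values defined by the ELF specification.
EI_CLASS_NAMES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return 'ELFCLASS32', 'ELFCLASS64', or None if the file is not valid ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)  # 4 magic bytes followed by the EI_CLASS byte
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return EI_CLASS_NAMES.get(ident[4])


def host_class():
    """ELF class that a process with this pointer width is able to load."""
    return "ELFCLASS64" if struct.calcsize("P") == 8 else "ELFCLASS32"
```

Calling `elf_class("/usr/lib32/libvulkan_radeon.so")` on the reporter's system and comparing the result with `host_class()` would be expected to show the mismatch the loader complains about, while the /usr/lib64 copies should match.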

The problem is actually stated at the end of the log: ERROR: setupLoaderTermPhysDevs: Failed to detect any valid GPUs in the current config. It looks like the problem is that neither of your working drivers is reporting any GPUs. This suggests that your drivers aren't able to find your GPU for whatever reason.

My feeling is that your driver probably doesn't support your graphics card. Your card is definitely an older card (by Vulkan standards), but AMD clearly states that card supports Vulkan. That being said, you're using the radv driver, which is the standard on Linux, but is not the AMD driver. Just because AMD supports Vulkan on that card, doesn't mean radv (Mesa) does, too. I've been unable to find any documentation on what hardware radv requires so I can't say with certainty that this is the case. I did notice, however, that their submission on the Vulkan conformance page only lists a couple of newer GPUs. This is in contrast to the AMD submissions, which explicitly list your card. So I think your driver can't run Vulkan on your GPU (if you find documentation confirming or refuting this, please let me know because I couldn't find anything definite).

What really complicates matters is that there are three different Linux Vulkan drivers for AMD cards. The amdvlk driver (which you are not using) explicitly states it supports the HD 7000 series. The recent amdgpu-pro driver also seems to support your card (though earlier versions didn't). But you're not using either of those drivers; you're using the radv (Mesa) driver. If you want to try to get this working, the best suggestion I can offer is to try using amdvlk or amdgpu-pro. I don't know what the best way to install those would be, but at least they claim to support your card.

StefanCristian commented 5 years ago

The first thing here is that the 32-bit drivers are not breaking anything. These errors happen when a system installs both the 32-bit and 64-bit drivers. The loader will report an error on the drivers that don't match the loader's architecture. In your case, the log you provided clearly shows that while you are failing to load two 32-bit drivers, you are also succeeding in loading two 64-bit drivers (/usr/lib64/libvulkan_intel.so and /usr/lib64/libvulkan_radeon.so).

We're aware of that, and I reported it in the Gentoo bug report, both as multilib compile, and as x86_64 only. Different failure phases, same basic error, just x64 this time

lenny-lunarg commented 5 years ago

We're aware of that, and I reported it in the Gentoo bug report, both as multilib compile, and as x86_64 only. Different failure phases, same basic error, just x64 this time

Fair enough, but given the title of the issue I felt it needed to be said explicitly. The title suggests the ELF class stuff is blocking. It isn't — the inability to find a GPU is.

StefanCristian commented 5 years ago

I haven't read the Vulkan implementation closely enough, but I never would have guessed it's environment-dependent. IMHO, a build server with no GPU should be able to compile those packages and deliver them to the repository.

How would that turn out if we didn't have the possibility to compile a Vulkan framework on a GPU-less server? It's a bit sad.

lenny-lunarg commented 5 years ago

Where do you get the impression you can't compile anything relating to Vulkan without a GPU? Compilation will work just fine. It's only when running Vulkan that you'll have trouble without a GPU.

StefanCristian commented 5 years ago

The title suggests the ELF class stuff is blocking. It isn't — the inability to find a GPU is.

That is what gave me that impression. But we are talking about compiling vulkan-layers right now, and it's failing to compile and link against basic mesa drivers. Maybe you wanted to suggest that it is compiling against the wrong Mesa libs?

lenny-lunarg commented 5 years ago

it's failing to compile and link against basic mesa drivers

You don't compile anything in Vulkan (loader, layers, or applications) against any drivers. The drivers are loaded dynamically at runtime using dlopen (on Linux), so the notion of compiling against drivers doesn't apply here. The details of how the loader finds drivers are documented here. The error reported by the OP is clearly a runtime issue, rather than a compile-time issue, as evidenced by the fact that he can run vulkaninfo and get a log from it.
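To illustrate the runtime loading described above, here is a small sketch using Python's ctypes, which on Linux opens shared libraries through dlopen. It is only an analogy for what the loader does, not its actual code; `try_load` is a hypothetical helper, and the paths are the ones from the vulkaninfo log in this thread.

```python
import ctypes


def try_load(path):
    """Attempt a dlopen-style load; return (ok, error_message)."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # A missing file or a mismatched ELF class both surface here, at runtime.
        return False, str(exc)
    return True, None


# Paths taken from the log in this thread; adjust them for your own system.
for candidate in ("/usr/lib32/libvulkan_radeon.so", "/usr/lib64/libvulkan_radeon.so"):
    ok, err = try_load(candidate)
    print(candidate, "loaded" if ok else f"failed: {err}")
```

Nothing here is resolved at build time: a failure shows up only when the library is opened, which is consistent with the "wrong ELF class" text appearing in a runtime log rather than in compiler or linker output.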

In any case, I'd like to focus on the original issue, which seems to be that the driver is not able to find any GPUs at runtime. A co-worker pointed me to the Wikipedia article listing driver support for AMD cards. This lists radv Vulkan support on the HD 7000 series as experimental. As a result, I suspect that support for that card is either disabled in a default build or is buggy. Either way, I think this means that trying a different driver that is known to have proper support for the HD 7000 series is a good idea.

StefanCristian commented 5 years ago

You don't compile anything in Vulkan (loader, layers, or applications) against any drivers. The drivers are loaded dynamically at runtime using dlopen (on Linux), so the notion of compiling against drivers doesn't apply here.

Thought so; that makes sense.

The error reported by the OP is clearly a runtime issue, rather than a compile-time issue, as evidenced by the fact that he can run vulkaninfo and get a log from it. In any case, I'd like to focus on the original issue, which seems to be that the driver is not able to find any GPUs at runtime

I apologize, this is my fault; I checked @Kreyren's report. We need to check some details at compile time. We're both having compile-time issues with vulkan-layers and so forth.

Cheers

lenny-lunarg commented 5 years ago

No problem, I just didn't want this issue to get too sidetracked. If you do have separate issues, feel free to report them, but you should create a new issue for that.

Kreyren commented 5 years ago

Thanks for the info. I believe the ATI Radeon driver never worked for me on this setup, and AMDGPU "sometimes worked fine".

AMDGPU-PRO + AMDVLK is not available on Argent (Gentoo), so I am trying to port it from Ubuntu.

Making a custom kernel at the moment.

I will try them again just to eliminate the variables, along with AMDGPU-PRO + AMDVLK.

The original post will be updated with results.

Kreyren commented 5 years ago

UPDATE: Vulkan is working on Ubuntu 18.10 running the stock kernel 4.18.0-11-generic with the mesa-vulkan-drivers package and radeon blacklisted in /etc/default/grub. The affected Gentoo system is confirmed to run on amdgpu with vulkan-loader as provided by Gentoo, following the information from the wiki, with a kernel configuration that includes AMDGPU and does NOT include radeon.

@lenny-lunarg Can you recommend a course of action to verify whether it's a Gentoo issue (i.e. a wrong ebuild implementation)? Sending the raw source of the mentioned ebuild in case it's relevant: https://gitweb.gentoo.org/repo/gentoo.git/tree/media-libs/vulkan-loader/vulkan-loader-1.1.92.1.ebuild

EDIT: Will try to compile it manually for verification, no need to check Gentoo.

lenny-lunarg commented 5 years ago

I'm not sure how you would verify that this is a Gentoo issue. It sounds to me like the packages would need some way to detect that the system in question doesn't support the radeon driver, and that the amdgpu driver is needed instead. I don't know how that would work as I have no packaging experience.

The best way I can see to verify this is a Gentoo issue would be to get an official statement on which cards are supported by the radeon driver. If they agree that it doesn't support the HD 7000 series, then I'd say this is a clear Gentoo bug if Gentoo is installing unsupported drivers.

Does that help at all? I don't really know how much else I can offer.

Kreyren commented 5 years ago

@lenny-lunarg It might, thanks for info.

lenny-lunarg commented 5 years ago

Just one note. I'm going to leave this issue open for now, since we aren't absolutely positive that this is a Gentoo issue, but if I don't see any activity here within the next week or so I'll go ahead and close it under the assumption that it was a Gentoo issue. If you do find something wrong with the loader, please comment before then.

Kreyren commented 5 years ago

@lenny-lunarg Noted, agreed.

Trying to prove that it's a Gentoo issue now and make a patch. If anyone has more info that might help diagnose this issue, please share it. Thanks.

lenny-lunarg commented 5 years ago

Closing as it's been far more than a week since I said I'd close it.

akien-mga commented 5 years ago

The first thing here is that the 32-bit drivers are not breaking anything. These errors happen when a system installs both the 32-bit and 64-bit drivers. The loader will report an error on the drivers that don't match the loader's architecture. In your case, the log you provided clearly shows that while you are failing to load two 32-bit drivers, you are also succeeding in loading two 64-bit drivers (/usr/lib64/libvulkan_intel.so and /usr/lib64/libvulkan_radeon.so).

Is there a bug report tracking this specific issue?

I landed here looking for an existing issue about this, which I also experience when having both 32-bit and 64-bit Mesa Vulkan drivers installed, which is a requirement of the Lutris game launcher (to run 32-bit and 64-bit Wine games via DXVK).

ERROR: [Loader Message] Code 0 : /usr/lib/libvulkan_intel.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
INTEL-MESA: warning: ../src/intel/vulkan/anv_device.c:1242: FINISHME: Implement pop-free point clipping
==========
VULKANINFO
==========

Vulkan Instance Version: 1.1.101

I understand these errors are not blocking, but they still look bad. IMO the loader should first check for the host arch and not report errors about the other ones.
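The behaviour suggested here could look roughly like the sketch below. It is an illustration of the idea, not the loader's implementation: `select_drivers` is a hypothetical function, the (path, bits) pairs are assumed to have been determined beforehand, and the host width is taken from the pointer size of the running process. Mismatched drivers are logged at INFO level, and ERROR is reserved for the case where nothing matches.

```python
import logging
import struct

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("icd-filter")

HOST_BITS = struct.calcsize("P") * 8  # 64 for a 64-bit process, 32 for a 32-bit one


def select_drivers(candidates, host_bits=HOST_BITS):
    """Keep drivers matching the host width; log the rest at INFO, not ERROR."""
    usable = []
    for path, bits in candidates:
        if bits == host_bits:
            usable.append(path)
        else:
            log.info("skipping %s: %d-bit driver, %d-bit loader", path, bits, host_bits)
    if not usable:
        # Only a complete lack of matching drivers is worth an error.
        log.error("no driver matches the %d-bit loader", host_bits)
    return usable
```

With the four Mesa libraries from the logs in this thread, a 64-bit process would keep the two 64-bit drivers and emit two informational lines for the 32-bit ones.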

If there is no existing issue tracking it, I could open a new one.

akien-mga commented 5 years ago

If there is no existing issue tracking it, I could open a new one.

Should I open a new issue? Having information-level messages reported as errors is quite annoying, and it's likely going to lead to loads of false positive bug reports from users of @godotengine's upcoming Vulkan backend:

$ ~/Projects/godot/godot.git/bin/godot.x11.tools.64.vulkan 
Godot Engine v3.2.dev.custom_build.0309b2d95 - https://godotengine.org
ERROR: _debug_messenger_callback: ERROR : GENERAL - Message Id Number: 0 | Message Id Name: Loader Message
        /usr/lib/libvulkan_intel.so: wrong ELF class: ELFCLASS32

        Objects - 1
                Object[0] - VK_OBJECT_TYPE_INSTANCE, Handle 0x7373950

   At: drivers/vulkan/vulkan_context.cpp:85.
ERROR: _debug_messenger_callback: ERROR : GENERAL - Message Id Number: 0 | Message Id Name: Loader Message
        /usr/lib/libvulkan_radeon.so: wrong ELF class: ELFCLASS32

        Objects - 1
                Object[0] - VK_OBJECT_TYPE_INSTANCE, Handle 0x7373950

   At: drivers/vulkan/vulkan_context.cpp:85.
$ ll /usr/share/vulkan/icd.d/
total 16
-rw-r--r-- 1 root root 146 Jun 15 19:54 intel_icd.i686.json
-rw-r--r-- 1 root root 148 Jul  3 20:56 intel_icd.x86_64.json
-rw-r--r-- 1 root root 146 Jun 15 19:52 radeon_icd.i686.json
-rw-r--r-- 1 root root 149 Jul  3 20:56 radeon_icd.x86_64.json

ghaneshmouthouvel commented 3 years ago

ERROR: [Loader Message] Code 0 : /usr/lib32/libvulkan_intel.so: wrong ELF class: ELFCLASS32
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete

I don't know why this error is occurring.

akien-mga commented 3 years ago

@ghaneshmouthouvel See #262. It's something wrongly reported as an error when it should be info/debug level, you should not worry about it (as long as you do have 64-bit libraries available too).