felixdoerre / primus_vk

Vulkan GPU-offloading layer
BSD 2-Clause "Simplified" License

Installation failed on Ubuntu 20.04 #83

Closed by adamryczkowski 4 years ago

adamryczkowski commented 4 years ago

After following steps 1-3 as best I could, I got the following output from vulkaninfo:

$ vulkaninfo
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_intel.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
/build/vulkan-tools-KEbD_A/vulkan-tools-1.2.131.1+dfsg1/vulkaninfo/vulkaninfo.h:939: failed with ERROR_INITIALIZATION_FAILED

optirun vulkaninfo produces no output at all, exiting silently with error code 139.

Do you have time to tell me what I should do to install primus_vk from this point?

felixdoerre commented 4 years ago

I am not sure why vulkaninfo shows an error. It shouldn't do that (or if it does, the error should be different). You can run VK_LOADER_DEBUG=info vulkaninfo so we can better understand what is detected and what goes wrong.
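For context, "wrong ELF class: ELFCLASS32" is the message a 64-bit process produces when it is asked to load a 32-bit shared object. A minimal sketch for checking the class of a library yourself by reading the EI_CLASS byte of the ELF header (the function name elf_class is made up for this example, and it does not verify the ELF magic bytes):

```shell
# elf_class FILE: print 32 or 64 depending on the EI_CLASS byte
# (offset 4 of the ELF header); print "unknown" for anything else.
# Note: this sketch does not check that FILE really is an ELF file.
elf_class() {
    # od prints the single byte at offset 4 as an unsigned decimal number
    class_byte=$(od -An -tu1 -j4 -N1 "$1" 2>/dev/null | tr -d ' \n')
    case "$class_byte" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}

# Example:
#   elf_class /usr/lib/i386-linux-gnu/libvulkan_intel.so
```

The two libraries named in the error output live under /usr/lib/i386-linux-gnu, so a 64-bit vulkaninfo would be expected to reject them.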

optirun vulkaninfo will probably not run at all. If you use optirun you need to enable the primus_vk-layer. There is the launch-script pvkrun which does exactly that. So could you show the output of ENABLE_PRIMUS_LAYER=1 optirun vulkaninfo?
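As a rough sketch of what such a launch script amounts to (an illustration only, not the actual pvkrun source; the function name run_with_primus_layer is invented here):

```shell
# Run a command through optirun with the primus_vk layer switched on.
# ENABLE_PRIMUS_LAYER is the variable named in this thread; writing the
# assignment as a prefix sets it for this one optirun invocation.
run_with_primus_layer() {
    ENABLE_PRIMUS_LAYER=1 optirun "$@"
}

# Example:
#   run_with_primus_layer vulkaninfo
```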

adamryczkowski commented 4 years ago

Thank you very much for a fast response.

Although I didn't touch anything, the problem with vulkaninfo self-healed on its own. Maybe the system restart helped? Unfortunately, I am not out of the woods yet. I am attaching the output of VK_LOADER=info vulkaninfo anyway: https://pastebin.com/yqnHgtN6

optirun vulkaninfo does not output anything; it just exits with error code 139.

The same with ENABLE_PRIMUS_LAYER=1 optirun vulkaninfo - also no output and the same exit code 139. I also tried VK_LOADER=info ENABLE_PRIMUS_LAYER=1 optirun vulkaninfo with the same result.

adamryczkowski commented 4 years ago

Maybe I am stating the obvious, but AFAIK optirun does work on my system. In particular,

$ cat /proc/acpi/bbswitch 
0000:01:00.0 OFF

and when I type

$ optirun nvidia-smi
Sun Oct 11 17:14:21 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.04    Driver Version: 455.23.04    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 165...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   46C    P3    15W /  N/A |      6MiB /  3914MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    310885      G   /usr/lib/xorg/Xorg                  5MiB |
+-----------------------------------------------------------------------------+

I did not install primusrun, so your vkrun.sh would most probably fail anyway, as it uses primusrun instead of optirun.

felixdoerre commented 4 years ago

Although I didn't touch anything, the problem with vulkaninfo self-healed on its own. Maybe the system restart helped?

I doubt that a system restart was needed. However, it might have stopped another optirun process that was running in parallel.

Error code 139 indicates a segmentation fault. Could you run the program in gdb and provide a backtrace?
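The reasoning behind that reading: a shell reports 128 plus the signal number when a process is killed by a signal, and SIGSEGV is signal 11 on Linux, so 128 + 11 = 139. A small sketch of that arithmetic (the helper name describe_exit_status is made up for this example):

```shell
# Turn a shell exit status into a short description.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        echo "terminated by signal $signum"
    else
        echo "exited with status $status"
    fi
}

# Example:
#   optirun vulkaninfo; describe_exit_status $?
#   describe_exit_status 139   # prints "terminated by signal 11"
```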

adamryczkowski commented 4 years ago

Here you go. For gdb -q optirun --args vulkaninfo: https://pastebin.com/SggrrDhv

It seems the program ran fine inside gdb. But there is another way of calling gdb:

$ optirun gdb -q vulkaninfo
Reading symbols from vulkaninfo...
(No debugging symbols found in vulkaninfo)
(gdb) run
Starting program: /usr/bin/vulkaninfo 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a540cd in getenv () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007ffff7a540cd in getenv () from /usr/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff7e1124b in ?? () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#2  0x00007ffff7e22868 in ?? () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#3  0x00007ffff7e26910 in ?? () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#4  0x00007ffff7e2a70e in vkEnumerateInstanceExtensionProperties () from /usr/lib/x86_64-linux-gnu/libvulkan.so.1
#5  0x00005555555adce0 in ?? ()
#6  0x00005555555ae296 in ?? ()
#7  0x0000555555562a40 in ?? ()
#8  0x00007ffff7a320b3 in __libc_start_main () from /usr/lib/x86_64-linux-gnu/libc.so.6
#9  0x00005555555641ee in ?? ()

Sorry for posting this after a week. It took me a while to find the time to re-learn the basics of gdb (I normally write C++ in IDEs).

To make this trace readable, perhaps I need to build some things with debugging symbols.

felixdoerre commented 4 years ago

Ooh, I believe I know what's broken here: this patch is presumably not contained in Ubuntu 20.04: https://salsa.debian.org/nvidia-team/primus/-/commit/c1938456e0ea569f0bd0588d48adfb05ca40e801 This bug caused segfaults in getenv for me. You might be able to work around the issue by manually setting __GLVND_DISALLOW_PATCHING=1 as an environment variable before invoking optirun. Otherwise you will need to wait for primus to be updated to some version >= 0~20150328-11, or backport the fix yourself.
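A hedged sketch for checking a primus version string against the threshold mentioned above, using dpkg's own version comparison (the function name primus_version_has_fix is invented for this example, and the package name primus in the usage comment is an assumption based on the .deb names that appear later in this thread):

```shell
# Succeeds if the given version string is at least 0~20150328-11,
# the first version said above to contain the fix.
primus_version_has_fix() {
    dpkg --compare-versions "$1" ge "0~20150328-11"
}

# Example (queries the installed package; prints nothing if it is absent):
#   v=$(dpkg-query -W -f='${Version}' primus 2>/dev/null)
#   primus_version_has_fix "$v" && echo "fix should be included"
#
# Trying the workaround for a single command:
#   __GLVND_DISALLOW_PATCHING=1 optirun vulkaninfo
```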

adamryczkowski commented 4 years ago

I already have __GLVND_DISALLOW_PATCHING=1 in my /etc/environment; everything above was run with that variable already set.

felixdoerre commented 4 years ago

Ok, so that workaround does not work. You'll probably need to compile a fixed version of primus yourself or upgrade to Ubuntu 20.10. Sorry that I have no easy answer.

adamryczkowski commented 4 years ago

Do you mean https://salsa.debian.org/nvidia-team/primus/-/tree/master?

felixdoerre commented 4 years ago

Yes, I'd clone that repository and then run dpkg-buildpackage on it.

adamryczkowski commented 4 years ago

dpkg-buildpackage returns

$ dpkg-buildpackage
dpkg-buildpackage: info: source package primus
dpkg-buildpackage: info: source version 0~20150328-13
dpkg-buildpackage: info: source distribution UNRELEASED
dpkg-buildpackage: info: source changed by Andreas Beckmann <anbe@debian.org>
dpkg-buildpackage: info: host architecture amd64
 dpkg-source --before-build .
dpkg-checkbuilddeps: error: Unmet build dependencies: debhelper-compat (= 13)
dpkg-buildpackage: warning: build dependencies/conflicts unsatisfied; aborting
dpkg-buildpackage: warning: (Use -d flag to override.)

I've read that instead of debhelper-compat I could install debhelper, which I did. And then:

$ dpkg-buildpackage --no-check-builddeps
dpkg-buildpackage: info: source package primus
dpkg-buildpackage: info: source version 0~20150328-13
dpkg-buildpackage: info: source distribution UNRELEASED
dpkg-buildpackage: info: source changed by Andreas Beckmann <anbe@debian.org>
dpkg-buildpackage: info: host architecture amd64
 dpkg-source --before-build .
 debian/rules clean
dh clean
   dh_auto_clean
   debian/rules execute_after_dh_auto_clean
make[1]: Entering directory '/home/adam/tmp/primus'
rm -rf lib
make[1]: Leaving directory '/home/adam/tmp/primus'
   dh_clean
 dpkg-source -b .
dpkg-source: error: can't build with source format '3.0 (quilt)': no upstream tarball found at ../primus_0~20150328.orig.tar.{bz2,gz,lzma,xz}
dpkg-buildpackage: error: dpkg-source -b . subprocess returned exit status 255

It seems that building code from Debian repositories is more difficult than building from GitHub repositories.

felixdoerre commented 4 years ago

Try this flag combination: dpkg-buildpackage -b -nc -us -uc. (It's more complicated because it usually does more things :D )
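For readers following along, the dpkg-buildpackage manual describes those flags roughly as follows: -b does a binary-only build (so dpkg-source never looks for the upstream .orig tarball), -nc skips cleaning the source tree, and -us / -uc leave the source package and the .changes file unsigned. A sketch of the sequence, assuming a checkout directory named primus; the clone URL in the comment is inferred from the repository link earlier in this thread and the function name is invented for this example:

```shell
# Build binary .deb packages from a Debian packaging checkout.
# Build dependencies must still be satisfied (or overridden with -d).
build_primus_debs() {
    src_dir=${1:-primus}
    # Subshell so the caller's working directory is left unchanged
    (
        cd "$src_dir" || exit 1
        dpkg-buildpackage -b -nc -us -uc
    )
}

# Example:
#   git clone https://salsa.debian.org/nvidia-team/primus.git
#   build_primus_debs primus
#   ls ./*.deb   # packages are written to the parent of the source directory
```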

adamryczkowski commented 4 years ago

Yes! I did it. Thank you!! I've installed all three .deb packages (primus_0~20150328-13_amd64.deb, primus-libs_0~20150328-13_amd64.deb, primus-nvidia_0~20150328-13_amd64.deb), and then

$ optirun vulkaninfo
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_intel.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
/build/vulkan-tools-KEbD_A/vulkan-tools-1.2.131.1+dfsg1/vulkaninfo/vulkaninfo.h:939: failed with ERROR_INITIALIZATION_FAILED

felixdoerre commented 4 years ago

Ok, I believe we are a step further :D. Now you should be able to get output from VK_LOADER_DEBUG=info optirun vulkaninfo, right?

adamryczkowski commented 4 years ago

Yes. Here is the output: https://pastebin.com/RjPfxkE5

felixdoerre commented 4 years ago

I still don't see any trace of primus-vk activating. Can you try: ENABLE_PRIMUS_LAYER=1 VK_LOADER_DEBUG=info optirun vulkaninfo? Probably the change that I thought of is not in Ubuntu 20.04, and optirun does not automatically set that environment variable.

adamryczkowski commented 4 years ago

https://pastebin.com/NgYfyFDE

adamryczkowski commented 4 years ago

After replacing libprimus_vk.so.1 with libprimus_vk.so in /usr/share/vulkan/implicit_layer.d/primus_vk.json and running ENABLE_PRIMUS_LAYER=1 VK_LOADER_DEBUG=info optirun vulkaninfo again: https://pastebin.com/9UZHi5DY https://pastebin.com/gfsbAPCw

Return status: 0.
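A sketch of that manifest edit, for anyone repeating it. It assumes the manifest refers to the library by the literal name libprimus_vk.so.1; the function name and the .bak backup suffix are choices made for this example:

```shell
# Rewrite libprimus_vk.so.1 to libprimus_vk.so in a Vulkan layer manifest,
# keeping a backup copy of the original next to it.
point_manifest_at_unversioned_lib() {
    manifest=$1
    cp "$manifest" "$manifest.bak" || return 1
    sed 's/libprimus_vk\.so\.1/libprimus_vk.so/' "$manifest.bak" > "$manifest"
}

# Example (run as root, since the file lives under /usr/share):
#   point_manifest_at_unversioned_lib /usr/share/vulkan/implicit_layer.d/primus_vk.json
```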

adamryczkowski commented 4 years ago

ENABLE_PRIMUS_LAYER=1 optirun vkcube does work and I can confirm with nvidia-smi that my GPU is actually doing something.

felixdoerre commented 4 years ago

Cool! So I'd guess we can count this as "It works"?

adamryczkowski commented 4 years ago

Yes! I've just tried 7DaysToDie, and it really does run much more smoothly now. Thank you, thank you very much!!!