felixdoerre / primus_vk

Vulkan GPU-offloading layer
BSD 2-Clause "Simplified" License

Latest steam-manjaro version with pvkrun does not run the dedicated GPU and Vulkan #77

Closed zimudec closed 3 years ago

zimudec commented 3 years ago

The latest version of steam-manjaro (1.0.0.66-1) still shows an error when trying to run vulkan:

PrimusVK: Searching for display GPU:
PrimusVK: 0x56731a60: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x56731a60.
PrimusVK: No device for the rendering GPU found. Is the correct driver installed?
PrimusVK: VK_ICD_FILENAMES not set
vkCreateInstance failed with error -3
BInit - Unable to initialize Vulkan!
[0929/203723.960952:INFO:crash_reporting.cc(270)] Crash reporting enabled for process: renderer
[0929/203724.054021:INFO:crash_reporting.cc(270)] Crash reporting enabled for process: renderer
[0929/203724.172321:INFO:crash_reporting.cc(270)] Crash reporting enabled for process: renderer
Installing breakpad exception handler for appid(steam)/version(1599174997)

When I go back to the previous version (1.0.0.61-7), pvkrun works with Vulkan, Steam and Optimus without problems.

I do not know the cause of the problem and need guidance.

zimudec commented 3 years ago

However there is still something that I don't understand: the mesa device select layer has to see that the extension VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2_EXTENSION_NAME is enabled (otherwise it wouldn't try to call vkGetPhysicalDeviceProperties2KHR) and the loader has to see this extension disabled (otherwise it would not try to call the null-function). By the way: what is your vulkan-loader version?

I have the following version installed, both in 64 and 32 bits: vulkan-icd-loader 1.2.151-1

I don't know if it is related, but I have the following content in the file /usr/share/vulkan/icd.d/nv_vulkan_wrapper.json:

{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "libnv_vulkan_wrapper.so.1",
        "api_version" : "1.1.84"
    }
}

So next, in gdb, let's validate the theory that the vulkan-loader thinks the extension is disabled. The memory addresses I see in your output seem stable, so assuming that you didn't recompile anything, I can just specify literal addresses, which seems easier to me. If the addresses have changed, you would need to update the address in the commands accordingly.

b fill_drm_device_info
y
r
b *0x7fff24321c5e
c
p/x $rdi
x/b 0x3f78 + $rdi

Is this right?

[zimudec@zimudec Rise of the Tomb Raider]$ STEAM_RUNTIME=0 GAME_LAUNCH_PREFIX="gdb --args" pvkrun ./RiseOfTheTombRaider.sh
WARNING: Rise of the Tomb Raider launched with STEAM_RUNTIME=0
         We recommend using the steam runtime if possible
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
PrimusVK: Searching for display GPU:
PrimusVK: 0x218b240: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x218b240.
PrimusVK: 0x2171340.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce GT 740M
PrimusVK:   Type: 2
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider...
(No debugging symbols found in /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider)
(gdb) b fill_drm_device_info
Function "fill_drm_device_info" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (fill_drm_device_info) pending.
(gdb) r
Starting program: /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
RiseOfTheTombRaider: crash reporter initialised with path "/home/zimudec/.local/share/feral-interactive/Rise of the Tomb Raider/crashes"
[New Thread 0x7fff3039d640 (LWP 1987)]
[New Thread 0x7fff27fff640 (LWP 1988)]
[New Thread 0x7fff2f54e640 (LWP 1989)]
[New Thread 0x7fff2ed4d640 (LWP 1990)]
[New Thread 0x7fff2e54c640 (LWP 1991)]
[New Thread 0x7fff2c893640 (LWP 2008)]
[New Thread 0x7fff277fe640 (LWP 2009)]
[New Thread 0x7fff26ffd640 (LWP 2010)]
[New Thread 0x7fff267fc640 (LWP 2011)]
[New Thread 0x7fff2573a640 (LWP 2012)]
SDL2 initialised [built against 2.0.7, running with 2.0.7]
[Detaching after fork from child process 2013]
[New Thread 0x7fff2506f640 (LWP 2014)]
[S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
[S_API FAIL] SteamAPI_Init() failed; unable to locate a running instance of Steam, or a local steamclient.so.
[Detaching after fork from child process 2015]
[Thread 0x7fff2506f640 (LWP 2014) exited]
Setting breakpad minidump AppID = 391220
Steam_SetMinidumpSteamID:  Caching Steam ID:  76561198008468660 [API loaded no]
[New Thread 0x7fff2486e640 (LWP 2212)]
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[New Thread 0x7fff2506f640 (LWP 2213)]
[New Thread 0x7ffee7fff640 (LWP 2214)]
[New Thread 0x7ffee77fe640 (LWP 2217)]
[New Thread 0x7ffee6ffd640 (LWP 2218)]
[Thread 0x7ffee7fff640 (LWP 2214) exited]
[New Thread 0x7fff24477640 (LWP 2219)]
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
[New Thread 0x7ffee7fff640 (LWP 2227)]
[New Thread 0x7ffee4f9e640 (LWP 2228)]
[New Thread 0x7ffecbfff640 (LWP 2229)]
[New Thread 0x7ffecb7fe640 (LWP 2230)]
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "RiseOfTheTombRa" hit Breakpoint 1, fill_drm_device_info (
    info=info@entry=0x7536050, drm_device=drm_device@entry=0x750f040, 
    device=0x737eab0)
    at ../mesa-20.1.7/src/vulkan/device-select-layer/device_select_layer.c:237
237 ../mesa-20.1.7/src/vulkan/device-select-layer/device_select_layer.c: No such file or directory.
(gdb) x/50i info->GetPhysicalDeviceProperties2KHR
   0x7fff2432fc30:  push   %r12
   0x7fff2432fc32:  push   %rbp
   0x7fff2432fc33:  mov    %rsi,%rbp
   0x7fff2432fc36:  push   %rbx
   0x7fff2432fc37:  mov    0x8(%rdi),%r12
   0x7fff2432fc3b:  mov    %rdi,%rbx
   0x7fff2432fc3e:  mov    0x8(%r12),%rdi
   0x7fff2432fc43:  test   %rdi,%rdi
   0x7fff2432fc46:  je     0x7fff2432fc70
   0x7fff2432fc48:  testb  $0x1,0x3f78(%rdi)
   0x7fff2432fc4f:  je     0x7fff2432fc70
   0x7fff2432fc51:  mov    0x1b0(%r12),%rax
   0x7fff2432fc59:  test   %rax,%rax
   0x7fff2432fc5c:  je     0x7fff2432fc90
   0x7fff2432fc5e:  mov    0x18(%rbx),%rdi
   0x7fff2432fc62:  mov    %rbp,%rsi
   0x7fff2432fc65:  pop    %rbx
   0x7fff2432fc66:  pop    %rbp
   0x7fff2432fc67:  pop    %r12
   0x7fff2432fc69:  jmpq   *%rax
   0x7fff2432fc6b:  nopl   0x0(%rax,%rax,1)
   0x7fff2432fc70:  mov    0xb0(%r12),%rax
   0x7fff2432fc78:  test   %rax,%rax
--Type <RET> for more, q to quit, c to continue without paging--
   0x7fff2432fc7b:  jne    0x7fff2432fc5e
   0x7fff2432fc7d:  testb  $0x1,0x3f78(%rdi)
   0x7fff2432fc84:  je     0x7fff2432fc5e
   0x7fff2432fc86:  nopw   %cs:0x0(%rax,%rax,1)
   0x7fff2432fc90:  mov    (%r12),%rax
   0x7fff2432fc94:  lea    0x31e15(%rip),%rcx        # 0x7fff24361ab0
   0x7fff2432fc9b:  xor    %edx,%edx
   0x7fff2432fc9d:  mov    $0x1,%esi
   0x7fff2432fca2:  mov    (%rax),%r8
   0x7fff2432fca5:  xor    %eax,%eax
   0x7fff2432fca7:  callq  0x7fff2432f900
   0x7fff2432fcac:  mov    0x18(%rbx),%rdi
   0x7fff2432fcb0:  lea    0x10(%rbp),%rsi
   0x7fff2432fcb4:  callq  *0x50(%r12)
   0x7fff2432fcb9:  mov    0x8(%rbp),%rbx
   0x7fff2432fcbd:  lea    0x31edc(%rip),%rbp        # 0x7fff24361ba0
   0x7fff2432fcc4:  jmp    0x7fff2432fcdd
   0x7fff2432fcc6:  nopw   %cs:0x0(%rax,%rax,1)
   0x7fff2432fcd0:  testb  $0x4,0x3f78(%rdi)
   0x7fff2432fcd7:  jne    0x7fff2432fd08
   0x7fff2432fcd9:  mov    0x8(%rbx),%rbx
   0x7fff2432fcdd:  test   %rbx,%rbx
   0x7fff2432fce0:  je     0x7fff2432fd38
--Type <RET> for more, q to quit, c to continue without paging--
   0x7fff2432fce2:  cmpl   $0x3b9bdf5c,(%rbx)
   0x7fff2432fce8:  mov    0x8(%r12),%rdi
   0x7fff2432fced:  je     0x7fff2432fcd0
   0x7fff2432fcef:  mov    %rbp,%rcx
(gdb) b *0x7fff2432fc5e
Breakpoint 2 at 0x7fff2432fc5e
(gdb) c
Continuing.
[New Thread 0x7ffecaffd640 (LWP 2415)]

Thread 1 "RiseOfTheTombRa" hit Breakpoint 2, 0x00007fff2432fc5e in ?? ()
   from /usr/lib/libvulkan.so.1
(gdb) p/x $rdi
$1 = 0x7540e00
(gdb) x/b 0x3f78 + $rdi
0x7544d78:  0x00
(gdb)
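
The value read in the session above can be checked by hand. The following is a minimal sketch, not part of the original thread: it recomputes the address that `x/b 0x3f78 + $rdi` inspected and repeats the `testb $0x1` check from the disassembly. Treating bit 0 of that byte as the loader's "extension enabled" flag is the theory under test in this thread, not a documented fact; the names `flag_address` and `bit_is_set` are made up for illustration.

```python
# Values copied from the gdb session above.
RDI = 0x7540E00        # output of `p/x $rdi`
FLAG_OFFSET = 0x3F78   # displacement in `testb $0x1,0x3f78(%rdi)`
FLAG_BYTE = 0x00       # output of `x/b 0x3f78 + $rdi`

# Assumption from the thread: bit 0 of this byte marks the extension
# as enabled inside the loader.
EXTENSION_ENABLED_BIT = 0x1


def flag_address(base: int, offset: int) -> int:
    """Address that `x/b offset + $rdi` reads."""
    return base + offset


def bit_is_set(byte: int, mask: int) -> bool:
    """Mirror of `testb $mask, byte`: True when any masked bit is set."""
    return (byte & mask) != 0


if __name__ == "__main__":
    print(hex(flag_address(RDI, FLAG_OFFSET)))
    print(bit_is_set(FLAG_BYTE, EXTENSION_ENABLED_BIT))
```

Under that assumption, a byte of 0x00 means the loader does not consider the extension enabled at this point, which is consistent with the theory.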

... maybe I was confused about which functions should work when, but looking at it again, it seems Mesa behaves correctly. Which functions Mesa exposes depends on the requested extensions and Vulkan version. If the application requests Vulkan 1.0 without the extension, neither function is available; if the application requests Vulkan 1.1, only the non-KHR variant is available. If the application additionally requests the extension, the KHR variant becomes available in both cases.
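
The rule described in this paragraph can be written down as a small truth table. This is only a sketch of the rule as stated here, not Mesa's actual code; the function name `exposed_entrypoints` and the tuple representation of the API version are assumptions made for illustration.

```python
from typing import Set, Tuple

CORE = "vkGetPhysicalDeviceProperties2"
KHR = "vkGetPhysicalDeviceProperties2KHR"


def exposed_entrypoints(api_version: Tuple[int, int],
                        extension_enabled: bool) -> Set[str]:
    """Entry points available for a requested API version and extension state."""
    names = set()
    if api_version >= (1, 1):
        # The non-KHR variant is part of core Vulkan 1.1.
        names.add(CORE)
    if extension_enabled:
        # The KHR variant comes from the extension, whatever the version.
        names.add(KHR)
    return names


if __name__ == "__main__":
    for version in [(1, 0), (1, 1)]:
        for enabled in [False, True]:
            print(version, enabled, sorted(exposed_entrypoints(version, enabled)))
```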

So it would be nice if you also ran this, so we can check which functions Mesa returns on your system with the extensions/API version requested by the game:

b anv_GetInstanceProcAddr if _instance != 0
y
r
p instance->physical_device_dispatch.entrypoints[anv_get_physical_device_entrypoint_index("vkGetPhysicalDeviceProperties2KHR")]
p instance->physical_device_dispatch.entrypoints[anv_get_physical_device_entrypoint_index("vkGetPhysicalDeviceProperties2")]

[zimudec@zimudec Rise of the Tomb Raider]$ STEAM_RUNTIME=0 GAME_LAUNCH_PREFIX="gdb --args" pvkrun ./RiseOfTheTombRaider.sh
WARNING: Rise of the Tomb Raider launched with STEAM_RUNTIME=0
         We recommend using the steam runtime if possible
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
PrimusVK: Searching for display GPU:
PrimusVK: 0x14db240: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x14db240.
PrimusVK: 0x130ee00.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce GT 740M
PrimusVK:   Type: 2
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider...
(No debugging symbols found in /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider)
(gdb) b anv_GetInstanceProcAddr if _instance != 0
Function "anv_GetInstanceProcAddr" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (anv_GetInstanceProcAddr if _instance != 0) pending.
(gdb) r
Starting program: /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
RiseOfTheTombRaider: crash reporter initialised with path "/home/zimudec/.local/share/feral-interactive/Rise of the Tomb Raider/crashes"
[New Thread 0x7fff3039d640 (LWP 2539)]
[New Thread 0x7fff27fff640 (LWP 2540)]
[New Thread 0x7fff2f54e640 (LWP 2541)]
[New Thread 0x7fff2ed4d640 (LWP 2542)]
[New Thread 0x7fff2e54c640 (LWP 2543)]
[New Thread 0x7fff2c893640 (LWP 2560)]
[New Thread 0x7fff277fe640 (LWP 2561)]
[New Thread 0x7fff26ffd640 (LWP 2562)]
[New Thread 0x7fff267fc640 (LWP 2563)]
[New Thread 0x7fff2573a640 (LWP 2564)]
SDL2 initialised [built against 2.0.7, running with 2.0.7]
[Detaching after fork from child process 2565]
[New Thread 0x7fff2506f640 (LWP 2566)]
[S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
[S_API FAIL] SteamAPI_Init() failed; unable to locate a running instance of Steam, or a local steamclient.so.
[Detaching after fork from child process 2567]
Setting breakpad minidump AppID = 391220
Steam_SetMinidumpSteamID:  Caching Steam ID:  76561198008468660 [API loaded no]
[New Thread 0x7fff2486e640 (LWP 2755)]
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[New Thread 0x7ffee7fff640 (LWP 2756)]
[New Thread 0x7ffee77fe640 (LWP 2757)]
[New Thread 0x7ffee6ffd640 (LWP 2758)]
[New Thread 0x7fff24477640 (LWP 2759)]
[New Thread 0x7ffee67fc640 (LWP 2760)]
[New Thread 0x7ffee5ffb640 (LWP 2761)]
[Thread 0x7ffee77fe640 (LWP 2757) exited]
[Thread 0x7ffee67fc640 (LWP 2760) exited]
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
[New Thread 0x7ffee67fc640 (LWP 2768)]
[New Thread 0x7ffee77fe640 (LWP 2769)]
[New Thread 0x7ffee4f86640 (LWP 2770)]
[New Thread 0x7ffecb015640 (LWP 2771)]
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "RiseOfTheTombRa" received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) p instance->physical_device_dispatch.entrypoints[anv_get_physical_device_entrypoint_index("vkGetPhysicalDeviceProperties2KHR")]
No symbol "instance" in current context.
(gdb) p instance->physical_device_dispatch.entrypoints[anv_get_physical_device_entrypoint_index("vkGetPhysicalDeviceProperties2")]
No symbol "instance" in current context.
(gdb)

felixdoerre commented 3 years ago

Thanks for the gdb output. I believe I was able to reproduce the problem locally. The problem occurs when the mesa layer is inserted "below" primus-vk. On my system it was always inserted above primus-vk, so I didn't experience any problems. I forced the layer ordering and could reproduce the segfault.

Short story of what happens (with annotations about what happens with the fix):

Could you please try out the fix I have pushed here? https://github.com/felixdoerre/primus_vk/tree/mesa_layer_fix

zimudec commented 3 years ago

Thanks for the gdb output. I believe I was able to reproduce the problem locally. The problem occurs when the mesa layer is inserted "below" primus-vk. On my system it was always inserted above primus-vk, so I didn't experience any problems. I forced the layer ordering and could reproduce the segfault.

Short story of what happens (with annotations about what happens with the fix):

  • When a Vulkan instance is created, the application first calls the loader's vkCreateInstance function, enabling the extension in question
    • primus_vk's CreateInstance is invoked
    • mesa_device_select's CreateInstance is invoked, and recognizes the enabled extension
      • the call reaches the loader again in terminator_CreateInstance
      • the loader forwards this call to the drivers
    • primus_vk wants to detect devices and calls EnumeratePhysicalDevices
    • mesa_device_select takes over this call and searches for the correct devices, trying to invoke GetPhysicalDeviceProperties2KHR (as it has seen that the extension is enabled)
      • the loader crashes, as it does not have the extension enabled yet
    • primus_vk would finish its CreateInstance method
  • the loader would continue vkCreateInstance and enable the extension here
  • the application would call EnumeratePhysicalDevices
    • (with the fix, primus_vk detects devices here, where the extension in the loader is enabled)
    • primus_vk would only return the dedicated GPU
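
The sequence in the list above can be replayed with a toy model. This is a hedged sketch only: it models neither the real loader nor the real primus_vk code, and every class and function name in it (`Loader`, `PrimusVk`, `device_select_enumerate`, `LoaderCrash`, `defer_detection`) is hypothetical. It captures just one thing, that the loader enables the extension only after the layer chain has returned from CreateInstance, so a properties query made during CreateInstance fails while the same query made later succeeds.

```python
class LoaderCrash(RuntimeError):
    """Stands in for the segfault on the null function pointer."""


class Loader:
    def __init__(self):
        # Set only after the layer chain has returned from CreateInstance.
        self.extension_enabled = False

    def properties2_khr(self):
        if not self.extension_enabled:
            raise LoaderCrash("extension not enabled in the loader yet")
        return "properties"

    def create_instance(self, primus):
        primus.create_instance(self)   # the layer chain runs first
        self.extension_enabled = True  # the loader finishes vkCreateInstance

    def enumerate_physical_devices(self, primus):
        return primus.enumerate_physical_devices(self)


def device_select_enumerate(loader):
    """mesa_device_select below primus_vk: queries properties while enumerating."""
    loader.properties2_khr()
    return ["integrated", "dedicated"]


class PrimusVk:
    def __init__(self, defer_detection):
        self.defer_detection = defer_detection
        self.devices = None

    def _detect(self, loader):
        self.devices = device_select_enumerate(loader)

    def create_instance(self, loader):
        if not self.defer_detection:
            self._detect(loader)  # before the fix: detection happens too early

    def enumerate_physical_devices(self, loader):
        if self.devices is None:
            self._detect(loader)  # with the fix: detection happens here
        return [d for d in self.devices if d == "dedicated"]
```

Creating the instance with `PrimusVk(defer_detection=False)` raises `LoaderCrash`, while `defer_detection=True` lets instance creation finish and a later enumeration return only the dedicated GPU.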

Could you please try out the fix I have pushed here? https://github.com/felixdoerre/primus_vk/tree/mesa_layer_fix

Great! How can I test this fix? I installed primus_vk (and normally install or update it) through pacman. The version I currently have from pacman is: primus_vk 1.5-1

Should I download that version and compile it to install the modified package, like I did with mesa?

Why are the layers loaded in a different order on your system than on mine?

felixdoerre commented 3 years ago

The layers are loaded in the order the filesystem lists the corresponding json files. So this could be anything, from the order we installed packages in to filesystems structuring the on-disk layout of the directory contents differently.
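
To compare two machines, the raw listing order of a manifest directory can be printed next to a sorted view. The sketch below assumes nothing about the loader itself; it only shows that a plain directory scan (here `os.scandir`, which yields entries in arbitrary order) is not guaranteed to be alphabetical. The helper names and the example directory mentioned afterwards are assumptions, not something stated in this thread.

```python
import os
import tempfile


def manifests_in_listing_order(directory: str) -> list:
    """JSON manifests in the raw order the OS returns them (not sorted)."""
    with os.scandir(directory) as entries:
        return [entry.name for entry in entries if entry.name.endswith(".json")]


def manifests_sorted(directory: str) -> list:
    """The same manifests, sorted by name for comparison."""
    return sorted(manifests_in_listing_order(directory))


if __name__ == "__main__":
    # Self-contained demo in a temporary directory with made-up file names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["layer_b.json", "layer_a.json", "notes.txt"]:
            open(os.path.join(tmp, name), "w").close()
        print("listing order:", manifests_in_listing_order(tmp))
        print("sorted order: ", manifests_sorted(tmp))
```

Pointing `manifests_in_listing_order` at the directory that holds the layer manifests on a given system (for example an `implicit_layer.d` directory, if that is where they are installed) shows the order a plain directory scan returns there.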

Great! how can i test this fix?

I'd suggest the "quick and dirty" way: just clone this repository and check out the corresponding branch, run make and copy libprimus_vk.so.1 to its system-wide place. (Probably /usr/lib/x86_64-linux-gnu/libprimus_vk.so.1). If you also want to use the fix for 32-bit applications, you would need to compile it for 32-bit as well.

If you really want a package I guess you could download https://github.com/archlinux/svntogit-community/blob/packages/primus_vk/trunk/PKGBUILD and adjust the versions/hashes in that file, but I don't really know how arch packaging works so I will probably only be of little help. The difference between this package and the mesa package, is that you want to really change the contents of the package and not just the configuration options, so it might be more difficult.

zimudec commented 3 years ago

I downloaded the zip from the URL, entered the directory and ran make in a terminal, which returned the following:

[zimudec@zimudec primus_vk-mesa_layer_fix]$ make
g++  --std=gnu++11 -g3 -I/usr/include/vulkan -shared -fPIC primus_vk.cpp -o libprimus_vk.so -Wl,-soname,libprimus_vk.so.1 -ldl -lpthread 
primus_vk.cpp:4:10: fatal error: vulkan.h: No such file or directory
    4 | #include "vulkan.h"
      |          ^~~~~~~~~~
compilation terminated.
make: *** [Makefile:19: libprimus_vk.so] Error 1

I installed some Vulkan development libraries and was able to compile. Now I will run the tests.

EDIT:

Compiling generated the file libprimus_vk.so instead of the mentioned libprimus_vk.so.1.

How should I proceed?

felixdoerre commented 3 years ago

That's ok, copy libprimus_vk.so over /usr/lib/x86_64-linux-gnu/libprimus_vk.so.1. (The numbers after the .so in those file names are just for versioning, so if primus_vk introduced a "breaking change" you could have primus_vk version 1 and primus_vk version 2 installed at the same time)
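
As a small illustration of the naming scheme mentioned here, the sketch below splits a library file name into its unversioned name and the major version that follows `.so`. It is only a string-handling example; the function name `split_soname` is made up, and the sketch says nothing about how the dynamic linker resolves these names.

```python
import re
from typing import Optional, Tuple

# "lib<name>.so", optionally followed by ".<major version>".
_SONAME = re.compile(r"^(lib.+?\.so)(?:\.(\d+))?")


def split_soname(filename: str) -> Tuple[str, Optional[int]]:
    """Return (unversioned name, major version or None) for a library file name."""
    match = _SONAME.match(filename)
    if match is None:
        raise ValueError(f"not a shared library name: {filename!r}")
    base, major = match.groups()
    return base, int(major) if major is not None else None


if __name__ == "__main__":
    print(split_soname("libprimus_vk.so"))
    print(split_soname("libprimus_vk.so.1"))
```

The file that make produces and the installed file differ only in that trailing version number, which is why copying one over the other works.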

zimudec commented 3 years ago

haha I'm learning a lot here.

I copied the compiled file to /usr/lib/libprimus_vk.so.1

I ran the game command again, and it threw me the following:

[zimudec@zimudec Rise of the Tomb Raider]$ STEAM_RUNTIME=0 GAME_LAUNCH_PREFIX="gdb --args" pvkrun ./RiseOfTheTombRaider.sh
WARNING: Rise of the Tomb Raider launched with STEAM_RUNTIME=0
         We recommend using the steam runtime if possible
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
PrimusVK: Searching for display GPU:
PrimusVK: 0x1fd1700: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x1fd1700.
PrimusVK: 0x1fb7800.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce GT 740M
PrimusVK:   Type: 2
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider...
(No debugging symbols found in /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider)
(gdb) r
Starting program: /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
RiseOfTheTombRaider: crash reporter initialised with path "/home/zimudec/.local/share/feral-interactive/Rise of the Tomb Raider/crashes"
[New Thread 0x7fff3039c640 (LWP 2367)]
[New Thread 0x7fff27b9b640 (LWP 2368)]
[New Thread 0x7fff2f54d640 (LWP 2369)]
[New Thread 0x7fff2ed4c640 (LWP 2370)]
[New Thread 0x7fff2e54b640 (LWP 2371)]
[New Thread 0x7fff2c892640 (LWP 2372)]
[New Thread 0x7fff2739a640 (LWP 2373)]
[New Thread 0x7fff26b99640 (LWP 2374)]
[New Thread 0x7fff26398640 (LWP 2375)]
[New Thread 0x7fff25713640 (LWP 2376)]
SDL2 initialised [built against 2.0.7, running with 2.0.7]
[Detaching after fork from child process 2377]
[New Thread 0x7fff25056640 (LWP 2378)]
Setting breakpad minidump AppID = 391220
Steam_SetMinidumpSteamID:  Caching Steam ID:  76561198008468660 [API loaded no]
[New Thread 0x7fff24855640 (LWP 2379)]
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[New Thread 0x7ffee7fff640 (LWP 2380)]
[New Thread 0x7ffee77fe640 (LWP 2381)]
[Thread 0x7ffee77fe640 (LWP 2381) exited]
[New Thread 0x7ffee77fe640 (LWP 2382)]
[New Thread 0x7fff24466640 (LWP 2383)]
[New Thread 0x7ffee6ffd640 (LWP 2384)]
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
[New Thread 0x7ffee4d50640 (LWP 2386)]
[New Thread 0x7ffecffff640 (LWP 2387)]
[New Thread 0x7ffecf7fe640 (LWP 2388)]
[New Thread 0x7ffeceffd640 (LWP 2389)]
PrimusVK: Searching for display GPU:
PrimusVK: 0x6e50fd0: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x6e50fd0.
PrimusVK: 0x74cc210.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce GT 740M
PrimusVK:   Type: 2
free(): invalid pointer
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "RiseOfTheTombRa" received signal SIGABRT, Aborted.
0x00007ffff20b3615 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff20b3615 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff209c862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff20f55e8 in __libc_message () at /usr/lib/libc.so.6
#3  0x00007ffff20fd27a in  () at /usr/lib/libc.so.6
#4  0x00007ffff20fe64c in _int_free () at /usr/lib/libc.so.6
#5  0x00007fff243384c6 in  () at /usr/lib/libvulkan.so.1
#6  0x00007fff2433ad38 in vkEnumeratePhysicalDevices ()
    at /usr/lib/libvulkan.so.1
#7  0x00000000019b315e in  ()
#8  0x00000000019b3868 in  ()
#9  0x00000000018f9844 in  ()
#10 0x00000000018f9c48 in  ()
#11 0x000000000047ef62 in  ()
#12 0x00000000004935f1 in  ()
#13 0x00000000004939ad in  ()
#14 0x00000000004aad61 in  ()
#15 0x00000000004aa281 in  ()
#16 0x000000000041d939 in  ()
#17 0x00007ffff209e152 in __libc_start_main () at /usr/lib/libc.so.6
#18 0x000000000046c3a1 in  ()
#19 0x00007fffffffe108 in  ()
#20 0x000000000000001c in  ()
#21 0x0000000000000001 in  ()
--Type <RET> for more, q to quit, c to continue without paging--
#22 0x00007fffffffe3e1 in  ()
#23 0x0000000000000000 in  ()
(gdb) c
Continuing.
[Thread 0x7fff25056640 (LWP 2378) exited]

Thread 1 "RiseOfTheTombRa" received signal SIGABRT, Aborted.
0x00007ffff20b3615 in raise () from /usr/lib/libc.so.6
(gdb) c
Continuing.
[Thread 0x7ffee6ffd640 (LWP 2384) exited]
RiseOfTheTombRaider: crash reporter failed

Thread 1 "RiseOfTheTombRa" received signal SIGABRT, Aborted.
0x00007ffff20b3615 in raise () from /usr/lib/libc.so.6
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) [Thread 0x7ffeceffd640 (LWP 2389) exited]
[Thread 0x7ffecf7fe640 (LWP 2388) exited]
[Thread 0x7ffecffff640 (LWP 2387) exited]
[Thread 0x7ffee4d50640 (LWP 2386) exited]
[Thread 0x7fff24466640 (LWP 2383) exited]
[Thread 0x7ffee77fe640 (LWP 2382) exited]
[Thread 0x7ffee7fff640 (LWP 2380) exited]
[Thread 0x7fff24855640 (LWP 2379) exited]
[Thread 0x7fff25713640 (LWP 2376) exited]
[Thread 0x7fff26398640 (LWP 2375) exited]
[Thread 0x7fff26b99640 (LWP 2374) exited]
[Thread 0x7fff2739a640 (LWP 2373) exited]
[Thread 0x7fff2c892640 (LWP 2372) exited]
[Thread 0x7fff2e54b640 (LWP 2371) exited]
[Thread 0x7fff2ed4c640 (LWP 2370) exited]
[Thread 0x7fff2f54d640 (LWP 2369) exited]
[Thread 0x7fff27b9b640 (LWP 2368) exited]
[Thread 0x7fff3039c640 (LWP 2367) exited]

Program terminated with signal SIGABRT, Aborted.
The program no longer exists.

The launcher did not start

felixdoerre commented 3 years ago

I don't really understand what goes wrong here. We will probably need line numbers/debug symbols in libvulkan to better understand why an invalid call to free happens here. So could you please choose the corresponding version of libvulkan from here: https://github.com/archlinux/svntogit-packages/commits/packages/vulkan-icd-loader/trunk and compile your vulkan-loader with debug symbols?

zimudec commented 3 years ago

I did the following:

1- I downloaded the same version that I have installed
2- I added options=(debug !strip) to the beginning of the PKGBUILD file
3- I ran makepkg to compile
4- I installed the compiled package

I ran the command to test Tomb Raider and got the following:

[zimudec@zimudec Rise of the Tomb Raider]$ STEAM_RUNTIME=0 GAME_LAUNCH_PREFIX="gdb --args" pvkrun ./RiseOfTheTombRaider.sh
WARNING: Rise of the Tomb Raider launched with STEAM_RUNTIME=0
         We recommend using the steam runtime if possible
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
PrimusVK: Searching for display GPU:
PrimusVK: 0x16e9700: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x16e9700.
PrimusVK: 0x16cea60.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce GT 740M
PrimusVK:   Type: 2
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider...
(No debugging symbols found in /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider)
(gdb) run
Starting program: /home/zimudec/.local/share/Steam/steamapps/common/Rise of the Tomb Raider/bin/RiseOfTheTombRaider 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
RiseOfTheTombRaider: crash reporter initialised with path "/home/zimudec/.local/share/feral-interactive/Rise of the Tomb Raider/crashes"
[New Thread 0x7fff3039c640 (LWP 1651)]
[New Thread 0x7fff27b9b640 (LWP 1652)]
[New Thread 0x7fff2f54d640 (LWP 1653)]
[New Thread 0x7fff2ed4c640 (LWP 1654)]
[New Thread 0x7fff2e54b640 (LWP 1655)]
[New Thread 0x7fff2c892640 (LWP 1686)]
[New Thread 0x7fff2739a640 (LWP 1687)]
[New Thread 0x7fff26b99640 (LWP 1688)]
[New Thread 0x7fff26398640 (LWP 1689)]
[New Thread 0x7fff25713640 (LWP 1690)]
SDL2 initialised [built against 2.0.7, running with 2.0.7]
[Detaching after fork from child process 1692]
[New Thread 0x7fff25056640 (LWP 1705)]
[S_API FAIL] SteamAPI_Init() failed; SteamAPI_IsSteamRunning() failed.
[S_API FAIL] SteamAPI_Init() failed; unable to locate a running instance of Steam, or a local steamclient.so.
[Detaching after fork from child process 1706]
[Thread 0x7fff25056640 (LWP 1705) exited]
Setting breakpad minidump AppID = 391220
Steam_SetMinidumpSteamID:  Caching Steam ID:  76561198008468660 [API loaded no]
[New Thread 0x7fff24855640 (LWP 2020)]
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[S_API WARN] The loaded overlay DLL doesn't support ValveHookScreenshots
[New Thread 0x7fff25056640 (LWP 2021)]
[New Thread 0x7ffee7fff640 (LWP 2022)]
[New Thread 0x7ffee77fe640 (LWP 2023)]
[New Thread 0x7ffee6ffd640 (LWP 2024)]
[New Thread 0x7fff24466640 (LWP 2025)]
[New Thread 0x7ffee67fc640 (LWP 2026)]
[Thread 0x7ffee7fff640 (LWP 2022) exited]
[Thread 0x7ffee67fc640 (LWP 2026) exited]
INTEL-MESA: warning: Ivy Bridge Vulkan support is incomplete
[New Thread 0x7ffee67fc640 (LWP 2047)]
[New Thread 0x7ffee7fff640 (LWP 2048)]
[New Thread 0x7ffecbfff640 (LWP 2049)]
[New Thread 0x7ffec3fff640 (LWP 2050)]
PrimusVK: Searching for display GPU:
PrimusVK: 0x71a0fd0: 
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 4000 (IVB GT2)
PrimusVK:   Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x71a0fd0.
PrimusVK: 0x70c6480.
PrimusVK: Got discrete gpu!
PrimusVK: Device: GeForce GT 740M
PrimusVK:   Type: 2
free(): invalid pointer
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "RiseOfTheTombRa" received signal SIGABRT, Aborted.
0x00007ffff20b3615 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff20b3615 in raise () at /usr/lib/libc.so.6
#1  0x00007ffff209c862 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff20f55e8 in __libc_message () at /usr/lib/libc.so.6
#3  0x00007ffff20fd27a in  () at /usr/lib/libc.so.6
#4  0x00007ffff20fe64c in _int_free () at /usr/lib/libc.so.6
#5  0x00007fff243384c6 in loader_instance_heap_free
    (pMemory=<optimized out>, instance=0x75e6920)
    at /usr/src/debug/Vulkan-Loader-1.2.151/loader/loader.c:169
#6  setupLoaderTrampPhysDevs (instance=instance@entry=0x75e6920)
    at /usr/src/debug/Vulkan-Loader-1.2.151/loader/loader.c:7039
#7  0x00007fff2433ad38 in vkEnumeratePhysicalDevices
    (instance=0x75e6920, pPhysicalDeviceCount=0x7fffffffd2ec, pPhysicalDevices=0x74cc170) at /usr/src/debug/Vulkan-Loader-1.2.151/loader/trampoline.c:694
#8  0x00000000019b315e in  ()
#9  0x00000000019b3868 in  ()
#10 0x00000000018f9844 in  ()
#11 0x00000000018f9c48 in  ()
#12 0x000000000047ef62 in  ()
#13 0x00000000004935f1 in  ()
#14 0x00000000004939ad in  ()
#15 0x00000000004aad61 in  ()
#16 0x00000000004aa281 in  ()
#17 0x000000000041d939 in  ()
#18 0x00007ffff209e152 in __libc_start_main () at /usr/lib/libc.so.6
#19 0x000000000046c3a1 in  ()
#20 0x00007fffffffe108 in  ()
#21 0x000000000000001c in  ()
#22 0x0000000000000001 in  ()
#23 0x00007fffffffe3e1 in  ()
#24 0x0000000000000000 in  ()
(gdb)

Is it correct?

felixdoerre commented 3 years ago

Yes, perfectly. However I still don't understand what's wrong. I think the loader should never go into loader.c:7039, because this line is used when physical devices were present in a previous invocation, but are missing now. This shouldn't be the case, as primus_vk should always consistently output the same single physical device. So we go into the next round. Could you please perform these gdb commands:

b loader.c:7029
run
bt
p total_count
p *new_phys_devs[0]
p local_phys_devs[0]
c
bt
p total_count
p *new_phys_devs[0]
p local_phys_devs[0]
c
... <repeat until crash>
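
The manual "print, continue, repeat" loop above could also be automated by attaching the commands to the breakpoint. The following is only a sketch: it generates a gdb command file from the breakpoint location and expressions listed above. The helper name `build_gdb_script` is made up for illustration; `set pagination off`, `break`, `commands`/`end`, `bt`, `p`, `continue` and `run` are ordinary gdb commands.

```python
def build_gdb_script(location, expressions):
    """Build a gdb command file: at every hit of `location`, print a
    backtrace and each expression, then continue automatically."""
    # Disable the pager so gdb does not stop at "--Type <RET> for more--".
    lines = ["set pagination off", f"break {location}", "commands", "  bt"]
    # One print command per expression, executed on every breakpoint hit.
    lines += [f"  p {expr}" for expr in expressions]
    # `continue` resumes the program; gdb stops on its own at the crash.
    lines += ["  continue", "end", "run"]
    return "\n".join(lines) + "\n"


script = build_gdb_script(
    "loader.c:7029",
    ["total_count", "*new_phys_devs[0]", "local_phys_devs[0]"],
)
print(script, end="")
```

Saved under some name such as `primus.gdb` (a hypothetical file name), the output could be loaded with `gdb -x primus.gdb --args <game binary>`, and the output leading up to the abort can then be posted in one piece.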

felixdoerre commented 3 years ago

Good news! I had a hunch on what could be different on your setup vs. my setup and changed the order the drivers were loaded in. This allowed me to reproduce the new problem locally. This might be a vulkan-loader bug. I will open an issue there and ask about it.

Understanding the problem allowed me to create a workaround. I pushed it here: https://github.com/felixdoerre/primus_vk/tree/mesa_layer_fix Can you download the updated version and compile and test it?

zimudec commented 3 years ago

Good news! I had a hunch on what could be different on your setup vs. my setup and changed the order the drivers were loaded in. This allowed me to reproduce the new problem locally. This might be a vulkan-loader bug. I will open an issue there and ask about it.

Understanding the problem allowed me to create a workaround. I pushed it here: https://github.com/felixdoerre/primus_vk/tree/mesa_layer_fix Can you download the updated version and compile and test it?

I did the following:

Sure, starting steam with optirun, just with steam and adding the pvkrun parameter to the game. Both work :D

In this thread we apparently discovered a couple of bugs in libraries external to primus_vk, right?

felixdoerre commented 3 years ago

should I do something with the other compiled file? libnv_vulkan_wrapper.so

You can install that file to /usr/lib/x86_64-linux-gnu/libnv_vulkan_wrapper.so.1, however I didn't change anything in that component so it shouldn't make a difference.

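Regarding bugs in other components:

  • The behavior of the vulkan loader to override the gpu count. I've already opened an issue for the vulkan loader: KhronosGroup/Vulkan-Loader#510
  • The behavior of the mesa layer seems totally reasonable, so there is no bug there. The "lesson learned" could probably be documented somewhere: "the function replacing vkCreateInstance must not call any vulkan functions except vkCreateInstance, because any vulkan extensions activated through vkCreateInstance are only activated once the call to vkCreateInstance finishes." Maybe this could be seen as a bug in the vulkan loader, but I doubt that they will make any corresponding adjustments. Maybe I will open a second issue after the first one has received at least a bit of attention.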

In any case, I will merge the "mesa_layer_fix"-branch into master, so these problems should not occur anymore, even if the vulkan loader does not fix anything.

zimudec commented 3 years ago

should I do something with the other compiled file? libnv_vulkan_wrapper.so

You can install that file to /usr/lib/x86_64-linux-gnu/libnv_vulkan_wrapper.so.1, however I didn't change anything in that component so it shouldn't make a difference.

Regarding bugs in other components:

  • The behavior of the vulkan loader to override the gpu count. I've already opened an issue for the vulkan loader: KhronosGroup/Vulkan-Loader#510
  • The behavior of the mesa layer seems totally reasonable, so there is no bug there. The "lesson learned" could probably be documented somewhere: "the function replacing vkCreateInstance must not call any vulkan functions except vkCreateInstance, because any vulkan extensions activated through vkCreateInstance are only activated once the call to vkCreateInstance finishes." Maybe this could be seen as a bug in the vulkan loader, but I doubt that they will make any corresponding adjustments. Maybe I will open a second issue after the first one has received at least a bit of attention.
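
The "lesson learned" can be illustrated with a toy model. This is not the Vulkan API and not necessarily what the mesa_layer_fix branch does: `ToyLoader`, `EagerLayer`, `DeferredLayer` and the extension name `properties2` are all invented for the sketch. The model only encodes the rule stated above, namely that extensions requested at instance creation become active only after the creation call has returned, and shows deferral as one possible way to respect it.

```python
class ToyLoader:
    """Stand-in for a loader (hypothetical): requested extensions become
    active only after create_instance() has finished."""

    def __init__(self):
        self.active_extensions = set()

    def create_instance(self, requested, layer):
        layer.on_create_instance(self)           # the layer's replacement runs first
        self.active_extensions = set(requested)  # activation happens afterwards

    def query_properties2(self):
        if "properties2" not in self.active_extensions:
            raise RuntimeError("extension not active yet")
        return "device properties"


class EagerLayer:
    """Calls another function from inside instance creation: too early."""

    def on_create_instance(self, loader):
        self.props = loader.query_properties2()


class DeferredLayer:
    """Only remembers the loader during creation and queries lazily later."""

    def __init__(self):
        self.loader = None
        self.props = None

    def on_create_instance(self, loader):
        self.loader = loader

    def enumerate_devices(self):
        if self.props is None:
            self.props = self.loader.query_properties2()
        return self.props


try:
    ToyLoader().create_instance(["properties2"], EagerLayer())
    eager_failed = False
except RuntimeError:
    eager_failed = True

deferred = DeferredLayer()
ToyLoader().create_instance(["properties2"], deferred)
print(eager_failed, deferred.enumerate_devices())
```

In this model the eager layer fails during creation, while the deferred layer succeeds because its first query happens after creation has completed.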

In any case, I will merge the "mesa_layer_fix"-branch into master, so these problems should not occur anymore, even if the vulkan loader does not fix anything.

Well, this was an extra setting for better compatibility :)

Well, in addition to the above: since the steam-manjaro update, due to the primus problem (if I remember correctly), you cannot start steam with pvkrun directly, since it does not detect the dedicated gpu. With optirun it works without problems. Alternatively, start steam without optirun or pvkrun and use pvkrun in the launch parameters of each game separately.

felixdoerre commented 3 years ago

Well, in addition to the above: since the steam-manjaro update, due to the primus problem (if I remember correctly), you cannot start steam with pvkrun directly, since it does not detect the dedicated gpu. With optirun it works without problems. Alternatively, start steam without optirun or pvkrun and use pvkrun in the launch parameters of each game separately.

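I think we are mixing 3 different things here:

  • enabling power to the GPU. This is done either by activating (OpenGL)-primus (through primusrun) or by launching optirun.
  • activating (OpenGL)-primus. This is done by setting the LD_LIBRARY_PATH to contain primus' libGL.so.1. It requires "enabling power to the GPU" to work.
  • activating primus-vk. This is done by setting ENABLE_PRIMUS_LAYER=1. It requires at least "power to the GPU" to work.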

I believe the state for steam is that it breaks the activation of (OpenGL)-primus. When launching steam with ENABLE_PRIMUS_LAYER=1 primusrun, this consequently breaks powering on the GPU for primus_vk and both primus and primus_vk will not work. When launching steam with ENABLE_PRIMUS_LAYER=1 optirun, powering on the GPU is successful, as optirun does that "manually", however steam still breaks the activation of (OpenGL)-primus. So you should end up with primus_vk working, but (OpenGL)-primus broken.
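
To see which of these mechanisms a given process environment actually requests, a small check along the following lines may help. It is a rough sketch under assumptions: the function name `primus_state` is invented, and treating any LD_LIBRARY_PATH entry whose path contains "primus" as the (OpenGL)-primus library directory is a heuristic, not something primus guarantees. Whether the GPU is powered on cannot be read from the environment at all, so that part is not covered.

```python
import os


def primus_state(env):
    """Report which offloading mechanisms the environment mapping requests."""
    # Split LD_LIBRARY_PATH into its non-empty directory entries.
    lib_dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    return {
        # primus_vk is requested by setting ENABLE_PRIMUS_LAYER=1.
        "primus_vk_layer": env.get("ENABLE_PRIMUS_LAYER") == "1",
        # Heuristic (assumption): a directory with "primus" in its path
        # is taken to be the one holding primus' libGL.so.1.
        "opengl_primus": any("primus" in d for d in lib_dirs),
    }


print(primus_state(os.environ))
```

Run from inside a game's launch command, a result with `primus_vk_layer` true and `opengl_primus` false would match the "primus_vk working, (OpenGL)-primus broken" state described above.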

And from the discussion in the steam-runtime repository I gather that it will only get worse: when Steam (or individual games) runs inside a "pressure vessel" (a container), both (OpenGL)-primus and primus-vk cannot work, as neither will be installed/available inside the container. Additionally, bumblebee is not available inside the container either, so powering on the gpu from within the container is not possible. Getting that to work will require active work from the steam-runtime repository to "copy/install" primus and primus_vk inside the container and mount the bumblebee socket when it is detected outside, which will be hard. Also, they seem to be very reluctant to fix the situation for the current runtime variant (I'd still call this behavior a bug, with or without primus and primus-vk).

zimudec commented 3 years ago

Well, in addition to the above: since the steam-manjaro update, due to the primus problem (if I remember correctly), you cannot start steam with pvkrun directly, since it does not detect the dedicated gpu. With optirun it works without problems. Alternatively, start steam without optirun or pvkrun and use pvkrun in the launch parameters of each game separately.

I think we are mixing 3 different things here:

  • enabling power to the GPU. This is done either by activating (OpenGL)-primus (through primusrun) or by launching optirun.
  • activating (OpenGL)-primus. This is done by setting the LD_LIBRARY_PATH to contain primus' libGL.so.1. It requires "enabling power to the GPU" to work.
  • activating primus-vk. This is done by setting ENABLE_PRIMUS_LAYER=1. It requires at least "power to the GPU" to work.

I believe the state for steam is that it breaks the activation of (OpenGL)-primus. When launching steam with ENABLE_PRIMUS_LAYER=1 primusrun, this consequently breaks powering on the GPU for primus_vk and both primus and primus_vk will not work. When launching steam with ENABLE_PRIMUS_LAYER=1 optirun, powering on the GPU is successful, as optirun does that "manually", however steam still breaks the activation of (OpenGL)-primus. So you should end up with primus_vk working, but (OpenGL)-primus broken.

And from the discussion in the steam-runtime repository I gather that it will only get worse: when Steam (or individual games) runs inside a "pressure vessel" (a container), both (OpenGL)-primus and primus-vk cannot work, as neither will be installed/available inside the container. Additionally, bumblebee is not available inside the container either, so powering on the gpu from within the container is not possible. Getting that to work will require active work from the steam-runtime repository to "copy/install" primus and primus_vk inside the container and mount the bumblebee socket when it is detected outside, which will be hard. Also, they seem to be very reluctant to fix the situation for the current runtime variant (I'd still call this behavior a bug, with or without primus and primus-vk).

What does it mean that ENABLE_PRIMUS_LAYER=1 optirun runs with vulkan without problems but with broken (OpenGL)-primus?

felixdoerre commented 3 years ago

What does it mean that ENABLE_PRIMUS_LAYER=1 optirun runs with vulkan without problems but with broken (OpenGL)-primus?

That means that applications that use Vulkan as graphics API will be accelerated through primus-vk (modern games or directx9/10/11/12 emulated through dxvk), and applications that use OpenGL (older applications or directx9/10/11 emulated through the implementation provided by wine) will run only on the integrated graphics card.

zimudec commented 3 years ago

What does it mean that ENABLE_PRIMUS_LAYER=1 optirun runs with vulkan without problems but with broken (OpenGL)-primus?

That means that applications that use Vulkan as graphics API will be accelerated through primus-vk (modern games or directx9/10/11/12 emulated through dxvk), and applications that use OpenGL (older applications or directx9/10/11 emulated through the implementation provided by wine) will run only on the integrated graphics card.

Interesting, I think I have not come across that case. But it is clear then that this can happen to me with optirun.

After this journey, what do we do with this issue? Do we consider it solved?

felixdoerre commented 3 years ago

I think we have understood all behavior and have tried to report that to the corresponding repositories. I am not sure if they will fix/improve anything, but I think from primus_vk nothing more can be done and we should consider this ticket solved.