ValveSoftware / gamescope

SteamOS session compositing window manager

Corruption with Gallium Nine + radeonsi #768

Open icecream95 opened 1 year ago

icecream95 commented 1 year ago

Any application using Gallium Nine, for example https://github.com/axeldavy/Xnine, shows corruption that looks like this:

[Screenshot: Screenshot_20230130_125330]

This happens with both gamescope 3.11.49 and the latest version compiled from git (ecee87b15794). Note that occasionally the corruption does not appear. Applications using OpenGL or Vulkan work fine.

Hardware and driver versions:

03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Lucienne [1002:164c] (rev c2)

OpenGL renderer string: AMD Radeon Graphics (renoir, LLVM 15.0.7, DRM 3.49, 6.1.7-200.fc37.x86_64)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 22.3.3

When running the application with MESA_LOADER_DRIVER_OVERRIDE=zink this problem does not appear, so it is radeonsi-specific.

Running it with AMD_DEBUG=nodcc or AMD_DEBUG=nodisplaydcc also works around the issue.

Joshua-Ashton commented 1 year ago

This is likely because Nine doesn't use modifiers on its buffers. I don't think there is anything for us to do there.

There isn't really any reason to use Nine given that DXVK exists; if there is a bug, you can let me know there.

icecream95 commented 1 year ago

Are you sure that gamescope is doing nothing wrong?

When I run it with a debug build of Mesa, it crashes on startup, with this assertion failure that seems related:

gamescope: ../src/amd/common/ac_surface.c:2318: gfx9_compute_surface: Assertion `!ac_modifier_has_dcc(surf->modifier) || !(surf->flags & RADEON_SURF_DISABLE_DCC)' failed.

Thread 1 "gamescope" received signal SIGABRT, Aborted.
0x00007ffff74afe5c in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff74afe5c in __pthread_kill_implementation () at /lib64/libc.so.6
#1  0x00007ffff745fa76 in raise () at /lib64/libc.so.6
#2  0x00007ffff74497fc in abort () at /lib64/libc.so.6
#3  0x00007ffff744971b in _nl_load_domain.cold () at /lib64/libc.so.6
#4  0x00007ffff7458656 in  () at /lib64/libc.so.6
#5  0x00007fffed973fdf in gfx9_compute_surface (addrlib=0x5555556c0850, info=0x555555867810, config=0x7fffffffc9b0, mode=RADEON_SURF_MODE_2D, surf=0x555555895b18) at ../src/amd/common/ac_surface.c:2318
#6  0x00007fffed97492c in ac_compute_surface (addrlib=0x5555556c0850, info=0x555555867810, config=0x7fffffffc9b0, mode=RADEON_SURF_MODE_2D, surf=0x555555895b18) at ../src/amd/common/ac_surface.c:2498
#7  0x00007fffed8fa12d in radv_amdgpu_winsys_surface_init (_ws=0x5555558676e0, surf_info=0x7fffffffca30, surf=0x555555895b18) at ../src/amd/vulkan/winsys/amdgpu/radv_amdgpu_surface.c:95
#8  0x00007fffed828a38 in radv_image_create_layout (device=0x555555974950, create_info=..., mod_info=0x0, image=0x5555558959e0) at ../src/amd/vulkan/radv_image.c:1686
#9  0x00007fffed829b52 in radv_image_create (_device=0x555555974950, create_info=0x7fffffffcbb0, alloc=0x555555897ae0, pImage=0x555555897cd0, is_internal=false) at ../src/amd/vulkan/radv_image.c:1926
#10 0x00007fffed82b280 in radv_CreateImage (_device=0x555555974950, pCreateInfo=0x555555897b38, pAllocator=0x555555897ae0, pImage=0x555555897cd0) at ../src/amd/vulkan/radv_image.c:2450
#11 0x00007fffed920077 in vk_tramp_CreateImage (device=0x555555974950, pCreateInfo=0x555555897b38, pAllocator=0x555555897ae0, pImage=0x555555897cd0) at src/vulkan/runtime/vk_dispatch_trampolines.c:829
#12 0x00007fffed8fecb9 in wsi_create_image (chain=0x555555897a90, info=0x555555897b38, image=0x555555897cd0) at ../src/vulkan/wsi/wsi_common.c:676
#13 0x00007fffed90d935 in wsi_wl_image_init (chain=0x555555897a90, image=0x555555897cd0, pCreateInfo=0x7fffffffceb0, pAllocator=0x555555974990) at ../src/vulkan/wsi/wsi_common_wayland.c:1651
#14 0x00007fffed90e341 in wsi_wl_surface_create_swapchain (icd_surface=0x555555648e10, device=0x555555974950, wsi_device=0x555555866058, pCreateInfo=0x7fffffffceb0, pAllocator=0x555555974990, swapchain_out=0x7fffffffce18)
    at ../src/vulkan/wsi/wsi_common_wayland.c:1888
#15 0x00007fffed8ff6b7 in wsi_CreateSwapchainKHR (_device=0x555555974950, pCreateInfo=0x7fffffffceb0, pAllocator=0x0, pSwapchain=0x555555613330) at ../src/vulkan/wsi/wsi_common.c:916
#16 0x00007fffed921b45 in vk_tramp_CreateSwapchainKHR (device=0x555555974950, pCreateInfo=0x7fffffffceb0, pAllocator=0x0, pSwapchain=0x555555613330) at src/vulkan/runtime/vk_dispatch_trampolines.c:1381
#17 0x00007ffff7aae731 in terminator_CreateSwapchainKHR () at /lib64/libvulkan.so.1
#18 0x000055555559ceb0 in vulkan_make_swapchain(VulkanOutput_t*) [clone .constprop.0] ()
#19 0x0000555555563f9a in main ()

Joshua-Ashton commented 1 year ago

As I said, Nine doesn't allocate images for scanout with modifiers, which is why that happens. This isn't our bug.

icecream95 commented 1 year ago

The crash is an unrelated bug then?

Joshua-Ashton commented 1 year ago

No, it's the exact same bug.

icecream95 commented 1 year ago

It happens while starting gamescope. It is not specific to Gallium Nine.

Older versions of Mesa are not affected (latest main is broken, but 22.3.3 is not), so it could be a radv bug.

Joshua-Ashton commented 1 year ago

Oh hm, that is interesting then. Does it work with SDL_VIDEODRIVER=x11?

icecream95 commented 1 year ago

No, it still asserts.

I think this is a RADV bug.

These lines in radv_use_dcc_for_image_early disable DCC when VK_IMAGE_USAGE_STORAGE_BIT is set...

   /*
    * TODO: Enable DCC for storage images on GFX9 and earlier.
    *
    * Also disable DCC with atomics because even when DCC stores are
    * supported atomics will always decompress. So if we are
    * decompressing a lot anyway we might as well not have DCC.
    */
   if ((pCreateInfo->usage & VK_IMAGE_USAGE_STORAGE_BIT) &&
       (device->physical_device->rad_info.gfx_level < GFX10 ||
        radv_formats_is_atomic_allowed(device, pCreateInfo->pNext, format, pCreateInfo->flags)))
      return false;

...but by that point the modifier is already chosen by radv_select_modifier, which trusts that all of the modifiers it gets are valid for the image.

So then while radv_use_dcc_for_image_early disables DCC, radv_select_modifier chooses a DCC modifier and the assertion fires.
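The mismatch can be modeled outside of Mesa with a minimal sketch. Everything below is a hypothetical, simplified stand-in rather than RADV code: `use_dcc_early`, `select_modifier`, `surface_is_consistent` and `create_image_ok` are illustrative names, the early check ignores the atomics case, and the selection simply takes the first modifier it is handed (the real radv_select_modifier may rank them differently). Only the shape of the final condition is taken from the assertion in the backtrace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the real Mesa definitions. */
enum gfx_level { GFX9 = 9, GFX10 = 10 };

#define USAGE_STORAGE    (1u << 0) /* stands in for VK_IMAGE_USAGE_STORAGE_BIT */
#define SURF_DISABLE_DCC (1u << 0) /* stands in for RADEON_SURF_DISABLE_DCC */

struct modifier {
   uint64_t id;
   bool has_dcc;
};

/* Models the early DCC decision: storage images before GFX10 get no DCC.
 * (Assumption: the atomics condition is left out for brevity.) */
static bool use_dcc_early(uint32_t usage, enum gfx_level level)
{
   if ((usage & USAGE_STORAGE) && level < GFX10)
      return false;
   return true;
}

/* Models a selection step that trusts its input list and takes the
 * first entry without re-checking it against the image usage. */
static const struct modifier *select_modifier(const struct modifier *mods,
                                              size_t count)
{
   assert(count > 0);
   return &mods[0];
}

/* Same shape as the failing assertion: a DCC modifier must not be
 * combined with the "disable DCC" surface flag. */
static bool surface_is_consistent(const struct modifier *mod,
                                  uint32_t surf_flags)
{
   return !mod->has_dcc || !(surf_flags & SURF_DISABLE_DCC);
}

/* Runs both decisions independently, as described above, and reports
 * whether the resulting surface description would pass the assertion. */
static bool create_image_ok(const struct modifier *mods, size_t count,
                            uint32_t usage, enum gfx_level level)
{
   const struct modifier *mod = select_modifier(mods, count);
   uint32_t flags = use_dcc_early(usage, level) ? 0 : SURF_DISABLE_DCC;
   return surface_is_consistent(mod, flags);
}
```

In this model, a storage image on GFX9 whose modifier list starts with a DCC modifier makes `create_image_ok` return false, while the same list without the storage bit, or on GFX10, passes.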

The only place where the list of modifiers is filtered is wsi_configure_native_image in Mesa. It uses vkGetPhysicalDeviceImageFormatProperties2 to check that each modifier is valid. In RADV, that entry point calls radv_get_image_format_properties and, through it, radv_check_modifier_support, but neither checks the storage usage bit. So DCC modifiers are not filtered out, which is why radv_select_modifier uses one.
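The filtering gap described above can be sketched in the same spirit. This is a hypothetical model, not the WSI or RADV implementation: `modifier_supported` stands in for the per-modifier support query, `filter_modifiers` for the caller that drops unsupported entries, and the `check_storage` parameter exists only to compare behaviour with and without the storage-usage check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GFX_LEVEL_10      10
#define USAGE_STORAGE_BIT (1u << 0) /* stands in for VK_IMAGE_USAGE_STORAGE_BIT */

struct drm_mod {
   uint64_t id;
   bool has_dcc;
};

/* Hypothetical per-modifier support query. With check_storage == false it
 * accepts every modifier, which models the behaviour described above. */
static bool modifier_supported(const struct drm_mod *mod, uint32_t usage,
                               int gfx_level, bool check_storage)
{
   if (check_storage && (usage & USAGE_STORAGE_BIT) &&
       gfx_level < GFX_LEVEL_10 && mod->has_dcc)
      return false;
   return true;
}

/* Compacts the list in place, keeping only supported modifiers, and
 * returns how many entries remain at the front of the array. */
static size_t filter_modifiers(struct drm_mod *mods, size_t count,
                               uint32_t usage, int gfx_level,
                               bool check_storage)
{
   size_t kept = 0;

   assert(mods != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (modifier_supported(&mods[i], usage, gfx_level, check_storage))
         mods[kept++] = mods[i];
   }
   return kept;
}
```

Without the storage check, every DCC modifier survives the filter, so a later selection step can still pick one. With the check enabled, only non-DCC modifiers are left for storage images before GFX10.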

This patch will filter out DCC modifiers, allowing gamescope to work again.

diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
index 09146e38baa..159a064f215 100644
--- a/src/amd/vulkan/radv_formats.c
+++ b/src/amd/vulkan/radv_formats.c
@@ -1349,6 +1349,11 @@ radv_check_modifier_support(struct radv_physical_device *dev,
                       VK_IMAGE_CREATE_SPARSE_ALIASED_BIT))
       return VK_ERROR_FORMAT_NOT_SUPPORTED;

+   if (info->usage & VK_IMAGE_USAGE_STORAGE_BIT &&
+       dev->rad_info.gfx_level < GFX10 &&
+       ac_modifier_has_dcc(modifier))
+      return VK_ERROR_FORMAT_NOT_SUPPORTED;
+
    /*
     * Need to check the modifier is supported in general:
     * "If the drmFormatModifier is incompatible with the parameters specified