Closed lunarpapillo closed 10 months ago
A lot of these have Unexpected: Validation Error: [ VUID-VkPhysicalDeviceGroupProperties-sType-sType ], which is something I ran into. The issue is 99% the VK_LAYER_MESA_device_select layer being old. Updating the layer to the newest version should fix all of these, I think:
https://gitlab.freedesktop.org/mesa/mesa/-/commit/4588453815c58ec848b0ff6f18a08836e70f55df
Also related: https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/4674. It seems NODEVICE_SELECT is one way to get around this (cc @juan-lunarg)
NODEVICE_SELECT=1 fixes some, but not all, of the tests... the remaining two failures (on the latest code, which includes a fix for the hang in PresentIdWait) are:
VkLayerTest.ValidateImportMemoryHandleType
VkLayerTest.PresentIdWait
I'll see about either getting a newer VK_LAYER_MESA_device_select or talking to @juan-lunarg about getting NODEVICE_SELECT=1 set.
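For anyone who wants to reproduce just those two failures locally, here is a rough Python sketch. It assumes the test binary is the vk_layer_validation_tests executable in the current directory (the path is a guess, adjust it to your build tree) and relies on googletest's --gtest_filter flag. The helper names are made up for illustration.

```python
import os
import subprocess

# The two tests reported as still failing with NODEVICE_SELECT=1.
REMAINING_FAILURES = [
    "VkLayerTest.ValidateImportMemoryHandleType",
    "VkLayerTest.PresentIdWait",
]


def build_test_command(binary, tests):
    """Return an argv list that runs only the given googletest cases."""
    # googletest accepts a colon-separated list of patterns in --gtest_filter.
    return [binary, "--gtest_filter=" + ":".join(tests)]


def build_test_env(base_env=None):
    """Copy the environment and set NODEVICE_SELECT=1 in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["NODEVICE_SELECT"] = "1"
    return env


def run_remaining(binary="./vk_layer_validation_tests"):
    # Assumed binary location; not taken from the CI configuration.
    cmd = build_test_command(binary, REMAINING_FAILURES)
    return subprocess.run(cmd, env=build_test_env(), check=False).returncode
```

Calling run_remaining() returns the test binary's exit code, so it can be dropped into a script without touching the caller's own environment.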
For VkLayerTest.ValidateImportMemoryHandleType, it seems that either buffer_export.init_no_mem(*m_device, buffer_info); or memory_buffer_export.init(*m_device, alloc_info); is failing in the tests, causing vkBindBufferMemory to fail with a VK_NULL_HANDLE passed in.
VkLayerTest.PresentIdWait appears to be a driver bug from Nvidia. I worked on this with Charles yesterday and it seems the extension is just broken on Linux.
on all Linux, or just NVIDIA Linux?
NVIDIA Linux; not sure about other GPUs.
I just talked with @charles-lunarg and @juan-lunarg about how to handle this in CI. We discussed:
VK_LAYER_MESA_device_select, even if the version installed is not the same as the system default
NODEVICE_SELECT=1 all the time in CI (or, even stronger, always disable all implicit layers in CI)
VK_LAYER_MESA_device_select (or other implicit layers) and validation layers
NODEVICE_SELECT=1 in vk_layer_validation_tests just for those tests affected by it, disabling it when it's not needed
device_select errors if they appear
Looking for insights and other alternatives from the VVL developers...
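On the "disable all implicit layers in CI" idea: a rough Python sketch of how a CI script might discover which environment variables switch off the installed implicit layers. It reads the layer manifest JSON files and collects each layer's disable_environment entry. The directory list below is an assumption (the loader's real search path depends on the distribution and on XDG variables), and the function names are invented for illustration.

```python
import json
from pathlib import Path

# Assumed manifest locations; the Vulkan loader may search other paths too.
DEFAULT_DIRS = [
    Path("/usr/share/vulkan/implicit_layer.d"),
    Path("/etc/vulkan/implicit_layer.d"),
    Path.home() / ".local/share/vulkan/implicit_layer.d",
]


def implicit_layer_disable_vars(dirs=DEFAULT_DIRS):
    """Map each implicit layer name to its disable_environment dict."""
    found = {}
    for directory in dirs:
        directory = Path(directory)
        if not directory.is_dir():
            continue
        for manifest in sorted(directory.glob("*.json")):
            try:
                data = json.loads(manifest.read_text())
            except (OSError, ValueError):
                continue  # skip unreadable or malformed manifests
            if not isinstance(data, dict):
                continue
            # A manifest holds either one "layer" object or a "layers" list.
            layers = data.get("layers") or [data.get("layer")]
            for layer in layers:
                if isinstance(layer, dict) and "name" in layer:
                    found[layer["name"]] = layer.get("disable_environment", {})
    return found


def env_disabling_all(base_env, layers):
    """Return a copy of base_env with every layer's disable variable set."""
    env = dict(base_env)
    for variables in layers.values():
        env.update(variables)
    return env
```

On a machine with the Mesa layer installed, I would expect VK_LAYER_MESA_device_select to show up here mapped to NODEVICE_SELECT, but that depends on the manifest the distribution ships.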
Are these problems bugs in VK_LAYER_MESA_device_select? If so could we file an Issue/PR to get them fixed?
In the short term, it seems like the 2nd or 3rd solution is 'best', since testing in a non-user configuration will eventually cause much confusion.
At least for https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/4674, the issue has already been fixed in Mesa.
Not sure about all of these tests, however.
The bugs have been fixed in some version of Mesa... but the fix hasn't propagated to the default in Ubuntu yet, and I'm not sure when it will.
But that does bring up a fourth alternative, to ensure that the VK_LAYER_MESA_device_select layer is at least some "known good" version on all CI systems... (someone remind me what that version number is, again?)... I'll edit the above to add the fourth.
Assuming the device_select layer is there due to iGfx being present, would disabling iGfx from the bios fix the issue?
-- Tony Barbour, LunarG
Assuming the device_select layer is there due to iGfx being present,
I don't think @johnzupin deliberately installed an Intel driver, so I'm thinking it would appear in any Ubuntu 22.04 installation... but John can hopefully provide more insight.
would disabling iGfx from the bios fix the issue?
Even if it did, I'd consider this a last resort. I don't think typical end users do this, and I'd like CI machines to reflect typical configuration whenever possible. I've seen this sort of fix lead to issues that end users see but that cannot be detected in CI... and I'm hoping one day to expand CI to run against the Intel drivers too.
Disabling the graphics in the bios wouldn't fix the issue, unless the 'bad layer' is removed as a part of the driver installation/removal process. The layer is its own thing and I don't think the layer has any mechanism to check for bios level settings.
for VkLayerTest.ValidateImportMemoryHandleType seems that either buffer_export.init_no_mem(*m_device, buffer_info); or memory_buffer_export.init(*m_device, alloc_info); is failing in the tests and causing the vkBindBufferMemory to fail with a VK_NULL_HANDLE passed in
memory_buffer_import init fails.
I verified that the PresentIdWait issue is definitely an Nvidia bug. It fails to pass ALL relevant CTS tests.
VkLayerTest.ValidateImportMemoryHandleType does look like an issue on our side.
See PR https://github.com/KhronosGroup/Vulkan-ValidationLayers/pull/4748
VkLayerTest.ValidateImportMemoryHandleType is now fixed.
Closing since CI has been updated: https://github.com/LunarG/VulkanTests/commit/bb6c84f4875439a67c51b75e58e5690303b8ce20
Waitaminnit... I thought these issues remained until all the blacklisted tests are either fixed or encoded into the internal VVL blacklist...? I show several tests still failing:
VkLayerTest.TestBindBufferMemoryDeviceGroup
VkLayerTest.DuplicatePhysicalDevices
VkLayerTest.InvalidImageCreateFlagWithPhysicalDeviceCount
VkLayerTest.TransferImageToSwapchainWithInvalidLayoutDeviceGroup
VkLayerTest.InvalidDeviceMask
VkPositiveLayerTest.TransferImageToSwapchainDeviceGroup
VkPositiveLayerTest.ImagelessLayoutTracking
Apologies, I didn't understand the protocol. I made an incorrect assumption.
Unexpectedly, NODEVICE_SELECT=1 breaks Vulkan-ExtensionLayer tests and gfxreconstruct tests. These may indicate device-ordering dependencies in these repositories. Waiting to see if @jeremyg-lunarg or GFXR engineers have any insights.
If we can't set NODEVICE_SELECT=1 for all Linux CI, we can set it just for VVL.
This change sets the variable just for VVL: https://github.com/LunarG/VulkanTests/pull/405
I spent time on this last week, and https://github.com/LunarG/VulkanTests/pull/483 should reset these to the things I find are fixable.
Also, we fixed the NODEVICE_SELECT issue in https://github.com/LunarG/VulkanTests/pull/482
This machine will replace an Ubuntu 16.04 machine with the same GPU.
8 failed tests
VkLayerTest.TestBindBufferMemoryDeviceGroup
VkLayerTest.ValidateImportMemoryHandleType
VkLayerTest.DuplicatePhysicalDevices
VkLayerTest.InvalidImageCreateFlagWithPhysicalDeviceCount
VkLayerTest.TransferImageToSwapchainWithInvalidLayoutDeviceGroup
VkLayerTest.InvalidDeviceMask
VkPositiveLayerTest.TransferImageToSwapchainDeviceGroup
VkPositiveLayerTest.ImagelessLayoutTracking
1 hanging test
VkLayerTest.PresentIdWait
For reference, on Ubuntu 16.04, most of these tests passed (two were skipped): http://erusea:8080/job/Vulkan-ValidationLayers/9191/BITS=64,BUILD_MODE=Release,USE_ROBIN_HOOD_HASHING=OFF,label=Aurelia-Linux-Nvidia/artifact/vulkantest-results/execution-logs/009-vk_layer_validation_tests-info.txt
The full test logfile is attached; it could be useful for determining what went wrong with any particular test: blacklist.txt
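In case it helps with triage, here is a small Python sketch for pulling the failed test names out of a log like the attached one. It assumes the log contains standard googletest result lines of the form "[  FAILED  ] Suite.Name"; I have not checked that against the actual attachment, and the function name is only illustrative.

```python
import re

# Matches per-test result lines such as "[  FAILED  ] VkLayerTest.Foo (12 ms)".
# Requiring a "Suite.Name" token skips summary lines like
# "[  FAILED  ] 8 tests, listed below:".
_FAILED = re.compile(r"^\[\s*FAILED\s*\]\s+([\w/]+\.[\w/]+)")


def failed_tests(log_text):
    """Return failed test names in first-seen order, without duplicates."""
    seen = []
    for line in log_text.splitlines():
        match = _FAILED.match(line.strip())
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

A hanging test such as VkLayerTest.PresentIdWait would not appear in this list, since the run never reaches a FAILED line for it.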