HuYuxin closed this issue 6 months ago.
Is there any plan to address this in the near future?
Sorry, I can add this to my plate for the week.
Thank you! No worries, I just wanted to follow up so that we can plan accordingly on our side.
(making notes) I think the way forward on this is to track all returns of VK_ERROR_DEVICE_LOST and, from there, un-mark all objects as "being used" in the object tracker.
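Here is a minimal C++ sketch of that approach, assuming a much simplified object tracker. `ObjectTracker`, `MarkInUse`, `Release`, and `PostCallRecord` are hypothetical names for illustration. They are not the Validation Layers' actual classes or signatures, and `Result` stands in for `VkResult`:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-ins for VkResult / VK_ERROR_DEVICE_LOST and real handles.
enum class Result { kSuccess, kErrorDeviceLost };
using Handle = std::uint64_t;

// Hypothetical, simplified tracker: counts how many in-flight submissions
// reference each handle.
class ObjectTracker {
 public:
  // A submission now references handle h.
  void MarkInUse(Handle h) { ++in_use_[h]; }

  // One submission referencing h has completed.
  void Release(Handle h) {
    auto it = in_use_.find(h);
    if (it != in_use_.end() && --it->second == 0) in_use_.erase(it);
  }

  bool InUse(Handle h) const { return in_use_.count(h) != 0; }
  bool DeviceLost() const { return device_lost_; }

  // Called after every intercepted call that returns a result code.
  void PostCallRecord(Result result) {
    if (result == Result::kErrorDeviceLost) {
      device_lost_ = true;
      // Work submitted before the loss will never complete, so no object is
      // considered "in use" any more and later vkDestroy*() checks pass.
      in_use_.clear();
    }
  }

 private:
  std::unordered_map<Handle, int> in_use_;
  bool device_lost_ = false;
};
```

Because the in-use set is cleared once at the moment of loss, each later destroy check sees the object as idle without needing its own device-lost special case.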
@HuYuxin I looked at this a bit more, but it is hard for me to reproduce: I wasn't able to get ANGLE built for Android locally. I will try again tomorrow.
Thank you @spencer-lunarg for the work! Please let me know if you need help with building ANGLE for Android.
Is the change the solution to the issue? I tried applying it, but the test dEQP-EGL.functional.robustness.reset_context.shaders.out_of_bounds_non_robust.reset_status.writes.local_array.fragment still failed with the same VVL error. Example error message: [ VUID-vkDestroyBuffer-buffer-00922 ] Validation Error: [ VUID-vkDestroyBuffer-buffer-00922 ] | MessageID = 0xe4549c11 | vkDestroyBuffer(): can't be called on VkBuffer 0x1a000000001a[] that is currently in use by VkCommandBuffer 0x780cab6ad0[]
Hi @spencer-lunarg, can I ask for an update on this ticket? I just wanted to check in on whether you need any help from us (getting the ANGLE build for Android running, reproducing the issue, etc.) to push this ticket forward.
First, apologies. Now that we have the SDK branch done, I have some time to look at this.
I tried running `./angle_deqp_egl_tests --gtest_filter=dEQP-EGL.functional.robustness.reset_context.shaders.out_of_bounds_non_robust.reset_status.writes.local_array.fragment --verbose --local-output --num-retries=0 --skip-clear-data` on my Linux RADV Mesa machine but don't see the issue. I have an Android 13 Pixel device; it will just take some time to build everything and get that whole environment set up.
In the meantime, I think I can reproduce this with the MockICD on Linux by adding a way to "force" a device loss. Let me try that first. Either way, the core issue is adding better device-lost support to the Validation Layers.
@HuYuxin, I think I got it working. I just merged #7715; can you confirm this fixes everything?
Thank you @spencer-lunarg for working on this despite your busy schedule.
I applied your change. Most of the original VVL errors are gone, but one is still being thrown:
[ VUID-vkDestroyCommandPool-commandPool-00041 ] Validation Error: [ VUID-vkDestroyCommandPool-commandPool-00041 ] Object 0: handle = 0x7a58aba050, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x20000000002, type = VK_OBJECT_TYPE_COMMAND_POOL; | MessageID = 0xad474cda | vkDestroyCommandPool(): (VkCommandBuffer 0x7a58aba050[]) is in use. The Vulkan spec states: All VkCommandBuffer objects allocated from commandPool must not be in the pending state (https://www.khronos.org/registry/vulkan/specs/1.3-extensions/html/vkspec.html#VUID-vkDestroyCommandPool-commandPool-00041)
Can this be fixed with a follow-up change?
Regarding reproducing the problem: not all Vulkan drivers end up with a device loss when a fragment shader writes out of bounds, which is probably why you don't see it on your Linux RADV Mesa machine. Which Pixel device do you have? Would you be able to download and flash the Android image for it from https://developers.google.com/android/images? As for ANGLE, I can provide an ANGLE test APK that you can use directly without building ANGLE from scratch.
@HuYuxin, I see I missed VkCommandPool. I can fix that quickly now.
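One way to picture the missed case is a hypothetical C++ sketch. If device-lost handling clears per-object in-use state but not each command buffer's pending state, the pool check still fires. The types and functions below are illustrative assumptions, not the layer's code:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified state for one command buffer.
struct CommandBufferState {
  bool pending = false;  // submitted, completion not yet observed
};

// Hypothetical, simplified state for one command pool.
struct CommandPoolState {
  std::vector<CommandBufferState*> buffers;  // allocated from this pool
};

// On device loss, move every command buffer out of the pending state.
// In this model, skipping this step leaves the pool check below failing.
void OnDeviceLost(const std::vector<CommandPoolState*>& pools) {
  for (CommandPoolState* pool : pools)
    for (CommandBufferState* cb : pool->buffers) cb->pending = false;
}

// Returns the VUID that would be reported, or "" if destroying is valid.
std::string ValidateDestroyCommandPool(const CommandPoolState& pool) {
  for (const CommandBufferState* cb : pool.buffers)
    if (cb->pending) return "VUID-vkDestroyCommandPool-commandPool-00041";
  return "";
}
```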
As for the reproduction case: while it is nice to have something on Android, I really want something that we can test in CI. Our CI runs with a MockICD driver, and I added logic to make it return DEVICE_LOST on demand. That way I can catch regressions.
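Here is a hypothetical sketch of such a hook in C++. The class and method names are assumptions for illustration, not MockICD's actual interface. Once forced, the loss is sticky, so both modeled calls keep failing afterwards:

```cpp
#include <atomic>

// Hypothetical stand-in for VkResult.
enum class Result { kSuccess, kErrorDeviceLost };

// Hypothetical mock device with a test hook that forces a device loss.
class MockDevice {
 public:
  // Test hook: from now on, behave as if the GPU was lost.
  void ForceDeviceLost() { lost_.store(true); }

  // A lost device stays lost, so both calls keep failing after the hook.
  Result QueueSubmit() const { return Status(); }
  Result WaitForFences() const { return Status(); }

 private:
  Result Status() const {
    return lost_.load() ? Result::kErrorDeviceLost : Result::kSuccess;
  }

  std::atomic<bool> lost_{false};
};
```

A regression test could then create and submit work, call `ForceDeviceLost()`, destroy every object, and expect zero validation messages.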
Environment:
Describe the Issue
When VVL processes `vkDestroy*()` calls, it doesn't seem to cover the case where the Vulkan device is lost and the Vulkan resources can't finish execution on the GPU because of the device loss.

To reproduce:
Follow ANGLE Development Setup to get the ANGLE source code.
Follow Setting up the ANGLE build for Android to download the Android build dependencies and set up the GN args for building the Android target. Make sure the Vulkan validation layer is enabled by adding the following line to the GN args:
Expected behavior
The test passes without the VVL error messages.
Valid Usage ID: VUID-vkDestroyFence-fence-01120, VUID-vkDestroyPipeline-pipeline-00765, VUID-vkDestroyBuffer-buffer-00922, VUID-vkDestroyRenderPass-renderPass-00873, VUID-vkDestroyImageView-imageView-01026, VUID-vkDestroyCommandPool-commandPool-00041
Additional context
In the reproduction example, the application calls `vkDestroy*()` to clean up all the resources after the Vulkan device is lost. According to the spec: "When a device is lost, its child objects are not implicitly destroyed and their handles are still valid. Those objects must still be destroyed before their parents or the device can be destroyed (see the Object Lifetime section)." This means that if the Vulkan device is lost, the application should still be able to destroy the Vulkan objects, even if the commands using them have not finished executing because of the device loss.

In short, can we add a device-lost check when processing `vkDestroy*()` calls, and not throw the `VUID-vkDestroy*` errors if the Vulkan device is already lost?
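Here is a minimal C++ sketch of the requested check. It assumes the layer already knows whether the object is in use and whether the device has been lost. The enum, map, and `ValidateDestroy` function are hypothetical; the VUID strings are the ones listed in this report:

```cpp
#include <map>
#include <optional>
#include <string>

// Object types whose destroy-time VUIDs appear in this report.
enum class ObjectType { kFence, kPipeline, kBuffer, kRenderPass, kImageView, kCommandPool };

// VUIDs listed in this report, keyed by the type of the destroyed object.
const std::map<ObjectType, std::string> kInUseVuids = {
    {ObjectType::kFence, "VUID-vkDestroyFence-fence-01120"},
    {ObjectType::kPipeline, "VUID-vkDestroyPipeline-pipeline-00765"},
    {ObjectType::kBuffer, "VUID-vkDestroyBuffer-buffer-00922"},
    {ObjectType::kRenderPass, "VUID-vkDestroyRenderPass-renderPass-00873"},
    {ObjectType::kImageView, "VUID-vkDestroyImageView-imageView-01026"},
    {ObjectType::kCommandPool, "VUID-vkDestroyCommandPool-commandPool-00041"},
};

// Requested behavior: report the in-use error only while the device is
// alive; after a loss, destroying the object is allowed.
std::optional<std::string> ValidateDestroy(ObjectType type, bool in_use, bool device_lost) {
  if (!in_use || device_lost) return std::nullopt;
  return kInUseVuids.at(type);
}
```

This differs from clearing in-use state at the moment of loss: here each destroy check consults a device-lost flag directly.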