KhronosGroup / Vulkan-ValidationLayers

Vulkan Validation Layers (VVL)
https://vulkan.lunarg.com/doc/sdk/latest/linux/khronos_validation_layer.html

Enormous overhead in synchronization validation #7285

Open charlie-ht opened 7 months ago

charlie-ht commented 7 months ago

Environment:

Describe the Issue

While running dEQP-VK.synchronization.basic.timeline_semaphore.chain in the CTS with validation, I saw huge overhead from the layers. The test takes 4 minutes to complete with the layers on, and 0.5 seconds with them off.

I took a perf trace to help investigate:

[image: perf trace screenshot]

I should note this is with a Debug build of VVL. In Release it takes 20 seconds, which still seems like rather a lot compared to "normal". The test makes 32769 calls to vkQueueSubmit...

Expected behavior

The test takes a few seconds.
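
For context, here is a minimal sketch of the submission pattern the test name suggests: each vkQueueSubmit waits on value N of a timeline semaphore and signals N+1. This is not the actual CTS source, just an illustration assuming a Vulkan 1.2 device with timelineSemaphore enabled; `submit_chain`, `device`, and `queue` are illustrative names.

```c
#include <vulkan/vulkan.h>

/* Sketch of a timeline-semaphore "chain": submit i waits for value i and signals i+1. */
static void submit_chain(VkDevice device, VkQueue queue, uint32_t submit_count)
{
    VkSemaphoreTypeCreateInfo type_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO,
        .semaphoreType = VK_SEMAPHORE_TYPE_TIMELINE,
        .initialValue = 0,
    };
    VkSemaphoreCreateInfo sem_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO,
        .pNext = &type_info,
    };
    VkSemaphore timeline;
    vkCreateSemaphore(device, &sem_info, NULL, &timeline);

    for (uint64_t i = 0; i < submit_count; ++i) {
        uint64_t wait_value = i;       /* previous link of the chain */
        uint64_t signal_value = i + 1; /* next link of the chain */
        VkTimelineSemaphoreSubmitInfo timeline_info = {
            .sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
            .waitSemaphoreValueCount = 1,
            .pWaitSemaphoreValues = &wait_value,
            .signalSemaphoreValueCount = 1,
            .pSignalSemaphoreValues = &signal_value,
        };
        VkPipelineStageFlags wait_stage = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
        VkSubmitInfo submit = {
            .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .pNext = &timeline_info,
            .waitSemaphoreCount = 1,
            .pWaitSemaphores = &timeline,
            .pWaitDstStageMask = &wait_stage,
            .signalSemaphoreCount = 1,
            .pSignalSemaphores = &timeline,
        };
        /* No command buffers, just the wait/signal chain. */
        vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
    }

    /* Wait for the final value so the chain is fully retired before cleanup. */
    uint64_t final_value = submit_count;
    VkSemaphoreWaitInfo wait_info = {
        .sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO,
        .semaphoreCount = 1,
        .pSemaphores = &timeline,
        .pValues = &final_value,
    };
    vkWaitSemaphores(device, &wait_info, UINT64_MAX);
    vkDestroySemaphore(device, timeline, NULL);
}
```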

artem-lunarg commented 7 months ago

@charlie-ht We recently enabled submit-time validation by default. Regular sync validation does not include QueueSubmit validation, so this should explain the behavior you noticed. I agree there are performance issues, and it will take time to make progress in this direction. We will keep this issue open. If desired, submit-time sync validation can be disabled in the Vulkan Configurator.
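
For anyone who wants to do the same thing without the Vulkan Configurator GUI, a possible path is passing the setting through VK_EXT_layer_settings at instance creation. This is only a sketch: the structures are from VK_EXT_layer_settings, but the setting key "syncval_submit_time_validation" is an assumption on my part and should be checked against the khronos_validation layer documentation (vkconfig lists the authoritative name).

```c
#include <vulkan/vulkan.h>

VkInstance create_instance_without_submit_time_syncval(void)
{
    const VkBool32 disabled = VK_FALSE;
    const VkLayerSettingEXT setting = {
        .pLayerName   = "VK_LAYER_KHRONOS_validation",
        .pSettingName = "syncval_submit_time_validation", /* assumed key, see note above */
        .type         = VK_LAYER_SETTING_TYPE_BOOL32_EXT,
        .valueCount   = 1,
        .pValues      = &disabled,
    };
    const VkLayerSettingsCreateInfoEXT layer_settings = {
        .sType        = VK_STRUCTURE_TYPE_LAYER_SETTINGS_CREATE_INFO_EXT,
        .settingCount = 1,
        .pSettings    = &setting,
    };

    const char *layers[] = { "VK_LAYER_KHRONOS_validation" };
    const VkInstanceCreateInfo ci = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pNext = &layer_settings, /* the layer reads the setting from the pNext chain */
        .enabledLayerCount = 1,
        .ppEnabledLayerNames = layers,
    };
    VkInstance instance = VK_NULL_HANDLE;
    vkCreateInstance(&ci, NULL, &instance);
    return instance;
}
```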

DarioSamo commented 7 months ago

> @charlie-ht We recently enabled submit-time validation by default. Regular sync validation does not include QueueSubmit validation, so this should explain the behavior you noticed. I agree there are performance issues, and it will take time to make progress in this direction. We will keep this issue open. If desired, submit-time sync validation can be disabled in the Vulkan Configurator.

I've double-checked and this setting is definitely not enabled. The regression has been gradual: with every update I've installed, the drop in performance has been noticeable. It's not a deal-breaker, since validation is obviously not intended for end users, but I'm now limited to running the synchronization layer only when I suspect something is broken, whereas before it could be used regularly; the performance is too low to be usable. This is on a Ryzen 7950X, FWIW, so it's not exactly a case of a weak system either.

We could probably investigate exactly which versions introduced the regressions, but it's probably more worthwhile to look into optimizing the current implementation, as I imagine the regressions come from additional checking rather than a bug. Perhaps some of the operations just don't scale very well and it isn't as obvious in a more limited test?

EDIT: For added context, the cases I'm referring to, unlike the OP's, involve only a couple of queue submissions per frame at most.

artem-lunarg commented 7 months ago

@DarioSamo I also have a 7950X in my machine and can check the original use case during the month (unfortunately I can't jump on it right now). Because the original use case has a lot of submissions, there's a chance it's still related to Queue Submit Validation, since that directly affects QueueSubmit calls. There is no need to use the vkconfig app: if you enable sync validation from code, Queue Submit Validation is enabled by default. Previously it was in an alpha stage of development.

> EDIT: For added context, the cases I'm referring to, unlike the OP's, involve only a couple of queue submissions per frame at most.

If you have a way to describe your use case and think it is worth investigating separately, feel free to provide details. Otherwise I will test dEQP-VK.synchronization.basic.timeline_semaphore.chain. One potential issue is that, as far as I know, synchronization validation does not support timeline semaphores at the moment. That could be the reason for additional overhead that makes no sense otherwise. For example, I can imagine a situation where submission batches are not retired during validation because we can't track signals from timeline semaphores, but that's only speculation. I will check what's going on during the investigation.
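
For reference, a minimal sketch of what "enabling sync validation from code" typically looks like, via VK_EXT_validation_features at instance creation; the function and variable names are illustrative, not taken from the CTS or Godot.

```c
#include <vulkan/vulkan.h>

VkInstance create_instance_with_syncval(void)
{
    /* Request synchronization validation from the khronos_validation layer. */
    const VkValidationFeatureEnableEXT enables[] = {
        VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT,
    };
    const VkValidationFeaturesEXT features = {
        .sType = VK_STRUCTURE_TYPE_VALIDATION_FEATURES_EXT,
        .enabledValidationFeatureCount = 1,
        .pEnabledValidationFeatures = enables,
    };
    const char *layers[] = { "VK_LAYER_KHRONOS_validation" };
    const VkInstanceCreateInfo ci = {
        .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
        .pNext = &features,
        .enabledLayerCount = 1,
        .ppEnabledLayerNames = layers,
    };
    VkInstance instance = VK_NULL_HANDLE;
    vkCreateInstance(&ci, NULL, &instance);
    return instance;
}
```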

DarioSamo commented 7 months ago

> If you have a way to describe your use case and think it is worth investigating separately, feel free to provide details.

Nope, I don't think I can describe it very well, as it's just something I noticed while working on multiple projects in the Godot engine, where I often use the synchronization layer to validate its behavior. I don't have any hard measurements here, only my gut feeling that it's been getting considerably slower. I can tell you that in both projects, queue submission is used very sparingly per frame (two or three submits at most).

I've also noticed it on a personal project, where one particular case that does a ton of render-target switching seems to trigger a serious slowdown when the synchronization layer is enabled. That might make a good base for taking measurements and profiling, as it goes from real-time performance to pretty much half a second per frame at times, once effects start happening.

If I get some more time later I could run a comparison against some older versions to see whether I'm under a false impression; with the recent update I looked at the issue list to see if someone else had already reported something, since I'd started to notice it was getting slower than usual. But like I said before, that doesn't necessarily mean it's an error, just that more validation is being performed.

And for the record, neither project uses timeline semaphores, just a few plain semaphores and fences (maybe we can consider looking into this separately from the issue the OP ran into).

cclao commented 4 months ago

KHR-GLES3.copy_tex_image_conversions.forbidden.renderbuffer_cubemap* also shows similar performance issues, even with syncval at QueueSubmit disabled. KHR-GLES3.copy_tex_image_conversions.forbidden.renderbuffer_cubemap_negx took 9 seconds with PipelineBarrier and VVL enabled. That time increases to 37 seconds if I use VkEvent to track it. Without VVL it is about 4 seconds.
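
For reference, a minimal sketch of the two synchronization paths being compared: a single vkCmdPipelineBarrier versus a vkCmdSetEvent/vkCmdWaitEvents pair for the same image transition. Sync validation has to track more state for the event path, which is one plausible reason the event variant is slower to validate. The handles, stages, and access masks are illustrative, not taken from the CTS test.

```c
#include <vulkan/vulkan.h>
#include <stddef.h>

/* Path 1: express the transition as a single pipeline barrier. */
void transition_with_barrier(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_TRANSFER_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}

/* Path 2: the same transition split into a set/wait event pair. */
void transition_with_event(VkCommandBuffer cmd, VkImage image, VkEvent event)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_TRANSFER_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdSetEvent(cmd, event, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT);
    /* ... unrelated work could be recorded here, unordered against the transition ... */
    vkCmdWaitEvents(cmd, 1, &event,
                    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                    VK_PIPELINE_STAGE_TRANSFER_BIT,
                    0, NULL, 0, NULL, 1, &barrier);
}
```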