bkaradzic / bgfx

Cross-platform, graphics API agnostic, "Bring Your Own Engine/Framework" style rendering library.
https://bkaradzic.github.io/bgfx/overview.html
BSD 2-Clause "Simplified" License
14.59k stars 1.92k forks source link

Vulkan: improve staging data performance by using scratch buffers per frame. #3295

Closed mcourteaux closed 1 week ago

mcourteaux commented 1 month ago

Opening this PR to have a discussion on the work I did to improve performance of streaming data in the Vulkan backend. Will put as draft for now.

Summary

Staging scratch buffers are created at init to use for transfering data to the device. These scratch buffers are thus allocated once and reused for the lifetime of the renderer. This prevents allocating and freeing buffers for this purpose all the time (likely every several per frame). When the scratch buffer is full, the old procedure of simply creating a buffer for this purpose is used. This functionality is implemented in the auxiliary function allocFromScratchStagingBuffer, which will return a struct with info on which buffer to use and at what offset, and if this was a custom-made buffer that needs to be freed afterwards.

Minor changes:

Added config:

Related issues:

Result

Overall, I see a 3x speedup of simple bgfx::frame() times in the profiling data. UpdateDynamicVertexBuffer has a speedup of 30x almost! CommandQueueVK::finish() almost 8x faster.

A few before/after profiling screenshots for regular SilverNode usage (simply navigating UI and rerendering UI every frame with vg-renderer): Yellow is with this PR using scratch buffers, red is before.

These zones were profiled thanks to additional profiled scopes (also in this PR). Now most of the runtime of the vulkan backend is actually timed if the profiler is enabled: image

bkaradzic commented 4 weeks ago

Can you test a little bit examples with your changes? One of first examples I tried with your branch crashed...

../../../src/renderer_vk.cpp  (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp (7976): BGFX Allocate command buffer error: vkWaitForFences failed -4: VK_ERROR_DEVICE_LOST.
../../../src/renderer_vk.cpp (4303): ASSERT VK_SUCCESS == vkresult -> m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (4303): BGFX FATAL 0x00000000: m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
mcourteaux commented 4 weeks ago

Aha, I don't get any crashes, but I do get the warning if I run the examples in Debug mode indeed. I get a loop that looks like this:

../../../src/renderer_vk.cpp (7549): BGFX vkWaitForFences( device , 1 , &m_backBufferFence[m_backBufferColorIdx] , 1U , (18446744073709551615UL) )
../../../src/renderer_vk.cpp (3806): BGFX vkAllocateDescriptorSets(m_device, &dsai, &descriptorSet)
../../../src/renderer_vk.cpp (3806): BGFX vkAllocateDescriptorSets(m_device, &dsai, &descriptorSet)
../../../src/renderer_vk.cpp (4731): BGFX vkFlushMappedMemoryRanges(device, 1, &range)
../../../src/renderer_vk.cpp (4731): BGFX vkFlushMappedMemoryRanges(device, 1, &range)
../../../src/renderer_vk.cpp (8044): BGFX vkEndCommandBuffer(m_activeCommandBuffer)
../../../src/renderer_vk.cpp (8049): BGFX vkResetFences(device, 1, &m_completedFence)
../../../src/renderer_vk.cpp (8067): BGFX vkQueueSubmit(m_queue, 1, &si, m_completedFence)
../../../src/renderer_vk.cpp (4303): BGFX m_cmd.alloc(&m_commandBuffer)
../../../src/renderer_vk.cpp (683): BGFX ---E-           Semaphore, Validation, 0: Validation Error: [ VUID-vkAcquireNextImageKHR-semaphore-01779 ] Object 0: handle = 0xb3ee8b0000000070, type = VK_OBJECT_TYPE_SEMAPHORE; | MessageID = 0x5717e75b | vkAcquireNextImageKHR():  Semaphore must not have any pending operations. The Vulkan spec states: If semaphore is not VK_NULL_HANDLE it must not have any uncompleted signal or wait operations pending (https://vulkan.lunarg.com/doc/view/1.3.280.0/linux/1.3-extensions/vkspec.html#VUID-vkAcquireNextImageKHR-semaphore-01779)

I'll investigate later. I don't have any time today.

mcourteaux commented 3 weeks ago

@bkaradzic This semaphore bug on vkAcquireNextImageKHR is also on master. I guess we will need to fix that one first before being able to continue testing this. You can compile bgfx in debug with examples, and you'll see the issue when launching example-01-cubesDebug.

bkaradzic commented 3 weeks ago

@mcourteaux I don't see any issue with master... I run regularly thru examples.

mcourteaux commented 3 weeks ago

I'm writing up an issue with my analysis. I think I'm wrapping my head around this issue, and I think I'll have a fix soonish as well. Will link the issue once I post it.

bkaradzic commented 3 weeks ago

Make sure you update your drivers before writing that analysis... It's unlikely you're first to discover something so trivial by running 01-cubes example. @pezcode Please, could you test on your end master?

mcourteaux commented 3 weeks ago

@bkaradzic #3302 I posted the issue, but I might keep updating it for a while, but I think it's detailed enough to accurately communicate the problem. I am on latest NVIDIA driver. The issue at hand is a race condition.

mcourteaux commented 3 weeks ago

@bkaradzic Does the crash you were seeing with my PR disappear if you change kick() on the last line of ContextRendererVK::submit() to kick(true)?

bkaradzic commented 3 weeks ago

Nope...

../../../src/renderer_vk.cpp (6013): BGFX Texture   1: BGRA8 (requested: BGRA8), 1024x1024x1 RT[ ], BO[ ], CW[ ].
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
wb: signal: trace/breakpoint trap (core dumped)

/.../bgfx-vk-staging|vulkan-staging-buffer-arena⚡* > git diff
diff --git a/src/renderer_vk.cpp b/src/renderer_vk.cpp
index e7dc43495..945fd9dfb 100644
--- a/src/renderer_vk.cpp
+++ b/src/renderer_vk.cpp
@@ -9334,7 +9334,7 @@ VK_DESTROY
                        }
                }

-               kick();
+               kick(true);
        }

 } /* namespace vk */ } // namespace bgfx
/.../bgfx-vk-staging|vulkan-staging-buffer-arena⚡* > 
mcourteaux commented 3 weeks ago

Aha, right, that's because the fence is already waited for. For a quick and dirty test, you could delete the failing vkWaitForFences on line 8073, because it is always already waited for in case of kick(true).

Fences can be waited for twice if I read the docs well.

mcourteaux commented 3 weeks ago

Update: I have found two bugs with this:

  1. Texture data does not meet alignment requirements when using the scratch staging buffer.
  2. Potential Synchronization issue described in https://github.com/bkaradzic/bgfx/pull/3303#issuecomment-2149589478

Will fix, and let know.

mcourteaux commented 3 weeks ago

@d-s-h Can you test this branch, to see if it helps with your performance issue described in #3225? If you do, please try to use the bgfx Debug version. Also note that if you have bgfx callbacks configured, you must implement the traceVargs method to actually print something; otherwise no errors or warnings will be visible.

mcourteaux commented 3 weeks ago

@bkaradzic Would you like to test this again, now that you have updated to the latest Vulkan SDK, and the sync bug is fixed? Hopefully it works, or at least you might get some new error messages. Still testing it here without issues.

mcourteaux commented 2 weeks ago

@bkaradzic Update on this? Did your crashes disappear after we fixed the semaphore bug; and I fixed the texture alignment bug?

bkaradzic commented 2 weeks ago

With your latest changes (from your branch):

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x799f384bcdd0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
mcourteaux commented 2 weeks ago

Thanks a lot for this update! I'll look into it. It's sort of annoying that we can't know for sure when it's ready... I don't get these validation errors. Perhaps we ask @pezcode to review the code once it at least works on your machine too.

mcourteaux commented 2 weeks ago

@bkaradzic I pushed a change. For me the examples still work. The alignment requirement for copying into an image buffer says that it must be a multiple of the "texel block size". Before, I was using the "bytes per pixel" value as alignment. Now I'm using bimg::getBlockInfo()::blockSize as alignment. Does that sound right to you?

If this still doesn't fix the issue, can you tell me which example you are running? I tried example-08-update which deals with a whole lot of texture format, but I wasn't getting this issue.

I updated my Vulkan SDK and I'm getting another validation error on example-08-update, but this issue also exists on master currently:

../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (2). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (2). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (1). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xec25c90000000093, name = TH 13: textures/texture_compression_bc1.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (1). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (2). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (2). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (1). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x8f5f070000000095, name = TH 14: textures/texture_compression_bc2.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (1). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (2). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (2). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07971 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x9a6a1fcf | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.x (0) and (imageExtent.width + imageOffset.x) (4) must be >= zero or <= image subresource width (1). The Vulkan spec states: For each element of pRegions, imageOffset.x and (imageExtent.width + imageOffset.x) must both be greater than or equal to 0 and less than or equal to the width of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07971)
../../../src/renderer_vk.cpp (754): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-imageSubresource-07972 ] Object 0: handle = 0x729670faf2a0, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0xe150c50000000097, name = TH 15: textures/texture_compression_bc3.ktx, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x12f1c9cc | vkCmdCopyBufferToImage(): pRegions[0] imageOffset.y (0) and (imageExtent.height + imageOffset.y) (4) must be >= zero or <= image subresource height (1). The Vulkan spec states: For each element of pRegions, imageOffset.y and (imageExtent.height + imageOffset.y) must both be greater than or equal to 0 and less than or equal to the height of the specified imageSubresource of dstImage (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-imageSubresource-07972)
bkaradzic commented 2 weeks ago

With latest (I'm running 01-cubes):

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x7e3c104bcf90, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
mcourteaux commented 2 weeks ago

I'm terribly sorry. I was compiling with make config=release64 and then running the debug example. When you do something so obviously silly for more than a few days, you don't even question it.

I found a bug with my code, for which I'm also sorry. I fixed this bug too now. This is getting ridiculous (I'm really sorry for wasting so much time of yours). However I did test it several times with additional robustness tests and it seems to correctly align the allocations into the scratch buffer for me. The bug I found was dumb, but as far as I can tell, it wouldn't explain the misalignment.

Q: Am I misunderstanding what bx::strideAlign does? I'm using it to round up the current offset in the buffer to the desired alignment.

bkaradzic commented 2 weeks ago

Tested again, now 01, 04, 09 crash everything in between works.

bx::strideAlign returns next aligned offset by stride. Templated version finds next offset that's aligned by _stride and with Min template argument.

bkaradzic commented 2 weeks ago

All crashes look like this:

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x79b8bc2fb050, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
mcourteaux commented 2 weeks ago

@bkaradzic I took the time to do some additional debugging with Valgrind. I found a few minor issues, which I fixed. Regarding the crash, I added a bunch of BX_ASSERTs and some printing. Could you please generate a crash, and share the entire output of one of the crashing programs? Feel free to also debug yourself, as really did what I could. With some luck, you'll hit one of the asserts I added, which might make debugging easier.

bkaradzic commented 2 weeks ago

Still looks the same...

../../../src/renderer_vk.cpp (683): BGFX ---E-       CommandBuffer, Validation, 0: Validation Error: [ VUID-vkCmdCopyBufferToImage-dstImage-07975 ] Object 0: handle = 0x7ef70c4bd030, type = VK_OBJECT_TYPE_COMMAND_BUFFER; Object 1: handle = 0x4e4775000000006e, type = VK_OBJECT_TYPE_IMAGE; | MessageID = 0x35cfc6de | vkCmdCopyBufferToImage(): pRegions[0].bufferOffset (326) must be a multiple of VK_FORMAT_B8G8R8A8_UNORM texel size (4). The Vulkan spec states: If dstImage does not have either a depth/stencil format or a multi-planar format, then for each element of pRegions, bufferOffset must be a multiple of the texel block size (https://vulkan.lunarg.com/doc/view/1.3.283.0/linux/1.3-extensions/vkspec.html#VUID-vkCmdCopyBufferToImage-dstImage-07975)
../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp(8073): ASSERT VK_SUCCESS == vkresult -> vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (8073): BGFX FATAL 0x00000000: vkWaitForFences(device, 1, &m_completedFence, 1U, (18446744073709551615UL)); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
mcourteaux commented 2 weeks ago

Yes, but I didn't change anything, besides adding printing and asserts. Can you share the full output?

mcourteaux commented 2 weeks ago

So, the 326 the validation layer reports for you should be impossible, as I use strideAlign to round that up to the next multiple of the blockSize for the given TextureFormat, which is supposed to be 4 (as reported by both my added print statements, and the validation layer).

mcourteaux commented 2 weeks ago

@bkaradzic To be precise, here I added asserts and prints:

https://github.com/bkaradzic/bgfx/blob/52e38148ae4f76e9103fd4383f85edc12f2336b6/src/renderer_vk.cpp#L6496-L6511

This is the only vkCmdCopyBufferToImage in the entire file. The validation layer is complaining that the bufferOffset is not correctly aligned to the desired texel block size.

You should at least see those prints appear as well.

bkaradzic commented 2 weeks ago

I'm testing with 01-cubes, it doesn't hit this code.

mcourteaux commented 2 weeks ago

Just to be sure: you are not making any weird mistake where you are not running the code you were compiling (like I was doing)?

Can you construct a stacktrace from where the validation layer reports this? I'm flabbergasted. A breakpoint on line 671 for example would catch this. (For gdb, this would be: break -source renderer_vk.cpp -line 671).

bkaradzic commented 2 weeks ago

Rebuilt whole thing and it actually works now... Stale code.

mcourteaux commented 2 weeks ago

Oh wauw, that makes sense! I'll clean up the PR and get rid of some of the excessive printing. Would this be a bug in GENie or the dependency detection system?

bkaradzic commented 2 weeks ago

Not sure what happened, makefile should not have issues...

mcourteaux commented 2 weeks ago

@bkaradzic @pezcode Ready for review/merge as far as I'm concerned.

bkaradzic commented 2 weeks ago

47-pixelformats

../../../src/renderer_vk.cpp (683): BGFX ---E-              Device, ../src/amd/vulkan/radv_device.c:595, 0: GPU hung triggered by other process (VK_ERROR_DEVICE_LOST)
../../../src/renderer_vk.cpp (8017): BGFX Allocate command buffer error: vkWaitForFences failed -4: VK_ERROR_DEVICE_LOST.
../../../src/renderer_vk.cpp(4309): ASSERT VK_SUCCESS == vkresult -> m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
../../../src/renderer_vk.cpp (4309): BGFX FATAL 0x00000000: m_cmd.alloc(&m_commandBuffer); VK error 0xfffffffc: VK_ERROR_DEVICE_LOST
mcourteaux commented 2 weeks ago

Oh good catch! I reproduced this one. I was unaware of the conversion process. I was aligning to the texture format of the original format instead of the format that bimg converts the texture to. Is fixed now. Please be aware of the bug I reported in #3311. If you suffer this issue with example-08, you can try this patch:

diff --git a/examples/08-update/update.cpp b/examples/08-update/update.cpp
index 109efb0ff..200c92e20 100644
--- a/examples/08-update/update.cpp
+++ b/examples/08-update/update.cpp
@@ -173,18 +173,11 @@ bgfx::TextureHandle loadTextureWithUpdate(const char* _filePath, uint64_t _flags
                                        , NULL
                                        );

-                               const bimg::ImageBlockInfo& blockInfo = getBlockInfo(imageContainer->m_format);
-                               const uint32_t blockWidth  = blockInfo.blockWidth;
-                               const uint32_t blockHeight = blockInfo.blockHeight;
-
                                uint32_t width  = imageContainer->m_width;
                                uint32_t height = imageContainer->m_height;

                                for (uint8_t lod = 0, num = imageContainer->m_numMips; lod < num; ++lod)
                                {
-                                       width  = bx::max(blockWidth,  width);
-                                       height = bx::max(blockHeight, height);
-
                                        bimg::ImageMip mip;

                                        if (bimg::imageGetRawData(*imageContainer, 0, lod, imageContainer->m_data, imageContainer->m_size, mip))

Side question: BX_ASSERT and friends macro doesn't deal well with conditions that contain the modulo operator: the modulo disappears as it's treated as a format specifier. In:

https://github.com/bkaradzic/bx/blob/0ec634e8fdf8c810f9911c686a8158088ae25379/include/bx/macros.h#L285-L292

bx::assertFunction(bx::Location::current(), "ASSERT " #_condition " -> " _format, ##__VA_ARGS__)

could be turned into:

bx::assertFunction(bx::Location::current(), "ASSERT %s -> " _format, #_condition, ##__VA_ARGS__)

For now, I replaced the % operator with bx::uint32_mod() to avoid this printing issue.