GPUOpen-Tools / gpu_performance_api

GPU Performance API for AMD GPUs
MIT License

When is the correct time to call GpaCopySecondarySamples in Vulkan? #58

Open RenfengLiu opened 3 years ago

RenfengLiu commented 3 years ago

The documentation only states that we need to call this for secondary command buffers, but it doesn't state when the call should be made. Should I call it after vkCmdExecuteCommands on the primary command buffer, after vkEndCommandBuffer, or somewhere else? Is there an example of this?

PLohrmannAMD commented 3 years ago

There should be a call to GpaBeginCommandList(vk_primary_cmd_buffer, gpa_primary_cmd_buffer), then the call to vkCmdExecuteCommands, and then a call to GpaCopySecondarySamples(gpa_secondary_cmd_buffer, gpa_primary_cmd_buffer, <num samples>, <new sample IDs>) afterwards. This ensures the sample results are copied to a secondary result buffer so that they are not overwritten by subsequent calls to vkCmdExecuteCommands.

When collecting sample results, the original sample IDs collected inside vkCmdExecuteCommands will not have proper results; use the new sample IDs that were passed to the GpaCopySecondarySamples call instead.
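To make the ID bookkeeping concrete, here is a minimal sketch of how an application might hand out new sample IDs for each execution of a secondary command buffer and later map results back to the original samples. `SecondarySampleIdRemapper` and its methods are hypothetical helpers, not part of GPUPerfAPI, and the sketch assumes that new IDs must not collide with IDs already used in the session. The vector returned by `Allocate` is what would be passed as the <new sample IDs> argument.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of GPUPerfAPI): hands out fresh sample IDs for
// each execution of a secondary command buffer and remembers which original
// sample each new ID stands for.
class SecondarySampleIdRemapper {
public:
    // first_free_id: assumed to be larger than any sample ID already in use.
    explicit SecondarySampleIdRemapper(std::uint32_t first_free_id)
        : next_id_(first_free_id) {}

    // Returns one new ID per original ID, in the same order. Call this once
    // per vkCmdExecuteCommands so every execution gets its own set of IDs.
    std::vector<std::uint32_t> Allocate(const std::vector<std::uint32_t>& original_ids) {
        std::vector<std::uint32_t> new_ids;
        new_ids.reserve(original_ids.size());
        for (std::uint32_t original : original_ids) {
            new_to_original_[next_id_] = original;
            new_ids.push_back(next_id_++);
        }
        return new_ids;
    }

    // Maps a new ID (the one to query results with) back to the original
    // sample recorded inside the secondary command buffer.
    std::uint32_t OriginalFor(std::uint32_t new_id) const {
        auto it = new_to_original_.find(new_id);
        assert(it != new_to_original_.end() && "unknown new sample ID");
        return it->second;
    }

private:
    std::uint32_t next_id_;
    std::unordered_map<std::uint32_t, std::uint32_t> new_to_original_;
};
```

Results would then be queried with the new IDs only, and `OriginalFor` tells you which draw or dispatch inside the secondary command buffer each result belongs to.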

We do not currently have a sample of this for Vulkan, but the DX12ColorCube app has an example here: https://github.com/GPUOpen-Tools/gpu_performance_api/blob/30cd97819afd6f560a2dcd6847f5dde1fba08854/source/examples/dx12/dx12_color_cube/cube_sample.cc#L1090 Note: Unfortunately profiling of DX12 bundles does not currently work due to a change in the driver, but I believe the equivalent functionality should work in Vulkan. If you have any trouble getting the results, we'll be happy to help.

RenfengLiu commented 3 years ago

Thanks for the quick response.

I'm experimenting with the command_buffer_usage sample (with secondary command buffers) from https://github.com/KhronosGroup/Vulkan-Samples, using the GPA library and the AMDVLK driver. I built the driver in debug mode. I was able to get counter values from the primary command buffer using the GPA library, but when I try the call sequence you described with a secondary command buffer, I hit the following assert:

#1  0x00007fffefdf9c26 in Pal::CmdBuffer::WriteEvent (this=0x555557b1f5d0, gpuEvent=..., 
    pipePoint=Pal::HwPipeBottom, data=data@entry=3735928559)
    at /dir/driver/pal/src/core/cmdBuffer.cpp:853
#2  0x00007fffefccbe54 in Pal::CmdBuffer::CmdSetEvent (this=<optimized out>, gpuEvent=..., 
    setPoint=<optimized out>) at /dir/driver/pal/src/./core/cmdBuffer.h:494
#3  0x00007fffefdc067e in GpuUtil::GpaSession::CopyResults (this=0x555559b67248, pCmdBuf=0x555557b1f5d0)
    at /dir/driver/pal/src/gpuUtil/gpaSession.cpp:2176
#4  0x00007fffefc3fd53 in vk::GpaSession::CmdCopyResults (this=<optimized out>, pCmdBuf=<optimized out>)
    at /dir/driver/xgl/icd/api/vk_gpa_session.cpp:314
#5  0x00007fffefc3fe2e in vk::entry::vkCmdCopyGpaSessionResultsAMD (commandBuffer=<optimized out>, 
    gpaSession=<optimized out>) at /dir/driver/xgl/icd/api/vk_gpa_session.cpp:432
#6  0x00007fffee55297d in VkGpaCommandList::CopySecondarySamples (this=0x5555586fc820, 
    primary_command_list=0x555557249ec0, num_samples=55, new_sample_ids=0x555558efdbd0, 
    original_sample_ids=std::vector of length 55, capacity 64 = {...})
    at /dir/third_party/gpu_performance_api/source/gpu_perf_api_vk/vk_gpa_command_list.cc:245
#7  0x00007fffee558ed4 in VkGpaPass::CopySecondarySamples (this=0x7fffdc04c5b0, 
    secondary_vk_gpa_command_list=0x5555586fc820, primary_vk_gpa_command_list=0x555557249ec0, num_samples=55, 
    new_sample_ids=0x555558efdbd0)
    at /dir/third_party/gpu_performance_api/source/gpu_perf_api_vk/vk_gpa_pass.cc:319
#8  0x00007fffee55ac21 in VkGpaSession::CopySecondarySamples (this=0x7fffdc04e950, 
    secondary_command_list_id=0x5555582e6280, primary_command_list_id=0x5555582a8b40, num_samples=55, 
    new_sample_ids=0x555558efdbd0)
    at /dir/third_party/gpu_performance_api/source/gpu_perf_api_vk/vk_gpa_session.cc:77
#9  0x00007fffee41a151 in GpaCopySecondarySamples (secondary_gpa_command_list_id=0x5555582e6280, 
    primary_gpa_command_list_id=0x5555582a8b40, number_of_samples=55, new_sample_ids=0x555558efdbd0)
    at /dir/third_party/gpu_performance_api/source/gpu_perf_api_common/gpu_perf_api.cc:1394

Specifically:

#1  0x00007fffefdf9c26 in Pal::CmdBuffer::WriteEvent (this=0x555557b1f5d0, gpuEvent=..., pipePoint=Pal::HwPipeBottom, data=data@entry=3735928559) at /dir/driver/pal/src/core/cmdBuffer.cpp:853
853             PAL_ASSERT_ALWAYS();

Do you have any idea on what may be the problem here?

PLohrmannAMD commented 3 years ago

It appears that the change that affected DX12 bundles also affected the support in Vulkan, as this is now part of the shared codebase. Secondary command lists are now handled differently within the driver, and that change caused our profiling extensions to stop working properly. Unfortunately, you will not be able to get this working easily. I will raise the priority of this with our driver teams.

As an alternative, GPUPerfAPI is already integrated into RenderDoc. If you use RenderDoc to capture and profile applications with secondary command lists, I believe it will work correctly. Most capture/replay tools record the calls that are inside the secondary command list and then substitute them in place of the vkCmdExecuteCommands call. Since the calls are then replayed directly on the primary command list, profiling is able to work correctly.
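For illustration, the substitution idea can be sketched roughly as follows. This is a simplified model and not RenderDoc's actual implementation: `Command`, its fields, and `FlattenForReplay` are hypothetical names, and commands are represented as plain strings. Each "execute secondary" entry in the primary stream is replaced by the commands recorded in the referenced secondary list, so everything ends up on the primary list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of one recorded primary command.
struct Command {
    enum class Kind { kDraw, kExecuteSecondary } kind;
    std::string name;             // used when kind == kDraw
    std::size_t secondary_index;  // used when kind == kExecuteSecondary
};

// Replaces every kExecuteSecondary entry with the commands recorded in the
// referenced secondary list, preserving order. The result contains only
// commands that can be replayed (and profiled) on the primary list.
std::vector<std::string> FlattenForReplay(
    const std::vector<Command>& primary,
    const std::vector<std::vector<std::string>>& secondaries) {
    std::vector<std::string> flat;
    for (const Command& cmd : primary) {
        if (cmd.kind == Command::Kind::kDraw) {
            flat.push_back(cmd.name);
        } else {
            // at() throws std::out_of_range if the index is invalid.
            const std::vector<std::string>& recorded = secondaries.at(cmd.secondary_index);
            flat.insert(flat.end(), recorded.begin(), recorded.end());
        }
    }
    return flat;
}
```

Note that a secondary list executed twice is expanded twice, so each execution gets its own profiled commands in the flattened stream.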

Sorry for the inconvenience, and hopefully you can get what you need via RenderDoc.