GPUOpen-Tools / gpu_performance_api

GPU Performance API for AMD GPUs
MIT License

Can't reuse DX12 CLs #82

Open Dragon31337 opened 2 months ago

Dragon31337 commented 2 months ago

I don't understand how to reuse CLs with this library. Whenever I call GpaBeginCommandList() on a CL that has already gone through GpaBeginCommandList() and GpaEndCommandList(), I get the error "Command List already created". I tried two approaches:

  1. Use one session for all the counters I want, with a pass count greater than 1. The pass is reported as complete, and the command list was finished and executed.
  2. Use multiple sessions, each with a reported pass count of 1. The pass completed, the session completed, and the session was deleted and recreated with a new set of counters.

Why can't I reuse CLs that I know were executed and completed? Requiring a new CL for each pass is kind of ridiculous: with all counters enabled, the pass count was 26 for me.

PLohrmannAMD commented 2 months ago

It is true that command lists cannot be re-used when using GPUPerfAPI, and the reasoning is subtle. When your application calls GpaBeginCommandList(), GPUPerfAPI calls into the driver to enable the specific hardware counters for the specified pass. Likewise, GpaBeginSample() and GpaEndSample() insert commands to start and stop the counters and to read back the results. Therefore, if GPUPerfAPI allowed the application to re-use completed command lists in later passes, it would actually re-collect the same counters that were already collected in the first pass.

Due to this behavior, the command lists need to be re-created in each pass so that the correct hardware counters can be collected in that specific pass.

With all the counters enabled, your application will actually end up creating 26 seemingly identical command lists where the only difference is which counters are being enabled and profiled in each one.
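The reasoning above can be illustrated with a small toy model. This is only a sketch: `CounterSet`, `RecordedCommandList`, `RecordForPass`, `Execute`, and the two `Profile...` functions are hypothetical names invented for illustration and are not part of GPUPerfAPI or D3D12. The assumption is simply that the counters for a pass are baked into the command list at the moment recording begins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy model only: none of these types exist in GPUPerfAPI or D3D12.
using CounterSet = std::vector<std::string>;

// A recorded command list remembers which counters were enabled when
// recording began. Changing them requires recording again.
struct RecordedCommandList {
    std::uint32_t pass_index;
    CounterSet baked_counters;
};

// Stand-in for beginning a command list in a given pass: the counters
// scheduled for that pass are baked into the recording.
inline RecordedCommandList RecordForPass(const std::vector<CounterSet>& passes,
                                         std::uint32_t pass_index) {
    return RecordedCommandList{pass_index, passes.at(pass_index)};
}

// Stand-in for executing a recording: returns the counters it collects.
inline CounterSet Execute(const RecordedCommandList& cl) {
    return cl.baked_counters;
}

// Replaying the pass-0 recording in every pass collects the pass-0
// counters again and again.
inline std::vector<CounterSet> ProfileByReplaying(
    const std::vector<CounterSet>& passes) {
    std::vector<CounterSet> collected;
    const RecordedCommandList first = RecordForPass(passes, 0);
    for (std::size_t pass = 0; pass < passes.size(); ++pass) {
        collected.push_back(Execute(first));
    }
    return collected;
}

// Recording once per pass collects a different counter set in each pass.
inline std::vector<CounterSet> ProfileByRerecording(
    const std::vector<CounterSet>& passes) {
    std::vector<CounterSet> collected;
    for (std::size_t pass = 0; pass < passes.size(); ++pass) {
        const RecordedCommandList cl =
            RecordForPass(passes, static_cast<std::uint32_t>(pass));
        collected.push_back(Execute(cl));
    }
    return collected;
}
```

In this model, `ProfileByReplaying` returns the first pass's counter set for every pass, while `ProfileByRerecording` returns one distinct set per pass, which is the behavior the per-pass requirement is meant to guarantee.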

If you're finding that this requirement is preventing you from using GPUPerfAPI or if you cannot easily re-work your code to work well with GPUPerfAPI directly, it is already integrated with Microsoft PIX, and you might find it easier to capture & replay with PIX to collect the counters and to benefit from some of their additional debugging and performance analysis features: https://devblogs.microsoft.com/pix/download/

Please let us know if you have additional questions. We're happy to help! 😄

Dragon31337 commented 2 months ago

Isn't the whole purpose of the internal CL ID returned by GpaBeginCommandList() to allow DX12 API CL reuse? The issue is that all the data is tied to the API-level CL ID instead of the internal one. If I reset a DX CL after execution, then re-record and execute it again after the samples have been collected, the collected counters won't be touched: I can still query them, and the passes will still be marked as completed. All buffers with collected counters should be tied to the internal CL ID, so tracking and querying should be no issue. Why wouldn't I get a new internal ID for the CL when I call GpaBeginCommandList() with the next pass number? The DX12 CL's old contents are useless anyway, as it has already been reset and is about to be re-recorded.
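To make the suggestion concrete, here is a minimal sketch of results keyed by an internal ID rather than by the API-level CL. It is a toy model, not GPUPerfAPI code: `ResultStore`, `ApiCommandList`, `InternalClId`, and their methods are hypothetical names, and the counter values are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Toy model only; these names are not part of GPUPerfAPI.
using ApiCommandList = const void*;  // stand-in for an API-level CL pointer
using InternalClId = std::uint32_t;  // stand-in for the ID returned on begin

class ResultStore {
public:
    // Every begin hands out a fresh internal ID, even when the same
    // API-level CL is passed in again.
    InternalClId Begin(ApiCommandList api_cl, std::uint32_t pass_index) {
        const InternalClId id = next_id_++;
        records_[id] = Record{api_cl, pass_index, {}};
        return id;
    }

    // Counter values are stored under the internal ID only.
    void StoreResult(InternalClId id, std::uint64_t value) {
        records_.at(id).values.push_back(value);
    }

    // Results from an earlier pass stay queryable after the API-level CL
    // has been handed to Begin again.
    const std::vector<std::uint64_t>& Results(InternalClId id) const {
        return records_.at(id).values;
    }

private:
    struct Record {
        ApiCommandList api_cl;
        std::uint32_t pass_index;
        std::vector<std::uint64_t> values;
    };
    std::map<InternalClId, Record> records_;
    InternalClId next_id_ = 0;
};
```

With this layout, beginning the same API-level CL twice yields two different internal IDs, and the values stored under the first ID are unaffected by the second recording.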

PIX does not work (capture/replay) with AGS calls.

PLohrmannAMD commented 2 months ago

Ah okay, yes, PIX does not currently work nicely with AGS. If you are associated with a game studio that has an AMD DevTech Engineer, please contact them to connect with me, and we may be able to suggest a workaround ("it depends").

No, the internal CL ID is not there to allow CL re-use. That said, GPUPerfAPI was originally developed long before DX12 existed, and there is some shared functionality across our DX12, DX11, Vulkan, OpenGL, and OpenCL implementations, so there could be limitations from those other APIs that are still being (incorrectly) imposed on DX12. Your comments do help me recognize that resetting / re-recording / re-executing a CL should be able to work correctly in theory, but right now it would certainly run into the limitation that GPA thinks you are calling GpaBeginCommandList() again on the same, non-reset command list.

Can you confirm that your application is natively trying to re-use the CL in this way? That is, the CL is created once and then reset and rebuilt once per frame? In this case, I believe you are right: the facts that the CL was already ended and that a new pass index is supplied should be sufficient for GPA to treat the same API-level CL as a "new" one.

Or ... is it resetting the CL mid-frame and re-building it for re-use within the same frame? If this were to happen, the pass index would still be 1 for both calls to GpaBeginCommandList(). I guess GPA could know that the previous use of that CL had been ended, and that this call should be treated as a new instance. (This does seem potentially error-prone on the user side, although we could output a message instead of an error to make this behavior more visible.)
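The two cases discussed above could be sketched roughly as follows. This is a hypothetical toy tracker, not the actual GPA implementation: `CommandListTracker`, `BeginResult`, and its enumerators are invented names. It also assumes the application has really executed and reset the CL before beginning again, which the tracker itself cannot verify.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model only; these names are not part of GPUPerfAPI.
enum class BeginResult {
    kStarted,             // treated as a new command list instance
    kStartedWithWarning,  // re-begun within the same pass: allowed, with a message
    kErrorStillOpen       // previous begin on this CL was never ended
};

class CommandListTracker {
public:
    BeginResult Begin(const void* api_cl, std::uint32_t pass_index) {
        auto it = state_.find(api_cl);
        if (it == state_.end()) {
            // First time this API-level CL is seen.
            state_[api_cl] = State{pass_index, true};
            return BeginResult::kStarted;
        }
        State& s = it->second;
        if (s.open) {
            // Begin called again without a matching end: still an error.
            return BeginResult::kErrorStillOpen;
        }
        // The previous use was ended, so treat this as a new instance.
        const bool same_pass = (s.last_pass == pass_index);
        s = State{pass_index, true};
        return same_pass ? BeginResult::kStartedWithWarning
                         : BeginResult::kStarted;
    }

    // Marks the current use of this CL as ended.
    void End(const void* api_cl) { state_.at(api_cl).open = false; }

private:
    struct State {
        std::uint32_t last_pass;
        bool open;
    };
    std::map<const void*, State> state_;
};
```

Here a re-begin with a new pass index is accepted silently, a re-begin within the same pass is accepted but flagged so the behavior stays visible, and only a begin on a CL that was never ended is rejected.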

Let me know if you agree with the statements and behavior I've described above. If you're inclined to make the change based on this repo and submit a PR, we're happy to do additional testing internally before merging in the change.

Dragon31337 commented 2 months ago

Yes, I got the error after the CL had been recorded, executed, and reset; I was re-recording the same set of commands when I called GpaBeginCommandList() with the next pass index. I was not resetting and restarting a pass midway. If the user does not execute and reset the CL before starting the next pass, that's a user mistake, and they would probably just get zeroes for the previous pass, since the counter buffers won't be updated if the CL was never executed.

Dragon31337 commented 2 months ago

I'm using the public AGS library from here: https://gpuopen.com/amd-gpu-services-ags-library/