LunarG / gfxreconstruct

Graphics API Capture and Replay Tools for Reconstructing Graphics Application Behavior
https://vulkan.lunarg.com/doc/sdk/latest/linux/capture_tools.html
MIT License

[tocpp] Failure to convert 3DMark trace on Android #1628

Open PixelyIon opened 1 month ago

PixelyIon commented 1 month ago

For background: I'm a driver developer for Mesa3D Turnip, and I've been trying to convert a trace of 3DMark Wild Life Extreme, recorded on Android, to C++ code. Having the trace as code would make it easier to analyze the impact of separate parts of a single frame, since we have a performance delta relative to other drivers here. I'm aware that the tool is in alpha, so I don't expect it to work, but given how useful it could be, I thought I'd give it a shot.

I was able to get a trace that replayed successfully on the device, but tocpp with an Android target crashed while processing the first frame. I compiled gfxrecon (58b31d9620d62c8de35dfe2697e6c30bdc61c1ac) manually and ran it under a debugger. The crash was inside VulkanCppConsumer::Process_vkEnumeratePhysicalDeviceGroups: a nullptr dereference, because pPhysicalDeviceGroupProperties's GetPointer() and GetMetaStructPointer() both returned nullptr. I worked around this by adding a check that returns early from the function if either of them is null, just to see if that would work.

This worked until it hit VulkanCppConsumer::Process_vkGetPhysicalDeviceSurfaceFormats2KHR, which had a similar issue. I fixed it by checking whether either pSurfaceInfo or pSurfaceFormats had null pointers. After that, it did finish going through all of the frames, up to the limit of 100 that I had set.

Unfortunately, the generated output had a lot of issues. The primary one was that the physical device was seemingly always VK_NULL_HANDLE, which is likely due to TODO: Support physicalDevices (output with array length value?) argument. I got around this by manually adding a call to vkEnumeratePhysicalDevices to grab the physical device handle, then replacing every VK_NULL_HANDLE that stood in for the physical device with the variable containing the valid handle.

Another issue was VulkanCppConsumerBase::GenerateLoadData outputting paths with backslashes (\) on Windows, which were then parsed as part of escape sequences. Replacing all instances of \ with / before writing the path out fixed this. Finally, GenerateStruct_VkMemoryAllocateFlagsInfo wrote usages of the variables can_use_opaque_address and uses_opaque_address, which Generate_vkAllocateMemory does not define in the path where it cannot find an opaque handle.
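The path fix can be sketched as below. ToForwardSlashes is a hypothetical helper name, not something that exists in gfxreconstruct. It just shows the substitution applied to a path before it is written into a string literal in the generated source.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: converts Windows-style separators to forward slashes
// so the path can be embedded in a C++ string literal without its
// backslashes being read as the start of escape sequences.
std::string ToForwardSlashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    return path;
}
```

An alternative would be to escape each backslash as \\ when emitting the literal. Forward slashes are the simpler option here, since the generated code targets Android.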

After all of these fixes, I got it to run without any errors, but the screen was entirely black. I tried to debug the virtual swapchain code, but everything seemed fine, so I captured another trace by replaying the original one with --remove-unsupported --sfa --opcd --sync -m rebind --swapchain virtual. With this trace, I got the image attached below. It looks nothing like what it's supposed to, but since it moves, frames are rendering something and the virtual swapchain potentially works. That was the end of my investigation, since tracking the bug any further would be fairly time consuming.

image

Reproducing the trace

KarenGhavam-lunarG commented 1 month ago

Not surprised that you're having issues. The tocpp tool is alpha quality, and we are quite sure it has many bugs and gaps. See the documentation here: https://github.com/LunarG/gfxreconstruct/tree/dev/tools/tocpp

This most likely will not be triaged in a timely manner.

PixelyIon commented 1 month ago

I definitely understand the current state of tocpp and have no expectation of this being addressed anytime soon, but I wanted to document a case where it's broken, along with the fixes I made along the way, for when someone wants to take a look at it.