GPUOpen-Tools / GPU-Reshape

GPU Reshape (GRS) is an API & vendor agnostic instrumentation framework, with instruction level validation.
Other
374 stars 12 forks source link

GRS vulkan layer causes a stackoverflow #45

Closed Ipotrick closed 8 months ago

Ipotrick commented 8 months ago

Hi, I just tried the GPUReshape app on my program and it caused my program to simply crash when launched from Reshape. At first i could not find a cause but later i enabled the global hook. This hook installs a global Vulkan layer that is always enabled. When i then launched my app outside of GPUReshape i got a proper asset catch within my IDE so i could see the problem. Here i am assuming, that the GRS vulkan layer is enabled for any app launched from GPUReshape.

When launched outside of Reshape with the global hook enabled, the app throws an error within the GRS vulkan layer complaining about a stack overflow. I think this could maybe be caused by a large descriptor set (containing 100000 buffers and 100000 image descriptors) i am using in my "bindless" vulkan backend. I had multiple tools stumble over my large descriptor sets already so i hope this would be a quick fix.

My app: https://github.com/Sunset-Flock/Timberdoodle I hope its not too inconvenient to build. it required vcpkg and the vulkan sdk installed.

miguel-petersen commented 8 months ago

Hi Ipotrick,

Thank you for a full repro case, it greatly helps reproducibility. I'll take a look over the coming days and get a fix out for you.

Ipotrick commented 8 months ago

I did some more testing on a test sample for my vulkan abstraction. This should be much easier to build, the test sample is also much simpler then my other app.

I made the descriptor set as large as in the other app and also populated it with a similar amount of descriptors. It does not crash the app when launched outside of Reshape unlike with my other app Timberdoodle. But it does crash the test sample when launched from Reshape directly.

I tested with the boids sample (important: this is not on the master branch): https://github.com/Ipotrick/Daxa/tree/a61b3177b787aaa2c9a5a5278580a06dd2911597

miguel-petersen commented 8 months ago

Seems the number of descriptors is a likely culprit. I'll see if I can reproduce it in a local sample I have built first.

Question, would you be able to build from source? This would help validate the fix is properly covering your use cases.

Ipotrick commented 8 months ago

I will try to build from source but i don't have time for it today. Tomorrow i will investigate more.

miguel-petersen commented 8 months ago

Hi Ipotrick, Small delay on getting to your issue, will see if I can't allocate some time tomorrow.

miguel-petersen commented 8 months ago

I seem to have an issue getting vcpkg to pick the right Vulkan version.

1> [CMake] vulkan:x64-windows -> 1.1.82.1#6 -- C:\Users\...\AppData\Local\vcpkg\registries\git-trees\ea62236a3c91051f5ccb340442b60a026bf160c6

Seems it's trying to find 1.1.82, however, that version has since been removed from the list of SDKs in the Khronos page. I tried searching around for an archived version, but not much luck. I am not that familiar with vcpkg, perhaps I'm doing something wrong?

1> [CMake] The Vulkan SDK wasn't found. Refer to Getting Started with the Windows 1> [CMake] Vulkan SDK:

miguel-petersen commented 8 months ago

Seems it was something odd with my machine, tried it on another. The configuration now succeeds, however I get internal compiler errors on the variant usage.

As a workaround I managed to get it building on a clean Windows (through Windows Sandbox, useful for quick testing). Not sure what the difference is.

miguel-petersen commented 8 months ago

What GPU & driver are you on?

I get a crash during shader module creation on a RX 6800 XT, the validation layers do not seem to complain.

miguel-petersen commented 8 months ago

Managed to get the sample running on an RTX 2070, unfortunately the App does not crash with GPU Reshape attached.

That said, there is an issue with the instrumentation. I'll use this issue going forward to track the issues. However, as I understand you have issues getting your App to run at all with Reshape?

If I cannot reproduce locally, I think the best way forward is getting the sources built on your side.

GabeRundlett commented 8 months ago

Yes, we both can not run our apps (Both made with the same rendering abstraction) through Reshape. Patrick is using an RTX 4080. I am using an RTX 3070

GabeRundlett commented 8 months ago

I managed to get the source building on my machine and launched the "Studio" target through Visual Studio, however I'm not sure what you would like for us to do.

GabeRundlett commented 8 months ago

Launching one of our apps from the debug session gives the following output in the console:

Crash detected, current frames:
        [00007FFC991BCD80] DebugBreak
                C:\Windows\System32\KERNELBASE.dll
        [00007FFB2E9320F0] Detail::Break
                C:\dev\downloads\GPU-Reshape\Bin\MSVC\Debug\GRS.Backends.Vulkan.Layer.dll
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Source\Assert.cpp line 54
        [00007FFB2E7E7360] SpvRecordReader::GetResult
                C:\dev\downloads\GPU-Reshape\Bin\MSVC\Debug\GRS.Backends.Vulkan.Layer.dll
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Include\Backends\Vulkan\Compiler\SpvRecordReader.h line 90
        [00007FFB2E7CC8E0] SpvPhysicalBlockTypeConstantVariable::Parse
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\Blocks\SpvPhysicalBlockTypeConstantVariable.cpp line 272
        [00007FFB2E7F7220] SpvPhysicalBlockTable::Parse
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\SpvPhysicalBlockTable.cpp line 57
        [00007FFB2E693AE0] SpvModule::ParseModule
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\SpvModule.cpp line 76
        [00007FFB2E6D4480] ShaderCompiler::InitializeModule
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 112
        [00007FFB2E6D46B0] ShaderCompiler::CompileShader
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 156
        [00007FFB2E6D4F40] ShaderCompiler::Worker
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 130
        [00007FFB2E6E58C0] `Detail::DelegateCreator<void (__cdecl ShaderCompiler::*)(void *)>::MakeFrameProxy<{ShaderCompiler::Worker,0,0}>'::`2'::<lambda_1>::operator()
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 81
        [00007FFB2E6D5440] `Detail::DelegateCreator<void (__cdecl ShaderCompiler::*)(void *)>::MakeFrameProxy<{ShaderCompiler::Worker,0,0}>'::`2'::<lambda_1>::<lambda_invoker_cdecl>
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 81
        [00007FFB2E89A550] Delegate<void __cdecl(void *)>::Invoke
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 48
        [00007FFB2E89AAD0] DispatcherWorker::ThreadEntry
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Dispatcher\DispatcherWorker.h line 68
        [00007FFB2E8971C0] std::invoke<void (__cdecl DispatcherWorker::*)(void),DispatcherWorker *>
                C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\type_traits line 1756
        [00007FFB2E8968A0] std::thread::_Invoke<std::tuple<void (__cdecl DispatcherWorker::*)(void),DispatcherWorker *>,0,1>
                C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\thread line 61        [00007FFB33462EE0] register_onexit_function
        [00007FFC9B102560] BaseThreadInitThunk
        [00007FFC9B76AA30] RtlUserThreadStart

Waiting for debugger to attach...
miguel-petersen commented 8 months ago

Hi Gabe. What would be invaluable for me, is if you could replicable the stack overflow or associated crash from your applications with Reshape enabled, and then describe where the fault is occurring (such as a call stack with locals).

EDIT: Posted just after your call stack, thanks!

GabeRundlett commented 8 months ago

Here's the call-stack and a little extra debug info shown from vscode, acquired by enabling the global hook and then launching my application in a debug session in VSCode.

I'm not sure if this local variable information is accurate, but if it is, the i being 4.2 billion is probably relevant ๐Ÿ˜… image

miguel-petersen commented 8 months ago

Definitely a weird i! ๐Ÿ˜…

My immediate guess is that memcpy overwrite it from the local stack allocation. Should be an easy fix, assuming it's that.

GabeRundlett commented 8 months ago

When stepping through it, it does in-fact get changed from 0 to 4.2 billion with that copy call. image image

miguel-petersen commented 8 months ago

From the previous call stack provided by the cmd, I think I'm handling forward pointers wrong. They don't allocate a spirv result id.

GabeRundlett commented 8 months ago

This looks a bit suspicious to me as well. image

miguel-petersen commented 8 months ago

Ah, copy & paste with a lack of coffee.

I'm currently building Reshape separately, will create a branch for all of these issues.

miguel-petersen commented 8 months ago

Is this separate from the boid sample? I don't hit that call stack in vkCmdWaitEvents2.

GabeRundlett commented 8 months ago

Let me run the boid sample. I was running my own Daxa app

GabeRundlett commented 8 months ago

Here is the console output when launching from GRS:

Crash detected, current frames:
        [00007FFB37E17710] SpvPhysicalBlockDebugStringSource::Parse
                C:\dev\downloads\GPU-Reshape\Bin\MSVC\Debug\GRS.Backends.Vulkan.Layer.dll
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\Blocks\SpvPhysicalBlockDebugStringSource.cpp line 91
        [00007FFB37E77220] SpvPhysicalBlockTable::Parse
                C:\dev\downloads\GPU-Reshape\Bin\MSVC\Debug\GRS.Backends.Vulkan.Layer.dll
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\SpvPhysicalBlockTable.cpp line 56
        [00007FFB37D13AE0] SpvModule::ParseModule
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\SpvModule.cpp line 76
        [00007FFB37D54480] ShaderCompiler::InitializeModule
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 112
        [00007FFB37D546B0] ShaderCompiler::CompileShader
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 156
        [00007FFB37D54F40] ShaderCompiler::Worker
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 130
        [00007FFB37D658C0] `Detail::DelegateCreator<void (__cdecl ShaderCompiler::*)(void *)>::MakeFrameProxy<{ShaderCompiler::Worker,0,0}>'::`2'::<lambda_1>::operator()
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 81
        [00007FFB37D55440] `Detail::DelegateCreator<void (__cdecl ShaderCompiler::*)(void *)>::MakeFrameProxy<{ShaderCompiler::Worker,0,0}>'::`2'::<lambda_1>::<lambda_invoker_cdecl>
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 81
        [00007FFB37F1A550] Delegate<void __cdecl(void *)>::Invoke
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 48
        [00007FFB37F1AAD0] DispatcherWorker::ThreadEntry
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Dispatcher\DispatcherWorker.h line 68
        [00007FFB37F171C0] std::invoke<void (__cdecl DispatcherWorker::*)(void),DispatcherWorker *>
                C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\type_traits line 1756
        [00007FFB37F168A0] std::thread::_Invoke<std::tuple<void (__cdecl DispatcherWorker::*)(void),DispatcherWorker *>,0,1>
                C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\thread line 61        [00007FFB33462EE0] register_onexit_function
        [00007FFC9B102560] BaseThreadInitThunk
        [00007FFC9B76AA30] RtlUserThreadStart

Waiting for debugger to attach...

HOWEVER, launching from vscode with the global hook does not appear to cause a crash... I tested and made sure I still do indeed get a crash when running my own application.

miguel-petersen commented 8 months ago

That crash just above should be fixed by ef2fa08

I wonder why the compiler is emitting zero length sources, a bug on my side regardless.

GabeRundlett commented 8 months ago

Yep that commit looks like it should fix it- I was just debugging to find the zero length source, and was just about to say so!

GabeRundlett commented 8 months ago

I pulled your changes, however it appears that somehow it's still triggering the same access violation... I'm now testing to see if I am having an issue with the build not rebuilding

miguel-petersen commented 8 months ago

Studio doesn't pull in the backend dependencies, since it technically doesn't depend on them (maybe a mistake). I suggest just building the solution.

(The backends are plugins, Studio doesn't have any knowledge of them)

GabeRundlett commented 8 months ago

Yeah that was the case! I built the solution and now a brand new crash ๐Ÿ˜ญ

Crash detected, current frames:
        [00007FFC991BCD80] DebugBreak
                C:\Windows\System32\KERNELBASE.dll
        [00007FFB319420F0] Detail::Break
                C:\dev\downloads\GPU-Reshape\Bin\MSVC\Debug\GRS.Backends.Vulkan.Layer.dll
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Source\Assert.cpp line 54
        [00007FFB317F7360] SpvRecordReader::GetResult
                C:\dev\downloads\GPU-Reshape\Bin\MSVC\Debug\GRS.Backends.Vulkan.Layer.dll
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Include\Backends\Vulkan\Compiler\SpvRecordReader.h line 90
        [00007FFB317DC8E0] SpvPhysicalBlockTypeConstantVariable::Parse
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\Blocks\SpvPhysicalBlockTypeConstantVariable.cpp line 272
        [00007FFB31807220] SpvPhysicalBlockTable::Parse
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\SpvPhysicalBlockTable.cpp line 57
        [00007FFB316A3AE0] SpvModule::ParseModule
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\SpvModule.cpp line 76
        [00007FFB316E4480] ShaderCompiler::InitializeModule
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 112
        [00007FFB316E46B0] ShaderCompiler::CompileShader
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 156
        [00007FFB316E4F40] ShaderCompiler::Worker
                C:\dev\downloads\GPU-Reshape\Source\Backends\Vulkan\Layer\Source\Compiler\ShaderCompiler.cpp line 130
        [00007FFB316F58C0] `Detail::DelegateCreator<void (__cdecl ShaderCompiler::*)(void *)>::MakeFrameProxy<{ShaderCompiler::Worker,0,0}>'::`2'::<lambda_1>::operator()
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 81
        [00007FFB316E5440] `Detail::DelegateCreator<void (__cdecl ShaderCompiler::*)(void *)>::MakeFrameProxy<{ShaderCompiler::Worker,0,0}>'::`2'::<lambda_1>::<lambda_invoker_cdecl>
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 81
        [00007FFB318AA550] Delegate<void __cdecl(void *)>::Invoke
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Delegate.h line 48
        [00007FFB318AAAD0] DispatcherWorker::ThreadEntry
                C:\dev\downloads\GPU-Reshape\Source\Libraries\Common\Include\Common\Dispatcher\DispatcherWorker.h line 68
        [00007FFB318A71C0] std::invoke<void (__cdecl DispatcherWorker::*)(void),DispatcherWorker *>
                C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\type_traits line 1756
        [00007FFB318A68A0] std::thread::_Invoke<std::tuple<void (__cdecl DispatcherWorker::*)(void),DispatcherWorker *>,0,1>
                C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\include\thread line 61        [00007FFB33462EE0] register_onexit_function
        [00007FFC9B102560] BaseThreadInitThunk
        [00007FFC9B76AA30] RtlUserThreadStart

Waiting for debugger to attach...
miguel-petersen commented 8 months ago

I'm working on that one now. ๐Ÿ™‚

miguel-petersen commented 8 months ago

It took me a second, was wondering on how to handle forward pointers and type maps. For now I settled on still keeping a reference to the "unresolved" forward pointer in my IL (hence the changes in pretty printing). I might revisit in the future.

Now working on an instrumentation issue.

Oh, and that zero debug source is giving us trouble. When you first inspect a shader it appears "empty", but if you click any other file or go to the IL view, it's all there. I need to filter it out somewhere. I really wonder what it is.

miguel-petersen commented 8 months ago

So the instrumentation issue is regarding how I change the push constant setup a bit. It is an error, but, it doesn't change the behaviour of your program, at least not the boid sample. If I ignore that validation error it is instrumenting the boid sample now.

I should note that bounds checking on buffer references, as compared to traditional buffers and textures, is currently not checked. There's definitely something to think about there, but I think I'll handle that in another issue.

GabeRundlett commented 8 months ago

Thank you very much for your help. Those few bug fixes appear to make the software run as intended with our applications. if I find any other issues, I guess I should make a different issue thread.

miguel-petersen commented 8 months ago

My pleasure Gabe. Yep, I think a separate issue thread is better. I'll get these changes onto a development branch, and then eventually into mainline.

(I understood this as both your issues are solved, particularly as I see you both on Daxa. If not, feel free to reopen)