GPUOpen-Tools / GPU-Reshape

GPU Reshape (GRS) is an API & vendor agnostic instrumentation framework, with instruction level validation.

Crashes in GRS.Backends.DX12.Layer #47

Closed Nickelium closed 8 months ago

Nickelium commented 8 months ago

Hello

Running GPU Reshape on my application causes a crash in GRS.Backends.DX12.Layer, which happens right after D3D12CreateDevice(). The application I tested is https://github.com/Nickelium/ComputePlayground.

Screenshot 2024-01-19 001733 Screenshot 2024-01-19 002357

I tried to build from source to see what's happening, but the build process fails with the following: Screenshot 2024-01-19 003508

Nickelium

miguel-petersen commented 8 months ago

Hi Nickelium,

It seems it's crashing on the Initialization feature setup. Thanks a lot for a repro case 🙂, I'll see to investigating your issue tomorrow.

With regards to the source build, do you have the .NET SDK installed? It is required to build the UI application, specifically Framework 4.8 (NET 5.0).

miguel-petersen commented 8 months ago

I am unable to reproduce the crash locally.

I'll continue trying to reproduce under different conditions. Could you list your machine specifications? GPU, Driver, CPU, etc.

The quickest way forward is probably a source build. However, I'll also see to it that we get a debug build out with symbols for cases like these.

Nickelium commented 8 months ago

This is my .NET SDK version: Screenshot 2024-01-20 234221

My specs:
CPU: Intel i7-6700K
GPU: Nvidia GTX 970
GPU driver: 537.34 and 546.65 (latest as of now); it happens on both

When running Reshape, I get the following warning before the crash:

D3D12 WARNING: ID3D12ShaderBytecode::CreatePipelineState: Shader is corrupt or in an unrecognized format, or is not signed. Ensure that DXIL.dll is used to sign the shader. This shader and PSO containing it will not be validated. [ EXECUTION WARNING #1243: NON_RETAIL_SHADER_MODEL_WONT_VALIDATE]

miguel-petersen commented 8 months ago

I happen to have a GTX 970 lying around; I'll plug it in to see if I can reproduce it with that (assuming it still works).

miguel-petersen commented 8 months ago

On the missing compiler, can you please launch your Visual Studio installer (2022) and make sure .NET desktop development is installed?

Additionally, could you please run "dotnet --info" in a local command prompt, and paste the output here?

Nickelium commented 8 months ago

As you suspected, installing the whole .NET desktop development workload through the installer fixed the build issue. It seems that installing the .NET SDK separately was insufficient. Output: Screenshot 2024-01-21 105507

When running from source, I see the following: Screenshot 2024-01-21 104726

It asserts on the following line: D3D12MA_ASSERT(0 && "Invalid pPoolDesc->HeapFlags passed to Allocator::CreatePool. Did you forget to handle ResourceHeapTier=1?");

Resource heap tier 1 is indeed what I get when querying device feature support.
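For readers following along, here is a rough sketch of the rule the assertion is hinting at: on resource heap tier 1, a heap may hold only one resource category, so the heap flags have to deny the other two. This is not GPU Reshape's or D3D12MA's code. The enum values are local stand-ins assumed to mirror the D3D12_HEAP_FLAGS deny bits in d3d12.h, and `HeapFlagsFor` and `IsValidForTier` are hypothetical helper names.

```cpp
#include <cstdint>

// Local stand-ins assumed to mirror the D3D12_HEAP_FLAGS deny bits
// (verify against d3d12.h before relying on the numeric values).
enum HeapFlags : std::uint32_t {
    kHeapFlagNone                = 0x0,
    kHeapFlagDenyBuffers         = 0x4,
    kHeapFlagDenyRtDsTextures    = 0x40,
    kHeapFlagDenyNonRtDsTextures = 0x80,
};

enum class ResourceHeapTier { Tier1 = 1, Tier2 = 2 };
enum class HeapCategory { Buffers, NonRtDsTextures, RtDsTextures };

// Hypothetical helper: choose heap flags for a pool that will hold `category`.
// On tier 1 the heap is restricted to that single category; on tier 2 no
// restriction is needed, so no deny bits are set.
constexpr std::uint32_t HeapFlagsFor(ResourceHeapTier tier, HeapCategory category) {
    if (tier != ResourceHeapTier::Tier1) {
        return kHeapFlagNone;
    }
    switch (category) {
        case HeapCategory::Buffers:
            return kHeapFlagDenyRtDsTextures | kHeapFlagDenyNonRtDsTextures;
        case HeapCategory::NonRtDsTextures:
            return kHeapFlagDenyBuffers | kHeapFlagDenyRtDsTextures;
        case HeapCategory::RtDsTextures:
            return kHeapFlagDenyBuffers | kHeapFlagDenyNonRtDsTextures;
    }
    return kHeapFlagNone;
}

// Hypothetical check: on tier 1, exactly two of the three categories must be
// denied. Tier 2 accepts any combination in this sketch.
constexpr bool IsValidForTier(ResourceHeapTier tier, std::uint32_t flags) {
    if (tier != ResourceHeapTier::Tier1) {
        return true;
    }
    const int denied = ((flags & kHeapFlagDenyBuffers) != 0 ? 1 : 0)
                     + ((flags & kHeapFlagDenyRtDsTextures) != 0 ? 1 : 0)
                     + ((flags & kHeapFlagDenyNonRtDsTextures) != 0 ? 1 : 0);
    return denied == 2;
}
```

In a real application the tier would come from ID3D12Device::CheckFeatureSupport with D3D12_FEATURE_D3D12_OPTIONS (the ResourceHeapTier field), which is the query mentioned above.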

miguel-petersen commented 8 months ago

I think I have an idea of what's happening. For that particular crash I can work around the lack of tier 2/3 support; for descriptor heaps, however, it's a little more complicated.

Reshape injects descriptors at the end of heaps for instrumentation purposes; it doesn't need many, but it does need some. For heaps with fewer than 1 million slots that's just fine; heaps at or beyond that size, however, require tier 3 to exceed the limit. So if you're not on tier 3, Reshape won't function correctly with heaps that hit that limit. That's something I should report (in-app) in a readable manner!

For the purposes of this issue, do you create any heaps with 1 million descriptors?
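The headroom check described above could be sketched roughly as follows. This is an illustration only, not Reshape's implementation: it assumes the "tier" in question is the D3D12 resource binding tier, takes the 1 million limit from the comment above, and treats tier 3 as unbounded for simplicity. `CanExtendHeap` and the injected-slot count are hypothetical.

```cpp
#include <cstdint>

// Assumption: "tier" here refers to the D3D12 resource binding tier.
enum class ResourceBindingTier { Tier1 = 1, Tier2 = 2, Tier3 = 3 };

// Descriptor heap size limit below tier 3, as stated in the comment above.
constexpr std::uint64_t kDescriptorLimitBelowTier3 = 1000000;

// Hypothetical helper: can a layer append `injected` descriptors to the end
// of a heap the application requested with `requested` slots?
constexpr bool CanExtendHeap(ResourceBindingTier tier,
                             std::uint64_t requested,
                             std::uint64_t injected) {
    if (tier == ResourceBindingTier::Tier3) {
        // Treated as unbounded in this sketch; real hardware still has limits.
        return true;
    }
    return requested + injected <= kDescriptorLimitBelowTier3;
}
```

With, say, 32 injected slots, a 1000-slot heap has plenty of room on any tier, while a heap that already requests 1 million slots only fits on tier 3.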

miguel-petersen commented 8 months ago

@Nickelium With the above change (on the issue/47-resource-heap-tier-1 branch), I am able to instrument GPUOpen samples.

Would it be possible to build from source on your end, and see if it fixes your use case?

Nickelium commented 8 months ago

I'm not creating many descriptors, so that sounds fine.

Will try it today or tomorrow and let you know!

Nickelium commented 8 months ago

I didn't manage to build the branch, as I get a linker error about a missing vulkan-1.lib. This is not related to this branch, since I had the same occurrence on master; there, however, it was sufficient to launch the prebuilt binary and then attach the partial build from source to see the stack trace. Screenshot 2024-01-23 234703

ThirdParty/VulkanLoader doesn't contain vulkan-1.lib after building that external project, which I think it should, but it uses a module definition file, which I'm not familiar with.

Any idea?

miguel-petersen commented 8 months ago

Yeah it should build the lib file, I haven't seen this issue before.

Would it be possible to build the loader separately and see what happens? https://github.com/KhronosGroup/Vulkan-Loader

And, would it be possible to get more of the log, particularly around the Vulkan Loader above?

Nickelium commented 8 months ago

The VulkanLoader build issue was due to a local change of mine that changes the directory when cmd opens. With your change, the issue is indeed gone.

Thanks!

miguel-petersen commented 8 months ago

Very glad to hear 🙂

Closing this issue as completed. If something pops up, feel free to re-open it or create a new one.