GPUOpen-Tools / GPU-Reshape

GPU Reshape (GRS) is an API agnostic instrumentation framework, with instruction level validation.

exit with error code 0xc000041d when using wgpu #59

Closed caxieyou closed 3 months ago

caxieyou commented 4 months ago

Description: The exe exits with nothing but the message "process didn't exit successfully: xxx.exe (exit code: 0xc000041d)".

Repro steps: It compiles fine, but exits after several seconds.

Expected vs observed behavior: Expected to see some triangles on screen, but only got a white screen.

Extra materials: I tried debugging it with the pdb file and the exe in VS2019 and got this screenshot; hope it's helpful. image

Switching the backend to DX12 works fine. Sending my exe to other people's computers also works fine. Really confusing!

Platform: wgpu 17 (tried 18, same issue) + Windows 10 + RTX 3070 + Vulkan 1.3.275.0 installed + latest NVIDIA driver installed

miguel-petersen commented 4 months ago

Hi @caxieyou, thanks for the bug report.

Would it be possible to build the development branch and see if it's already been fixed on your end?

Meanwhile I'll see if I can reproduce it on a public sample.

miguel-petersen commented 3 months ago

Hi @caxieyou, sorry for the delay.

I reproduced the issue locally; Reshape just needed support for Vulkan imageless framebuffers, and it seems stable now. The relevant changes are in the development branch.

If you're able to build it, that'd be great; if not, I understand. It'll be part of the next release. 🙂

Elabajaba commented 3 months ago

I just built and ran the development branch (commit ad23b7268ce85687bb97bfaa0003d1e2fa8aca53) and tested it with both wgpu (trunk b731495e053fe3a15879ff2637c2fb8db74ace3e, shadow example, `cargo run --example wgpu-examples -- shadow`) and bevy (main 8a0882534815bb96327f137a2d0549bee8af217c, which uses wgpu 0.19.3, the latest stable release; 3d_scene example, `cargo run --example 3d_scene`).

Running either of those under Vulkan results in major corruption when launched from GPU Reshape (running them standalone works, and connecting to the standalone examples after startup also works). DX12 seems to work correctly. You can set the `WGPU_BACKEND` environment variable to `dx12` or `vulkan` to change the API wgpu uses.
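
The backend switch described above can be sketched as a small helper that prepares the same `cargo run` invocation with `WGPU_BACKEND` set for the child process. This is only an illustration: the helper name `example_command` is hypothetical, it assumes it would be run from a wgpu checkout, and `dx12` is only meaningful on Windows.

```rust
use std::process::Command;

/// Builds (but does not run) the `cargo run` invocation for a wgpu example,
/// selecting the graphics API through the `WGPU_BACKEND` environment variable.
/// `example_command` is a hypothetical helper name, not part of wgpu.
fn example_command(backend: &str, example: &str) -> Command {
    let mut cmd = Command::new("cargo");
    // Same arguments as the command quoted in the comment above.
    cmd.args(["run", "--example", "wgpu-examples", "--", example]);
    // The variable is set only for the child process, not for the caller.
    cmd.env("WGPU_BACKEND", backend);
    cmd
}
```

Calling `.status()` on the returned command, once with `"vulkan"` and once with `"dx12"`, would run the `shadow` example under each API in turn, which makes it easier to compare the two backends side by side.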

images showing the corruption

Bevy `3d_scene` example

Basic, synchronous recording:
![image](https://github.com/GPUOpen-Tools/GPU-Reshape/assets/177631/60da94a9-dd89-486f-96a6-2f7229999c2d)

Concurrency, detailed reporting:
![image](https://github.com/GPUOpen-Tools/GPU-Reshape/assets/177631/d0b3d9f0-50e0-4c73-9547-60d43eb10e96)

What it looks like when I just run the example standalone:
![image](https://github.com/GPUOpen-Tools/GPU-Reshape/assets/177631/3cfa1317-f2b5-45e5-810c-8e12e0f2958a)

wgpu `shadow` example, concurrency:
![image](https://github.com/GPUOpen-Tools/GPU-Reshape/assets/177631/8775fa74-ce65-4730-99b3-1ee28a3e4274)

miguel-petersen commented 3 months ago

Hi @Elabajaba. Thanks for the swift testing.

I've been connecting after the fact, not launching from Reshape. I'll see what's going on.

Are you instrumenting on launch? If so, what instrumentation features do you have enabled? (All, Basic, etc.)

Elabajaba commented 3 months ago

> Hi @Elabajaba. Thanks for the swift testing.
>
> I've been connecting after the fact, not launching from Reshape. I'll see what's going on.
>
> Are you instrumenting on launch? If so, what instrumentation features do you have enabled? (All, Basic, etc.)

I wrote what type of instrumentation I had enabled on launch above each of the screenshots.

It seems like if there's any instrumentation features enabled at launch it has issues. Launching it from Reshape with Custom works, but as soon as I try to instrument the main pass with any sort of instrumentation, it runs into corruption issues:

For the wgpu shadow example, export stability says it's exporting Inf and NaN for `return vec4<f32>(color, 1.0) * u_entity.color;`.

For bevy's 3d_scene example it says Uninitialized resource read.
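
To make the export stability report above concrete, here is a CPU-side sketch of the kind of check it describes, mirroring the quoted shader expression. It is an illustration only, not GPU Reshape's implementation; the function names `shade` and `unstable_components` are hypothetical.

```rust
/// Indices of the components that are NaN or infinite.
fn unstable_components(v: [f32; 4]) -> Vec<usize> {
    v.iter()
        .enumerate()
        .filter(|(_, c)| !c.is_finite())
        .map(|(i, _)| i)
        .collect()
}

/// CPU mirror of `vec4<f32>(color, 1.0) * u_entity.color`
/// (WGSL `*` on two vectors multiplies component-wise).
fn shade(color: [f32; 3], entity_color: [f32; 4]) -> [f32; 4] {
    let c = [color[0], color[1], color[2], 1.0];
    [
        c[0] * entity_color[0],
        c[1] * entity_color[1],
        c[2] * entity_color[2],
        c[3] * entity_color[3],
    ]
}
```

A single non-finite input is enough to trip the check: a NaN in `color` propagates through the multiply, and an infinite `entity_color` component yields Inf, or NaN when multiplied by zero. Inputs read from memory that was never written are one possible source of such values.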

Note the corruption issues also happen if I connect to a standalone app and attempt to use any (or all) of the instrumentation.

For the bevy examples it also doesn't detect any pipeline names if I launch from Reshape (they're all unknown), but does if I connect post-launch.

For the wgpu example it's connecting to as if it's d3d12 for some reason when I launch it targeting vulkan (probably a wgpu bug), and it doesn't detect any pipelines.

The wgpu weirdness:

image

All this testing has been done on a 6800xt with the 24.2.1 driver on Windows 11 23H2.

miguel-petersen commented 3 months ago

Ah, apologies, I missed the text above the screenshots.

I plugged in a 6800xt, and am able to reproduce said corruption in the shadow sample: image

> For the bevy examples it also doesn't detect any pipeline names if I launch from Reshape (they're all unknown), but does if I connect post-launch.

Smells like a synchronization issue on my part.

> For the wgpu example it's connecting to as if it's d3d12 for some reason when I launch it targeting vulkan (probably a wgpu bug), and it doesn't detect any pipelines.

I've had all sorts of interesting cases like this, particularly when applications use the D3D API for capability querying, typically with DXGI. Perhaps WGPU does something similar?

Additionally, drivers themselves may internally create D3D devices, even if you're using Vulkan. Though, this should be guarded against in Reshape, unless I missed something.

One recent feature that landed may interest you: you can now attach all devices of a process (and its subprocesses). Workspaces are created whenever the devices are created, though the workspace view doesn't open automatically; you just need to double-click them. It's still experimental, so ignore the incorrect tooltips. 🙂

image

miguel-petersen commented 3 months ago

Ok, narrowed down the issue to some OpVariables getting their initializers removed.

image

I have a couple of fixes going, once tested across a couple of Vulkan games I'll get it in, likely a bit later today.
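
For context on the OpVariable finding above: in SPIR-V, `OpVariable` (opcode 59) has the operands result type, result id and storage class, followed by an optional initializer id, so a rewritten instruction that is one word shorter has silently lost its initializer. The sketch below shows one way to detect that on a raw word stream. It is not GPU Reshape's code; the function names are hypothetical, and the stream is assumed to start at the first instruction, after the five-word module header.

```rust
const OP_VARIABLE: u32 = 59;

/// Returns the initializer id of an OpVariable instruction, if it has one.
/// `inst` holds the instruction's words, starting at its header word,
/// which packs the word count in the high 16 bits and the opcode in the low 16.
fn variable_initializer(inst: &[u32]) -> Option<u32> {
    let header = *inst.first()?;
    let word_count = (header >> 16) as usize;
    let opcode = header & 0xFFFF;
    if opcode != OP_VARIABLE || word_count < 4 || inst.len() < word_count {
        return None;
    }
    // Layout: [header, result type, result id, storage class, (initializer)]
    if word_count >= 5 { Some(inst[4]) } else { None }
}

/// Counts OpVariables that carry an initializer in an instruction stream.
fn count_initialized_variables(words: &[u32]) -> usize {
    let mut count = 0;
    let mut i = 0;
    while i < words.len() {
        let word_count = (words[i] >> 16) as usize;
        // Stop on malformed input instead of looping forever or reading past the end.
        if word_count == 0 || i + word_count > words.len() {
            break;
        }
        if variable_initializer(&words[i..i + word_count]).is_some() {
            count += 1;
        }
        i += word_count;
    }
    count
}
```

Comparing `count_initialized_variables` on a module before and after instrumentation is a cheap sanity check: if the count drops, some initializer was lost in the rewrite.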

miguel-petersen commented 3 months ago

Hi @Elabajaba, the submit above should address that issue. Latest does contain a couple more bug fixes for the WGPU samples.

miguel-petersen commented 3 months ago

Hi, these issues mentioned above should be fixed now, so I'm closing this issue. Feel free to reopen this issue if this is not the case.