Browser WebGPU? - Githubissues

mighdoll commented 10 months ago

Is support for WebGPU development on your radar?

miguel-petersen commented 10 months ago

Hi Mighdoll.

At this moment WebGPU is not being considered for support.

miguel-petersen commented 10 months ago

To clarify, we are currently not seeking official support.

However, if WebGPU translates down to either DX12 or Vulkan (apologies, I am not too familiar), then GPU Reshape is able to hook into it.

mighdoll commented 10 months ago

Yep, I think there'd be hope that GPU-Reshape could hook in when running a browser on the right platform. In WebGPU land, we're eager for GPU tool support!

Here's a bit about chrome: https://chromium.googlesource.com/chromium/src/+/main/docs/security/research/graphics/webgpu_technical_report.md and firefox: https://github.com/gfx-rs/wgpu

miguel-petersen commented 10 months ago

I took a quick look, and I am able to "attach" if I launch from Reshape, though no other chrome instances may be running before.

Specifically, I launch from Reshape with (provided by an AMD fellow!): --disable-gpu-sandbox --disable-direct-composition -enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming

However, something is broken in the rendering. I don't think it's actually presenting, particularly also because during presents some data is sent back to the app, and I get nothing.

miguel-petersen commented 10 months ago

One thing about Chrome is that sub-processes are spawned with process mitigation policies. https://github.com/chromium/chromium/blob/b119cd4f3bf59a6b58553420741713a88b5325eb/sandbox/win/src/process_mitigations.cc#L460

Which I check against here. https://github.com/GPUOpen-Tools/GPU-Reshape/blob/main/Source/Backends/DX12/Bootstrapper/Source/DLL.cpp#L795

Chrome's sandboxing enables these policies. If a process enables either mitigation policy, I cannot inject my bootstrapper. I wonder how PIX handles it (I heard it can do it?). I could tamper with the creation parameters, but I'm not sure that's the right way forward, and it could be seen as malicious, maybe.

Chrome does have a --no-sandbox parameter which avoids it, but then Reshape fails to discover any device on newly spawned tabs (new processes). Strange lands.

miguel-petersen commented 10 months ago

As a repro case, this is what I'm doing.

Reshape launch parameters:
App: C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
Cwd: C:\Program Files (x86)\Google\Chrome\Application
Args: --disable-gpu-sandbox --disable-direct-composition -enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming --no-sandbox https://webgpu.github.io/webgpu-samples/samples/helloTriangle

It does launch, and connect to a device, but something breaks somewhere. This'll be an interesting one to debug!
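For anyone scripting this repro, here is a minimal sketch that assembles the same argument list. It only uses the flags quoted in this thread. The function name `build_chrome_args` and the `no_sandbox` parameter are made up for illustration, and nothing is launched or validated against Chrome.

```python
from typing import List

# Dawn toggles quoted in this thread; they are not checked against Chrome here.
DAWN_FEATURES = ["emit_hlsl_debug_symbols", "disable_symbol_renaming"]


def build_chrome_args(url: str, no_sandbox: bool = True) -> List[str]:
    """Return the repro's argument list without launching anything (hypothetical helper)."""
    args = [
        "--disable-gpu-sandbox",
        "--disable-direct-composition",
        # Written with a single leading dash, exactly as quoted in the thread.
        "-enable-dawn-features=" + ",".join(DAWN_FEATURES),
    ]
    if no_sandbox:
        args.append("--no-sandbox")
    args.append(url)
    return args


if __name__ == "__main__":
    sample = "https://webgpu.github.io/webgpu-samples/samples/helloTriangle"
    print(" ".join(build_chrome_args(sample)))
```

The resulting list can be pasted into the Args field, or passed to a process launcher together with the App path above.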

miguel-petersen commented 10 months ago

From a brief investigation it appears that a command list is failing to close, likely indicating a validation error somewhere.

It'd be nice if it's a quick fix.

mighdoll commented 10 months ago

I posted a ref to this bug over on: https://matrix.to/#/#webgpu-dawn:matrix.org. I recommend dropping in over there if you have questions about Chrome/Dawn!

miguel-petersen commented 10 months ago

I joined the room 🙂

Managed to get this validation error out of Dawn with Reshape, probably what's causing the command list to fault:

D3D12 ERROR: ID3D12GraphicsCommandList::CopyBufferRegion: Invalid Command List method (CopyBufferRegion) called within a Render Pass. [ EXECUTION ERROR #1203: RENDER_PASS_DISALLOWED_API_CALLED]

kainino0x commented 10 months ago

GPU work in chrome is all done in a single GPU subprocess. That process definitely needs sandboxing disabled to debug it. Here are instructions on how to get it working with PIX: https://gist.github.com/Popov72/41f71cbf8d55f2cb8cae93f439eee347 (The flags are the same as the ones you mentioned.)

That's likely the best option but if any problems are caused by launching multiple processes, there are other options: https://chromium.googlesource.com/chromium/src/+/main/docs/gpu/debugging_gpu_related_code.md#debugging-in-the-gpu-process

There is a flag called --gpu-launcher which prepends some arguments (i.e. a debugger or profiler) to the GPU process launch: https://peter.sh/experiments/chromium-command-line-switches/#gpu-launcher
And another one called --gpu-startup-dialog which pauses GPU process startup before it starts doing anything, so you can attach a debugger and continue: https://peter.sh/experiments/chromium-command-line-switches/#gpu-startup-dialog

Another option to incrementally investigate this would be to try debugging Dawn samples, though unfortunately you would have to build them from Dawn. If that error is coming from the D3D12 debug layer - both Chrome and Dawn should run cleanly against it, but Chrome has a lot more going on, so running just Dawn would narrow down where it's coming from. There are also ways to enable the D3D12 debug layer for Dawn (I think the flags are: when launching chrome, --enable-dawn-backend-validation; when launching dawn samples/tests, --enable-backend-validation; but tell me if those don't work)

miguel-petersen commented 10 months ago

As of d46430d it seems to render properly; however, it does not send any data back yet.

Please note that the change is incomplete as I need to carefully manage the before / after access states of each render target and depth stencil, something I need a little more time to think about. Currently it just blindly reconstructs the render pass, which is incorrect.
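To make the access-state concern concrete, here is a toy model of splitting a render pass around a copy. It is not Reshape's implementation. All class and function names are hypothetical, and the access strings stand in for the D3D12 render pass beginning and ending access types. The point it illustrates: the segment closed before the copy must end with "preserve", the reopened segment must begin with "preserve", and only the first and last segments keep the application's original accesses.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class BeginPass:
    begin_access: str  # "clear", "preserve" or "discard"
    end_access: str    # "preserve" or "discard"


@dataclass(frozen=True)
class EndPass:
    pass


@dataclass(frozen=True)
class Draw:
    name: str


@dataclass(frozen=True)
class Copy:
    name: str  # stands in for a copy that is illegal inside a render pass


Command = Union[BeginPass, EndPass, Draw, Copy]


def split_passes_around_copies(cmds: List[Command]) -> List[Command]:
    """Close the open render pass before each copy and reopen it afterwards."""
    out: List[Command] = []
    original: Optional[BeginPass] = None  # the app's BeginPass while a pass is open
    segment_index = -1                    # index in `out` of the open segment's BeginPass
    for cmd in cmds:
        if isinstance(cmd, BeginPass):
            original = cmd
            segment_index = len(out)
            out.append(cmd)
        elif isinstance(cmd, EndPass):
            original = None
            out.append(cmd)
        elif isinstance(cmd, Copy) and original is not None:
            # The segment being closed must keep its contents for the next one.
            opened = out[segment_index]
            out[segment_index] = BeginPass(opened.begin_access, "preserve")
            out.append(EndPass())
            out.append(cmd)
            # Reopen: load what was rendered so far, keep the app's ending access.
            segment_index = len(out)
            out.append(BeginPass("preserve", original.end_access))
        else:
            out.append(cmd)
    return out


def has_copy_inside_pass(cmds: List[Command]) -> bool:
    """Report whether any copy is recorded between a BeginPass and its EndPass."""
    inside = False
    for cmd in cmds:
        if isinstance(cmd, BeginPass):
            inside = True
        elif isinstance(cmd, EndPass):
            inside = False
        elif isinstance(cmd, Copy) and inside:
            return True
    return False
```

Reusing the original "clear" on the reopened segment would wipe the draws recorded before the copy, which is the kind of mistake a blind reconstruction can make. This sketch emits an empty segment when two copies are adjacent; a real implementation would probably merge them.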

miguel-petersen commented 10 months ago

@kainino0x Absolute pleasure to have a direct contributor here, thanks for the wealth of information!

Regarding D3D11On12, would you know how (Chrome) WebGPU utilizes it? Reshape does support hooking D3D11On12, however, it currently doesn't do much with it. Is it somehow involved with presentation?

Particularly on presentation, Reshape doesn't seem to hit any hooks, so I'm very much curious how that happens.

miguel-petersen commented 10 months ago

The issue is definitely regarding presentation. Currently Reshape sends data back during presentation; as I'm not hitting that hook, nothing ever gets sent.

If I add a dummy thread to pump out data manually, I can get Reshape communicating. This is a change I've been meaning to do anyway, so I'll track it here.

Instrumentation seems to do its job as well, though I am having trouble getting debug sources working. 🤔

kainino0x commented 10 months ago

I am pretty sure we are not using 11on12, but instead doing interop between native D3D11 (Chrome) and D3D12 (Dawn) but I don't know how that interop works.

If that's a problem, then I can check if the Dawn-D3D12 backend for Chrome compositing is working and how to switch it on. Then I think everything is supposed to go through D3D12.

kainino0x commented 10 months ago

Detecting frame boundaries has historically always been a problem with using Chrome with graphics debuggers, because Chrome's presentation is so complex. There might also be some option that injects a fake "swap" to tell debuggers where the frame boundaries are.

miguel-petersen commented 10 months ago

I see. While supporting D3D11 is not on the roadmap, I would consider hooking "just enough" to be able to detect frame boundaries. Useful for many reasons.

If there's a switch to turn on native compositing, that would be a great way forward for the short term. 🙂

kainino0x commented 10 months ago

Chrome has a flag --use-angle=d3d11on12 which will use 11on12 for ANGLE. By default everything in Chrome should be going through either ANGLE or Dawn so theoretically that should make it use 12 exclusively.

If you have a chance let me know if that works, or maybe I can find a chance to try it myself and play with the flags. (Note: please use Chrome Canary, as I don't know the state of things in the Chrome release branches)

> If that's a problem, then I can check if the Dawn-D3D12 backend for Chrome compositing is working and how to switch it on. Then I think everything is supposed to go through D3D12.

Turns out this is "very experimental" right now. I tried it and WebGL and WebGPU content didn't work at all. So not yet.

miguel-petersen commented 9 months ago

I've been testing on Chrome release, I'll see if I can't use Canary instead.

> Turns out this is "very experimental" right now. I tried it and WebGL and WebGPU content didn't work at all. So not yet.

Gotcha. I'll see if I can't hook the presentation method somehow.

miguel-petersen commented 9 months ago

The canary branch seems to solve the symbol issue, which is great news.

miguel-petersen commented 9 months ago

Pretty happy with local performance. I've got one local change I need to think about, it's regarding how data is sent back to the app, just need to make sure I'm not introducing problems later on.

https://github.com/GPUOpen-Tools/GPU-Reshape/assets/7347572/d552f71e-3a03-450b-a0e8-a52b88c3ff53

miguel-petersen commented 8 months ago

Hi, quite a few changes have landed in development, and a couple more after GDC. Things should work much more smoothly now.

This includes lots of crash fixes and cases where Reshape was not preserving the original behaviour. And, most importantly, the polling mechanism, which now happens on a controlled interval instead of during presentation. There are a number of benefits to this, but for chrome usage it removes the need to track presentation at all.
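As an illustration of why interval-based delivery removes the dependency on presentation, here is a small sketch of a background pump. It is a generic pattern under stated assumptions, not Reshape's code: the `IntervalPump` class, its `send` callback and the default interval are all hypothetical.

```python
import threading
from collections import deque
from typing import Callable, Deque, List


class IntervalPump:
    """Flushes queued messages on a fixed interval from a background thread."""

    def __init__(self, send: Callable[[List[bytes]], None], interval_s: float = 0.05):
        self._send = send
        self._interval_s = interval_s
        self._queue: Deque[bytes] = deque()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def push(self, message: bytes) -> None:
        """Queue a message from any thread; no present hook is involved."""
        with self._lock:
            self._queue.append(message)

    def _flush(self) -> None:
        # Swap the queue out under the lock, then send without holding it.
        with self._lock:
            batch = list(self._queue)
            self._queue.clear()
        if batch:
            self._send(batch)

    def _run(self) -> None:
        # Event.wait doubles as the sleep, so stop() wakes the thread promptly.
        while not self._stop.wait(self._interval_s):
            self._flush()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self._flush()  # deliver anything queued after the last tick
```

Because the flush is driven by a timer, it keeps working when the host never calls a hooked present function, which is the situation described for Chrome earlier in this thread.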

Another nice thing is that Reshape now supports hooking sub-processes and multiple devices, I find this super useful for chrome development. You can launch new tabs, reload examples, etc. and it "should just work". Just check the two checkboxes below.

Whenever the app / chrome creates a device, it'll appear in the list. To open its associated workspace, double click any of them.

Currently they don't auto delete, so the list might expand considerably as chrome's creating devices. Something to think about.

It'd be great if someone has a second to try it out, and see if they spot any issues with the current setup.

kainino0x commented 8 months ago

Very nice! I will ask the WebGPU matrix chat room if anyone wants to try this out.

kainino0x commented 8 months ago

Here are the Chromium flags again, for reference: --disable-gpu-sandbox --disable-direct-composition -enable-dawn-features=emit_hlsl_debug_symbols,disable_symbol_renaming

mighdoll commented 8 months ago

Awesome! I'll rebuild my windows machine to try gpu-reshape!

Hmm.. Need I buy an AMD card right away, or is my old 1080 nvidia card ok for now with GPU-reshape?

miguel-petersen commented 8 months ago

Hey @mighdoll! Reshape supports NVIDIA just fine, earliest model I tested was a GTX 970. That said, if you find anything let's fix it. 🙂

Also, just to reiterate, the relevant branch is now https://github.com/GPUOpen-Tools/GPU-Reshape/tree/development

mighdoll commented 8 months ago

Okay! I built the development branch and ran Reshape successfully on chrome canary with an nvidia 1080 card.

It's neato to see the generated hlsl and dxil for the shaders, and one of my old wgsl experiments generated three warnings (uninitialized resource read - twice, and texture read out of bounds) so I can see Reshape will quickly be useful.

A few things I noticed:

I can launch the browser with the suggested flags from within Reshape, then navigate to a page with a demo and it finds the shaders as it loads. Very nice!
Switching pages or reloading after Reshape has found some shaders doesn't seem to work, and usually reports a lost connection. It seemed to work on some attempts, so the failure may be intermittent. For example, I tried switching between the examples on the webgpu-samples page.

Awesome work @miguel-petersen!

Let me know if you want me to collect any logging as I experiment.

miguel-petersen commented 8 months ago

Hey! Glad to hear things ran on your end. 🙂

On the switching of pages / reloading, chances are that the underlying device is destroyed at that point. It's up to chrome when devices are recreated, I guess it sometimes shares it, and sometimes not? Currently the lifetime of all the internal data, and an internal server, is tied to the underlying device. There's an interesting question here, if it should persist beyond that, maybe if Reshape (app side) is connected.

With "Attach All Devices" the new device should appear automatically in the workspace tree, already hooked.

If you come across any false positives, or general issues, feel free to drop them here! Happy to fix them.

kainino0x commented 7 months ago

> uninitialized resource read - twice

I don't think this should happen - Dawn is supposed to make sure all resources are initialized before they can be read, for security reasons. If those look like true-positives could you please file a Dawn bug (https://crbug.com/dawn) about them?

miguel-petersen commented 7 months ago

Would it be possible to know how Dawn initializes resources @kainino0x? It's likely I'm just failing to hook one of the paths.

kainino0x commented 7 months ago

@austinEng would know better

austinEng commented 7 months ago

There are a few paths:

render pass with LoadOp::Clear and StoreOp::Store
Using the builtin "clear" command e.g. ID3D12GraphicsCommandList::ClearRenderTargetView / ID3D12GraphicsCommandList::ClearDepthStencilView / vkCmdClearDepthStencilImage / vkCmdFillBuffer / MTLBlitCommandEncoder::fillBuffer
buffer-to-buffer / buffer-to-texture copy from a buffer filled with 0s

miguel-petersen commented 7 months ago

Thanks Austin.

All of those paths should be hooked, so I wonder what's happening. Given that Dawn initializes all resources by default, I'll see if I can reproduce it in a public sample.

miguel-petersen commented 7 months ago

Got some time to see what was happening.

After a little investigation it seems that most samples go through a custom render pass which manually copies the texels, optionally with some color transformation.

What took me some time to understand is why Reshape didn't catch the initialization event of the source resources, until I saw that you are using OpenSharedHandle for some objects @austinEng. Tracking initialization events across, potentially, processes, is beyond the scope of Reshape. With that, I opted to mark all resources created from external handles as initialized from creation.
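The decision above can be pictured with a toy whole-resource tracker. This is a simplified model, not Reshape's code: `InitTracker` and its methods are hypothetical names, and real tracking happens on the GPU timeline rather than in a dictionary.

```python
from typing import Dict, List


class InitTracker:
    """Tracks one initialized flag per resource and records suspicious reads."""

    def __init__(self) -> None:
        self._initialized: Dict[str, bool] = {}
        self.warnings: List[str] = []

    def create(self, name: str, from_shared_handle: bool = False) -> None:
        # Writes made by another process or API are invisible to this tracker,
        # so resources opened from an external handle are assumed initialized.
        self._initialized[name] = from_shared_handle

    def write(self, name: str) -> None:
        """Record a clear, a copy into the resource, or a render target store."""
        self._initialized[name] = True

    def read(self, name: str) -> None:
        if not self._initialized[name]:
            self.warnings.append(f"uninitialized resource read: {name}")


if __name__ == "__main__":
    tracker = InitTracker()
    tracker.create("local_texture")
    tracker.create("shared_texture", from_shared_handle=True)
    tracker.read("shared_texture")  # no warning: assumed initialized elsewhere
    tracker.read("local_texture")   # warning: never written in this process
    print(tracker.warnings)
```

The trade-off is that a genuinely uninitialized shared resource goes unreported, in exchange for removing false positives on resources whose initialization happened out of view.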

It'll get submitted to a branch that's not ready quite yet, but likely later in May. The initialization feature is getting reworked to track initialization states on a per-texel basis, instead of the whole resource. Same for concurrency.

GPUOpen-Tools / GPU-Reshape

Browser WebGPU? #46