webgl fallback fails in some browsers.

rukai commented 2 months ago

Description My understanding is that when the webgpu and webgl cargo features are enabled and Backends::all() is set, the adapter should attempt to be created with webgpu and then failing that fallback to webgl. However that doesnt seem to be happening in chrome on my machine (linux) and chrome on a users machine (windows 10) It does however seem to work fine on firefox on my machine.

This issue is not present in the wgpu examples, but I cant figure out what is different between the example and my (isolated) code

Repro steps

git clone -b fail_to_create_adapter_on_chrome https://github.com/rukai/wgpu_bug_report
cd wgpu_bug_report
cargo run-wasm --package wgpu_foo

Expected vs observed behavior I expect the project to work on chrome on my machine, it does not.

Extra materials Fails on chrome: Succeeds on firefox:

BGR360 commented 2 months ago

I think this might be expected.

You say that:

My understanding is that when the webgpu and webgl cargo features are enabled and Backends::all() is set, the adapter should attempt to be created with webgpu and then failing that fallback to webgl.

But actually, the way Instance works is a bit more subtle. From the docs (emphasis my own):

instance_desc - Has fields for which backends wgpu will choose during instantiation, and which DX12 shader compiler wgpu will use.

Backends::BROWSER_WEBGPU takes a special role: If it is set and WebGPU support is detected, this instance will only be able to create WebGPU adapters. If you instead want to force use of WebGL, either disable the webgpu compile-time feature or don't add the Backends::BROWSER_WEBGPU flag to the the instance_desc’s backends field. If it is set and WebGPU support is not detected, the instance will use wgpu-core to create adapters. Meaning that if the webgl feature is enabled, it is able to create a WebGL adapter.

So Instance::new will only fall back to webgl if webgpu is not detected in the browser at all. In your case, it seems Chrome advertises support for webgpu, so your wgpu instance is only going to attempt.

One way around this, inspired by eframe, is to re-create the instance and only request Backends::GL in the instance_desc.

https://github.com/emilk/egui/blob/454abf705b87aba70cef582d6ce80f74aa398906/crates/eframe/src/web/web_painter_wgpu.rs#L117-L166

rukai commented 2 months ago

Could the webgpu detection in wgpu take on this extra case? Otherwise maybe include the snippet needed to work around the bug in the docs?

Wumpf commented 2 months ago

I wrote that awful long comment on eframe & the accompanying workaround, so I'm also very interested in finding a better solution, just haven't figured out yet how to do so elegantly :( The issue is that on native you really want to first create the surface and then the adapter and on web it's the other way round because of the way chrome behaves here. It might be worth profiling how long adapter creation takes and if its fast enough put that into the webgpu detection code. And if it isn't to make sure that once it's there it can be reused

BGR360 commented 2 months ago

It might be worth profiling how long adapter creation takes

Is adapter/surface/device creation really that performance-sensitive of an operation? Genuine question, it surprises me but I'd believe you if you told me it was true

Wumpf commented 2 months ago

I fear could put a significant cost on startup time. Whether that's performance sensitive depends on your application but it would be pretty awful if every wgpu application on the web were to take 100ms (made up, I believe it's bad but likely not that bad) extra to start without any way to opt out of it.

Something I forgot which makes this even harder: adapter creation requires handing back control to the browser (it's async!) which then would make instance creation async and I'd really like to avoid that.

rukai commented 2 months ago

I guess the way I see it is, if you want your application to be compatible with all browsers while using webgpu (both webgl and webgpu), you HAVE to take that 100ms (madeup) cost anyway, its just that currently you have to do it manually with some hack rather than having wgpu handle that for you.

If you dont want to take that cost and can avoid supporting all browsers, then just disable the webgl feature which would also disable that check.

So I think that the wgpu project needs to take one of these actions:

add the hack internally
document that enabling the webgpu feature means that your application will not run on all browsers. And document how to add the hack yourself
document that enabling the webgpu feature means that your application will not run on all browsers. Without any documentation on how to work around the issue.

Maybe one day this will issue will resolve itself if chrome can either support webgpu or not fail to support it in a broken way. But I think it is inappropriate to take no action in the meantime.

Wumpf commented 2 months ago

agreed, we have to bite the bullet here regardless. The only question is how sophisticated we need to be about creating the adapter only once on the happy path. Looking at that as a pure optimization what we really need in the instance creation is something very similar to what Babylon.js does here https://github.com/BabylonJS/Babylon.js/blob/128baa04d20bca9428baa6bec3b773cb4e239d75/packages/dev/core/src/Engines/webgpuEngine.ts#L469

A good first step would be to provide this as a free function on all webgpu enabled wgpu builds since it's a useful utility regardless. This opens a much easier way to deal with the problem in general.

Now, the tricky bit remains of how to handle this logic on wgpu::Instance::new: we'd have to make it asynchronous, so I'd really like to avoid that since it needlessly affects all users of wgpu whether they're interested in this or not. One way to go about this is to take a step back and make the different "kinds" of wgpu instances more explicit by phasing out today's new method in favor of a non-async factory method that always uses ContextWgpuCore and an async one that produces an instance with either ContextWgpuCore or ContextWebGpu 🤔

Feels very related to some of the wgpu context dispatching refactors @cwfitzgerald had in mind (is there a ticket about this?)

BGR360 commented 2 months ago

we'd have to make it asynchronous

Correct me if I'm missing something, but I think you could avoid this. It might be a little tricky to wire it all up, but I think you can make it so that Instance doesn't decide on WebGL or WebGpu until the first call to request_adapter. Here's my thinking:

The only other thing you can do with an Instance before requesting an adapter is call create_surface (and generate_report, but that already has some sort of caveat around WebGpu; I'm sure it wouldn't be a huge breaking change to adjust or add to that caveat).
The only you can do with a Surface before requesting an adapter is get_current_texture. I don't know what the expected behavior of that is when you call it before requesting an adapter, you'll have to tell me.
Instance and Surface already appear to share some state Arc<Context>.

So, you could make Context start out in some "undecided" state and change it on the first call to request_adapter. Any Surface given out before requesting the adapter would be given an Arc to the yet-undecided context. As far as I can tell, the only issue with that appears to be synchronizing the change. Perhaps that is not possible to do on the web target? Does wgpu guarantee Send and Sync on Instance and Surface?

Wumpf commented 1 month ago

I started looking a bit deeper into it but the surface creation story is as I feared really tricky. WebGL needs the surface to be created before adapter creation (it has to be passed as compatible_surface). Which leads to two problems:

webgl surface creation is destructive: Once getContext is called on a canvas, it can't be used with any other context. I.e. if you create an WebGL surface you can't use it anymore with WebGPU
- this is the easier one: we can delay getContext call inside HAL for a bit, so the canvas gets poisoned as late as possible. Drawback is ofc that we can't be sure that Surface creation actually suceeded
create_surface needs to return a surface that might be used either with wgpu-core or webgpu
- still looking into this one, this is a lot more annoying

otherwise I think the "undecided instance" approach can work, already started prototyping it. Not sure yet if I want to extend that to the surface or handle it differently 🤔. Pushed the few things I have as of writing to https://github.com/Wumpf/wgpu/tree/wip-multi-context-instance-and-surface get_current_texture doesn't worry me much because that doesn't need to work before one calls configure and that in turn means that there's a device already.

Does wgpu guarantee Send and Sync on Instance and Surface?

pretty much everything is send/sync on native and not at all on wasm unless fragile-send-sync-non-atomic-wasm is enabled which opts it in unless atomics are enabled - the background is that webgpu resources (and afaik webgl) can't be shared with webworkers unless they use a special javascript method so send objects [...]

Wumpf commented 1 month ago

Urgh, this is just too messy for something that can be solved realtively easy in user land if aforementioned utilities are provided. I'll instead look into that and document everything out + providing a sample rather than spending more energy on "undecided" surfaces & adapters.

If someone else wants to pick up that thread, I put the barely started sketch here now https://github.com/Wumpf/wgpu/tree/wip-multi-context-instance-and-surface

Recap of how stuff works today, highlighting why things are hard:

WebGL is implemented by wgpu-core. WebGPU is implemented directly by wgpu. Today, wgpu::Instance redirects to either (!) wgpu-core or WebGPU. These are called "context"s
surface creation is adapter/context independent today, happening on an wgpu::Instance
- the parameters passed in to surface creation may already strongly hint at a backend though, e.g. a core animation layer can only be used with Metal
upon adapter creation, wgpu today checks the compatible_surface parameter if a given surface is compatible with the adapter
specifically webgl requires the surface on compatible_surface in order to get the gl context from canvas.getContext (this is not the case for webgpu!)
both on WebGL & WebGPU calls to canvas.getContext are destructive: once one kind of context is acquired all subsequent calls will fail
both on WebGL & WebGPU canvas creation may return with a failure due to failed canvas.getContext, which is important because a user provided canvas may already have a gl/webgpu context (and delaying this error is challenging)

rukai commented 1 month ago

Thankyou so much for your work on this, I would like to find time to test it but have been very busy lately.

gfx-rs / wgpu

webgl fallback fails in some browsers. #6166