MortimerGoro closed this issue 3 months ago.
@fernandojsg @takahirox @jdashg are there any strategic plans for our WebGL support that might help address the issue of fragment shaders maxing out the GPU bus?
The main issue here is probably "Layers' framerate is capped at the framerate of webgl frames": https://bugzilla.mozilla.org/show_bug.cgi?id=1565360
There's not a lot we can really do to differentiate legit vs. malicious high-demand content. We already throttle timers/rAF for background tabs.
Trying to implement quotas for various GPU resources just isn't really viable.
Decoupling the UI framerate from the WebGL framerate would go a long way toward mitigating this problem, though. Unfortunately, I think it's tricky to implement.
In FxR the UI is decoupled from the Gecko compositor. So even if the compositor is running at 30Hz, the FxR UI will still run at 72Hz. FxR has its own render loop on a separate thread from the Gecko main thread. We also have a UI thread (which is standard on Android). From testing, this issue really seems to be some limit being exceeded that puts the GPU in a bad state.
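A minimal simulation of that decoupling, assuming an idealized UI loop at 72 Hz and a steady compositor at 30 Hz (the rates from the comment above). On each UI tick the UI simply composites the newest content frame that is ready, so the UI keeps its own cadence while content frames repeat. `latched_content_frames` is a hypothetical helper for illustration, not FxR code.

```python
import math


def latched_content_frames(ui_hz: float, content_hz: float, ui_frames: int) -> list[int]:
    """Hypothetical sketch: for each UI tick, the index of the newest content
    frame available to composite, assuming both loops start at t = 0 and
    run at perfectly steady rates."""
    frames = []
    for i in range(ui_frames):
        t = i / ui_hz                                  # time of this UI tick (s)
        frames.append(math.floor(i * content_hz / ui_hz))  # newest finished content frame at t
    return frames
```

For 12 UI ticks (one sixth of a second) this yields content frames `[0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4]`: the UI never waits on content, it just reuses the last finished frame.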
I don't know if there's much that can be done. The only thing we can do is throttle, but we can't tell whether the content is good or bad.
Would it be possible to decrease the canvas resolution? Perhaps even capping it?
That may happen to help this case but not the general case.
We can't really throttle the output size of the app without cooperation from the app, and if we had that cooperation, this would not be a problem here. :)
I expect apps to respond poorly if we change their backbuffer size without an app-initiated resize.
Even if it keeps the same aspect ratio? I realize there is no good solution. I'm just trying to find anything that might help mitigate the problem.
Some content will handle it, but I don't expect most will. It would have to be written with the expectation that the resulting size might not match the requested size, and I think most content is not written with that in mind. Maybe it's OK to break them; I don't know.
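A sketch of what capping while keeping the aspect ratio could look like, assuming a hypothetical per-canvas pixel budget (`max_pixels`); this is not an existing Gecko or WebGL setting, and `cap_backbuffer` is an illustrative name. Both sides are scaled by the same factor, so the pixel count drops to roughly the budget.

```python
import math


def cap_backbuffer(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Hypothetical sketch: shrink (width, height) uniformly so that
    width * height stays within max_pixels, preserving the aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    if width * height <= max_pixels:
        return width, height  # already within budget: leave untouched
    scale = math.sqrt(max_pixels / (width * height))  # same factor on both axes
    # Flooring keeps the result at or under the budget; max(1, ...) avoids a
    # zero-sized side, which can push a tiny budget slightly over.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

Content would still have to cope with a drawing buffer smaller than it asked for (e.g. `gl.drawingBufferWidth` differing from the canvas `width`), which is exactly the concern raised above.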
Generally speaking, running heavy GPU workloads will do harsh things to the whole system, just like running heavy CPU workloads. This problem is as old as computers; this is just a new way to encounter it. I think this is more or less a Harsh Reality.
In particular, I think framing this as "can we reduce the number of fragments" only looks for a band-aid for this particular piece of content, and it won't work for other, differently heavy workloads. As such, I don't see resizing as a good general approach.
So, one last thought. Since jank in VR has a huge impact on user comfort, maybe we can detect when the VR render loop starts janking and pause the session? We could also show a pop-up saying the content is not optimized for the hardware, or something similar.
You might just choose to lose the WebGL context in that case.
We can pause the compositor for a given session from the render loop, so that might be the easiest thing to try first.
Once we go multi-window, we will probably need a GeckoView API to know whether WebGL is running in a session. We might be able to add an API to kill the GL context at the same time.
Pausing should be purely jank-based, independent of WebGL; WebGL is just the likely cause. (Eventually it might be WebGPU too, and historically it could have been Canvas2D-on-SkiaGL, though not anymore.)
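A minimal sketch of jank-based pausing that is independent of WebGL, assuming only per-frame timestamps from the render loop. The 72-frame window, 1.5x slack, and 25% threshold are illustrative assumptions, and `JankDetector` / `on_frame` are hypothetical names, not FxR or GeckoView APIs. It reports sustained jank when too many recent frame intervals exceed the frame budget; the caller would then pause the compositor for that session.

```python
from collections import deque


class JankDetector:
    """Hypothetical sketch: report sustained jank from render-loop frame timestamps."""

    def __init__(self, target_hz: float = 72.0, window: int = 72,
                 slack: float = 1.5, max_janky_fraction: float = 0.25):
        self.budget_ms = 1000.0 / target_hz          # ideal frame interval
        self.slack = slack                            # tolerance before a frame counts as janky
        self.max_janky_fraction = max_janky_fraction  # share of janky frames that triggers a pause
        self.intervals = deque(maxlen=window)         # most recent frame intervals (ms)
        self.last_ts_ms = None

    def on_frame(self, ts_ms: float) -> bool:
        """Record one frame timestamp; return True if the session should be paused."""
        if self.last_ts_ms is not None:
            self.intervals.append(ts_ms - self.last_ts_ms)
        self.last_ts_ms = ts_ms
        if len(self.intervals) < self.intervals.maxlen:
            return False  # not enough history yet to judge
        janky = sum(1 for dt in self.intervals if dt > self.budget_ms * self.slack)
        return janky / len(self.intervals) > self.max_janky_fraction
```

Feeding it timestamps from a steady 72 Hz loop never triggers a pause, while a loop stuck at 30 Hz triggers one as soon as the 72-interval window fills. Requiring a sustained fraction rather than a single long frame avoids pausing on one-off hitches.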
@kearwood Do we have a convenient way to detect janking in the VR render loop and to signal that e.g. to our Android layer?
Sure. We want to know when a session is doing GPU-related operations; WebRender might add an unanticipated wrinkle, so we should probably test that. But for the most part, our render thread has CPU priority, so while it is still possible, FxR is less likely to jank from session CPU usage.
Regarding "I expect apps to respond poorly if we change their backbuffer size without an app-initiated resize": an alternative solution would be to reduce the Gecko window size. I think apps that tie canvas size to window size usually handle the resize event. We could even reduce the window and scale it up in the FxR quad, so the window would appear the same size to the user, just at lower quality.
This may break CSS element sizes, though (e.g. the Enter VR button).
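The rough payoff of shrinking the window can be worked out, assuming the content sizes its canvas to the window and is fragment-bound (as discussed above, this does nothing for differently heavy workloads). With a per-axis scale factor s applied to a W x H window, the FxR quad keeps its on-screen size while the fragment count shrinks quadratically:

```latex
N_{\text{frag}} \propto W \cdot H
\qquad\Longrightarrow\qquad
N'_{\text{frag}} \propto (sW)(sH) = s^{2}\, N_{\text{frag}}, \quad 0 < s \le 1
```

For example, s = 0.75 gives 0.75^2 = 0.5625, i.e. roughly 56% of the original per-frame fragment work.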
@philip-lamb Regarding "Do we have a convenient way to detect janking in the VR render loop and to signal that e.g. to our Android layer?": this problem happens before entering WebVR, just from showing a WebGL scene in a window, so we also need to detect janking outside of the VR render loop.
For now, we have exposed a user-facing performance monitor, which detects janking and offers a plausible explanation along with an option for the user: #1401
STR:
Oculus Browser has the same problem on both Oculus Go and Quest. It seems we are hitting a maximum GPU bus threshold, and that affects Timewarp.