floooh / sokol

minimal cross-platform standalone C headers
https://floooh.github.io/sokol-html5
zlib License

The WebGL renderer does not produce the same output as the GLES renderer after updating to samplers. #996

Closed: caiiiycuk closed this issue 9 months ago

caiiiycuk commented 9 months ago

Hi, I am a member of a team that is porting the D3D renderer of Perimeter to Sokol. We are doing this in our free time, so it has taken a while; we started before Sokol had samplers. A couple of days ago we updated Sokol to the latest version, and everything went smoothly except for WebGL.

The WebGL renderer does not produce the same output as the GLES renderer on Linux. This is a bit strange: we didn't change anything except splitting the shaders to use separate samplers. Two objects are rendered incorrectly in WebGL: an energy shield (the blue one) and the chaos (the fog outside of the terrain).

GLES3/WebGL without samplers: (screenshot)

WebGL with samplers: (screenshot)

Same result in Firefox and Chrome, with no WebGL warnings.

The chaos is just black, and the shield, which should be blue, is visible but far too transparent. The shader used to render these objects is also used for many other objects, and those render fine. Unfortunately, we have no idea what could be going wrong; maybe it is some internal Sokol state. It could certainly be an issue in our renderer code, but then why does it work fine on Linux with GLES, with WebGL being the only problem?

Anyway, maybe you have some ideas for tests.

I did a bisect, and the first bad commit that I was able to build is:

Here is the shader code (very simple):

Without samplers: shader.txt

With samplers: shader2.txt

Note: in shader2.txt we use one sampler for two textures. I'm not sure if this is valid, but we also tried using two samplers for the two textures, and it made no difference.

floooh commented 9 months ago

Strange indeed. I'll set aside some time over the next few days to look at the problem.

Is there some way I can test the WebGL build myself (e.g. is it hosted somewhere)?

Also, have you run the build in debug mode, or is that too slow on the web? (Just to make sure that there are no errors from the sokol-gfx validation layer.)

caiiiycuk commented 9 months ago

Thank you for the quick reply! I sent the link on Twitter.

floooh commented 9 months ago

Btw, not sure if it's an option, but hooking in the sokol_gfx_imgui.h header would be really helpful in the long run to analyze such problems :) (maybe just in a special debug build because Dear ImGui adds quite a bit of code to WASM builds)

floooh commented 9 months ago

Btw, in the latest demo link you provided (with the Dear ImGui integration), I'm seeing the fog (in Chrome on macOS):

(screenshot)

Did you update sokol_gfx.h since the time those textures showed up as black? As I wrote on Twitter, until the end of Oct-2023 there was a bug in the GL backend which caused textures without mipmaps to render as black/transparent if they used a sampler with .mipmap_filter != SG_FILTER_NONE. WebGL may be especially strict about this, while native GL implementations may be more forgiving.

But this has been fixed in the meantime, and textures without mipmaps should now render correctly with any mipmap filter (and in the old code before the image/sampler split I specifically checked for this case and replaced the filter with NONE, so it wasn't an issue there either).
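The GL rule behind this can be sketched in C. In GLES3/WebGL2, a texture whose min filter requires mipmaps but which only has a single mip level is "incomplete" (unless GL_TEXTURE_MAX_LEVEL is lowered to match), and sampling an incomplete texture returns (0, 0, 0, 1), i.e. opaque black. The function below shows one way a GL backend could fold a separate min/mipmap filter pair into a single GL_TEXTURE_MIN_FILTER value while dropping the mipmap part for single-level textures. filter_t and gl_min_filter() are hypothetical stand-ins, not sokol's actual code.

```c
#include <assert.h>

/* GL enum values as defined in the GLES3/WebGL2 headers. */
#define GL_NEAREST                0x2600
#define GL_LINEAR                 0x2601
#define GL_NEAREST_MIPMAP_NEAREST 0x2700
#define GL_LINEAR_MIPMAP_NEAREST  0x2701
#define GL_NEAREST_MIPMAP_LINEAR  0x2702
#define GL_LINEAR_MIPMAP_LINEAR   0x2703

/* Hypothetical stand-in for a split sampler filter setting (not sokol's sg_filter). */
typedef enum { FILTER_NONE, FILTER_NEAREST, FILTER_LINEAR } filter_t;

/* Fold a min filter and a mipmap filter into one GL_TEXTURE_MIN_FILTER value.
   For single-level textures the mipmap part is dropped, because a mipmapping
   min filter would make such a texture incomplete in GLES3/WebGL2. */
unsigned gl_min_filter(filter_t min_filter, filter_t mipmap_filter, int num_mipmaps) {
    assert(num_mipmaps >= 1);
    int linear = (min_filter == FILTER_LINEAR);
    if (num_mipmaps == 1 || mipmap_filter == FILTER_NONE) {
        return linear ? GL_LINEAR : GL_NEAREST;
    }
    if (mipmap_filter == FILTER_LINEAR) {
        /* blend linearly between the two nearest mip levels */
        return linear ? GL_LINEAR_MIPMAP_LINEAR : GL_NEAREST_MIPMAP_LINEAR;
    }
    /* pick the single nearest mip level */
    return linear ? GL_LINEAR_MIPMAP_NEAREST : GL_NEAREST_MIPMAP_NEAREST;
}
```

With this guard, a LINEAR/LINEAR sampler on a texture without mipmaps resolves to plain GL_LINEAR instead of an incomplete-texture configuration.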

How can I best check the energy shield, btw? At the start of the tutorial I'm not seeing it.

floooh commented 9 months ago

PS: once we've figured out that texturing problem, we should do something about the many create/destroy buffer calls per frame, and about the fact that each sg_draw() is accompanied by a viewport, scissor-rect, pipeline and bindings update (if possible) :)
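A minimal sketch of such redundant-state filtering, assuming a hypothetical draw_state_t that records the pipeline, bindings, viewport and scissor rect for each draw. The cache returns a bitmask of the state that actually changed, so the renderer would only call sg_apply_pipeline(), sg_apply_bindings(), sg_apply_viewport() and sg_apply_scissor_rect() where needed. Two assumptions are baked in: the cache is reset at the start of every pass (beginning a pass is assumed to reset viewport and scissor), and bindings are conservatively re-applied whenever the pipeline changes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-draw state; a real sokol renderer would store sg_pipeline
   and sg_bindings here instead of plain integer ids. */
typedef struct { int x, y, w, h; } rect_t;
typedef struct {
    unsigned pipeline;   /* hypothetical pipeline id */
    unsigned bindings;   /* hypothetical id/hash of the resource bindings */
    rect_t viewport;
    rect_t scissor;
} draw_state_t;

enum {
    APPLY_PIPELINE = 1 << 0,
    APPLY_BINDINGS = 1 << 1,
    APPLY_VIEWPORT = 1 << 2,
    APPLY_SCISSOR  = 1 << 3,
    APPLY_ALL      = 0xF,
};

typedef struct {
    bool valid;          /* false until the first draw of a pass */
    draw_state_t last;   /* state applied for the previous draw */
} state_cache_t;

/* Call at the start of every pass. */
void state_cache_reset(state_cache_t* cache) { cache->valid = false; }

static bool rect_eq(rect_t a, rect_t b) {
    return a.x == b.x && a.y == b.y && a.w == b.w && a.h == b.h;
}

/* Returns a bitmask of the APPLY_* state that differs from the previous
   draw, then records 'next' as the current state. */
unsigned state_cache_diff(state_cache_t* cache, const draw_state_t* next) {
    assert(cache && next);
    unsigned mask = 0;
    if (!cache->valid) {
        mask = APPLY_ALL;
    } else {
        if (cache->last.pipeline != next->pipeline) {
            /* conservatively re-apply bindings after a pipeline change */
            mask |= APPLY_PIPELINE | APPLY_BINDINGS;
        }
        if (cache->last.bindings != next->bindings) mask |= APPLY_BINDINGS;
        if (!rect_eq(cache->last.viewport, next->viewport)) mask |= APPLY_VIEWPORT;
        if (!rect_eq(cache->last.scissor, next->scissor)) mask |= APPLY_SCISSOR;
    }
    cache->last = *next;
    cache->valid = true;
    return mask;
}
```

In the draw loop, the returned mask would decide which of the four apply calls to issue before sg_draw(); consecutive draws with identical state then cost only the draw call itself.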

floooh commented 9 months ago

Hmm, I also see the fog in the non-ImGui build with samplers, though (where it should be black).

floooh commented 9 months ago

PPS: if it turns out that it "works on my machine" (M1 Mac, Chrome) but not on yours, can you give more details about your setup (especially OS and graphics card)?

Also, just on a hunch:

Can you try to set the sampler's .mipmap_filter to SG_FILTER_NONE and check if those problematic textures become visible? As I said, this shouldn't actually be the issue, but I just want to make sure.

caiiiycuk commented 9 months ago

Oh, that's true; it works! I hadn't tested it on different OSes, so I thought it was simply broken. But now, after more testing, we can say that it works everywhere except on Windows. I also tried setting .mipmap_filter to SG_FILTER_NONE, but it didn't help.

(screenshot)

Does not work (both chaos and shield):

Works fine:

Regarding shield:

Just start a Battle game instead of a campaign, then place the energy core and activate the shield. (screenshot)

caiiiycuk commented 9 months ago

It also works fine natively:

floooh commented 9 months ago

Ok, I have a Windows PC with RTX2070 and a Windows laptop with Intel GPU, should be close enough to reproduce.

It almost looks like a D3D11 limitation I'm not aware of yet (but even then it would be strange that the behaviour changed with the image/sampler split, and also that WebGL allows differing behaviour).

caiiiycuk commented 9 months ago

This is the commit that introduced the bug in the game: https://github.com/KD-lab-Open-Source/Perimeter/commit/e06f09ca458322718dcf3f0554f5610aec5afa48

IonAgorria commented 9 months ago

I'm currently trying to compile it with sokol's D3D11 backend enabled; I will report back once it works.

floooh commented 9 months ago

One obvious difference I see is here when creating a sampler object:

    sampler_desc.min_filter = SG_FILTER_LINEAR;
    sampler_desc.mag_filter = SG_FILTER_LINEAR;
    sampler_desc.mipmap_filter = SG_FILTER_LINEAR;

...this is different from this old code:

    imgdesc->min_filter = imgdesc->mag_filter = SG_FILTER_NEAREST;

...the equivalent would be:

    sampler_desc.min_filter = SG_FILTER_NEAREST;
    sampler_desc.mag_filter = SG_FILTER_NEAREST;
    sampler_desc.mipmap_filter = SG_FILTER_NONE;

Nearest vs. linear shouldn't result in such a rendering artefact though, and @caiiiycuk already confirmed that changing the mipmap filter doesn't make a difference. Also, D3D11 doesn't have an actual 'none' mipmap filter mode anyway, only 'point' (aka nearest) or 'linear'.
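The D3D11 point can be made concrete. On Windows, Chrome and Firefox by default run WebGL on top of D3D11 through ANGLE, so it may help to see how the three filter settings collapse there. The sketch below mirrors the bit layout of the basic D3D11_FILTER values (min linear = 0x10, mag linear = 0x04, mip linear = 0x01, so D3D11_FILTER_MIN_MAG_MIP_LINEAR is 0x15). 'None' and 'nearest' both map to point sampling between mip levels; actually disabling mipmapping would need a separate mechanism, for example clamping the sampler's MaxLOD to 0. filter_t, the FILTER_BIT_* names and d3d11_filter() are hypothetical helpers, not D3D11 or sokol API.

```c
#include <assert.h>

/* Hypothetical stand-in for a split sampler filter setting (not sokol's sg_filter). */
typedef enum { FILTER_NONE, FILTER_NEAREST, FILTER_LINEAR } filter_t;

/* Hypothetical names for the bits used by the basic (non-comparison,
   non-anisotropic) D3D11_FILTER values. */
enum {
    FILTER_BIT_MIN_LINEAR = 0x10,
    FILTER_BIT_MAG_LINEAR = 0x04,
    FILTER_BIT_MIP_LINEAR = 0x01,
};

/* Map min/mag/mipmap filters to a basic D3D11_FILTER value. D3D11 has no
   'none' mipmap mode, so FILTER_NONE and FILTER_NEAREST both end up as
   point sampling between mip levels. */
unsigned d3d11_filter(filter_t min_f, filter_t mag_f, filter_t mip_f) {
    assert(min_f != FILTER_NONE && mag_f != FILTER_NONE);
    unsigned f = 0;
    if (min_f == FILTER_LINEAR) f |= FILTER_BIT_MIN_LINEAR;
    if (mag_f == FILTER_LINEAR) f |= FILTER_BIT_MAG_LINEAR;
    if (mip_f == FILTER_LINEAR) f |= FILTER_BIT_MIP_LINEAR;
    return f;
}
```

Under this mapping, the game's new sampler (LINEAR/LINEAR/LINEAR) corresponds to D3D11_FILTER_MIN_MAG_MIP_LINEAR (0x15), and the old-code equivalent (NEAREST/NEAREST/NONE) to D3D11_FILTER_MIN_MAG_MIP_POINT (0x00).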

PS: ah ok, there's also this other place that has been deleted (so some textures actually used a mipmap-filter):

        desc->min_filter = 1 < desc->num_mipmaps ? SG_FILTER_LINEAR_MIPMAP_LINEAR : SG_FILTER_LINEAR;
        desc->mag_filter = SG_FILTER_LINEAR;
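Translating the old combined filter values into the split sampler fields is mechanical. Below is a minimal sketch using hypothetical enums that mirror the old SG_FILTER_*_MIPMAP_* names and the new min/mag/mipmap triple (they are not the real sokol types). Under this mapping, the deleted line above corresponds to min_filter = LINEAR and mag_filter = LINEAR, with mipmap_filter = LINEAR when num_mipmaps > 1 and NONE otherwise.

```c
#include <assert.h>

/* Hypothetical mirrors of the pre-split filter values (old SG_FILTER_* names). */
typedef enum {
    OLD_NEAREST,
    OLD_LINEAR,
    OLD_NEAREST_MIPMAP_NEAREST,
    OLD_NEAREST_MIPMAP_LINEAR,
    OLD_LINEAR_MIPMAP_NEAREST,
    OLD_LINEAR_MIPMAP_LINEAR,
} old_filter_t;

/* Hypothetical mirror of the split sampler filter setting. */
typedef enum { FILTER_NONE, FILTER_NEAREST, FILTER_LINEAR } filter_t;

typedef struct {
    filter_t min_filter;
    filter_t mag_filter;
    filter_t mipmap_filter;
} sampler_filters_t;

/* Split an old combined min filter (GL-style naming:
   <within a level>_MIPMAP_<between levels>) plus an old mag filter
   into the three separate sampler fields. */
sampler_filters_t split_filter(old_filter_t old_min, old_filter_t old_mag) {
    /* the mag filter never involves mipmaps; only NEAREST/LINEAR are valid */
    assert(old_mag == OLD_NEAREST || old_mag == OLD_LINEAR);
    sampler_filters_t s = { FILTER_NEAREST, FILTER_NEAREST, FILTER_NONE };
    s.mag_filter = (old_mag == OLD_LINEAR) ? FILTER_LINEAR : FILTER_NEAREST;
    switch (old_min) {
        case OLD_NEAREST:                s.min_filter = FILTER_NEAREST; s.mipmap_filter = FILTER_NONE;    break;
        case OLD_LINEAR:                 s.min_filter = FILTER_LINEAR;  s.mipmap_filter = FILTER_NONE;    break;
        case OLD_NEAREST_MIPMAP_NEAREST: s.min_filter = FILTER_NEAREST; s.mipmap_filter = FILTER_NEAREST; break;
        case OLD_NEAREST_MIPMAP_LINEAR:  s.min_filter = FILTER_NEAREST; s.mipmap_filter = FILTER_LINEAR;  break;
        case OLD_LINEAR_MIPMAP_NEAREST:  s.min_filter = FILTER_LINEAR;  s.mipmap_filter = FILTER_NEAREST; break;
        case OLD_LINEAR_MIPMAP_LINEAR:   s.min_filter = FILTER_LINEAR;  s.mipmap_filter = FILTER_LINEAR;  break;
    }
    return s;
}
```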

PPS: this old TODO in deleted code was fixed by this commit from Oct-2023 (https://github.com/floooh/sokol/commit/2bbba599635ba8bb6e53609f55a80ed1219c3e6b), specifically this line: https://github.com/floooh/sokol/commit/2bbba599635ba8bb6e53609f55a80ed1219c3e6b#diff-2e1da1beda27f21ea55e93ee52191efb90c1a1794a65aa34dce1db773a65332bR7697

#ifdef SOKOL_GL
        //TODO check why mipmaps isn't working
        desc->num_mipmaps = 1;
#else
        ...
#endif

PPS: glancing over all the other changes, I can't see anything that sticks out as wrong, though.

floooh commented 9 months ago

...not much new from my side today except that I see the error on my Windows laptop too.

So we definitely have a case where WebGL behaves differently between platforms, which is very unusual.

Do you know which two textures are affected, and what could be special about them? Things like "non-power-of-2", although that shouldn't matter anymore on WebGL2. But there must be a reason why some textures have this problem while others work fine.

PS: FWIW, changing the texcube-sapp sample to use a 'weird texture' (e.g. non-POT 5x5 pixels size, no mipmaps, and with the same sampler settings as in the game) works just fine on native D3D11 (and also in the browser).
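To narrow down what is "special" about the affected textures, a small debug helper could log whether each one is non-power-of-2 and whether it has fewer mip levels than a full chain (which only matters if its sampler uses a mipmap filter). Here is a sketch; classify_texture() and texture_traits_t are hypothetical names. A full chain for a w x h texture has floor(log2(max(w, h))) + 1 levels.

```c
#include <assert.h>
#include <stdbool.h>

/* True if v is a (nonzero) power of two. */
static bool is_pot(int v) { return v > 0 && (v & (v - 1)) == 0; }

/* Levels in a full mip chain: floor(log2(max(w, h))) + 1, via integer halving. */
static int full_mip_count(int w, int h) {
    assert(w > 0 && h > 0);
    int size = w > h ? w : h;
    int levels = 1;
    while (size > 1) {
        size >>= 1;
        levels++;
    }
    return levels;
}

/* Hypothetical debug summary of a texture's potentially problematic traits. */
typedef struct {
    bool npot;               /* width or height is not a power of two */
    bool partial_mip_chain;  /* fewer levels than a full chain */
} texture_traits_t;

texture_traits_t classify_texture(int w, int h, int num_mipmaps) {
    texture_traits_t t;
    t.npot = !is_pot(w) || !is_pot(h);
    t.partial_mip_chain = num_mipmaps < full_mip_count(w, h);
    return t;
}
```

For a 5x5 texture without mipmaps, like the one used in the texcube-sapp experiment, this reports npot = true and partial_mip_chain = true (a full chain would have 3 levels).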

floooh commented 9 months ago

Btw, the Spector.js WebGL debugger might come in handy. For instance, here I have selected the draw call which renders the black background. On the right side you can inspect the entire WebGL2 state at the time of the draw call. I haven't looked at the state in detail yet, though.

The extension is here: https://chromewebstore.google.com/detail/spectorjs/denbgaamihkadbghdceggmchnflmhpmk

Here's what it looks like with the problematic draw call selected:

(screenshot: perimeter_spectre)

floooh commented 9 months ago

Btw, when I look very closely at the screen of my Intel GPU laptop, I actually see some moving "cloud structures" in the black area, so it's not completely black, just extremely dark... hmm.

caiiiycuk commented 9 months ago

Yeah, and the shield is also sometimes visible; it's just far too transparent. The chaos (fog) rendering is very simple: it's just two cloud textures that are moved by a translation matrix. Maybe there is some browser bug with loading WebP images (I ran into one once in Safari on iPhone). I could probably try to extract this into a simple test case.

caiiiycuk commented 9 months ago

I mean the texture coordinates change each frame; the vertices stay the same.
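The fog technique described here (static vertices, texture coordinates scrolled each frame by a translation matrix) can be sketched as follows. The per-layer scroll speeds, the column-major 3x3 matrix layout and the wrapping of the offset into [0, 1) are assumptions for illustration, not the game's actual code. Wrapping keeps the offset small over long sessions without changing the result, as long as the sampler uses a repeat wrap mode.

```c
#include <assert.h>

/* Hypothetical scroll parameters for one cloud layer (UV units per second). */
typedef struct { float speed_u, speed_v; } scroll_layer_t;

/* 3x3 texture matrix, column-major: m[6], m[7] hold the translation. */
typedef struct { float m[9]; } mat3_t;

/* x - floor(x) without libm; valid while x fits in a long long. */
static float wrap01(float x) {
    float f = (float)(long long)x;   /* truncates toward zero */
    if (f > x) f -= 1.0f;            /* fix up negatives so f == floor(x) */
    return x - f;
}

/* Build the translation matrix for one layer at time 'time_sec'. */
mat3_t scroll_matrix(scroll_layer_t layer, float time_sec) {
    float du = wrap01(layer.speed_u * time_sec);
    float dv = wrap01(layer.speed_v * time_sec);
    mat3_t t = {{ 1.0f, 0.0f, 0.0f,
                  0.0f, 1.0f, 0.0f,
                  du,   dv,   1.0f }};
    return t;
}

/* Apply the matrix to (u, v, 1), as a vertex shader would. */
void transform_uv(const mat3_t* t, float u, float v, float* out_u, float* out_v) {
    *out_u = t->m[0] * u + t->m[3] * v + t->m[6];
    *out_v = t->m[1] * u + t->m[4] * v + t->m[7];
}
```

Each cloud layer would get its own matrix per frame (e.g. passed as a uniform), so only the uniform data changes between frames while the vertex buffer stays untouched.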

IonAgorria commented 9 months ago

The native D3D11 build also works fine; the problem seems to be specific to Windows + WebGL and unrelated to the GPU.

caiiiycuk commented 9 months ago

Fortunately, when @IonAgorria added support for D3D11, he removed the glsl100 dialect from the list of supported dialects, and after that rendering started to work fine. There is probably a bug in sokol-tools-bin where the generated code loads the glsl100 shaders in a GLES3 environment.
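Since the fix came down to which dialect actually gets loaded, a quick runtime check can confirm it. GLSL ES 3.00 sources must start with a "#version 300 es" directive (only comments and whitespace may precede it), while GLSL ES 1.00 sources use "#version 100" or omit the directive. The hypothetical helper below inspects a shader source string and reports the dialect, so it could be logged for every shader source handed to the GLES3/WebGL2 backend. It is a debugging sketch, not part of sokol or sokol-shdc, and it does not handle a "#version" that appears inside a comment.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    DIALECT_UNKNOWN,
    DIALECT_GLSL100,
    DIALECT_GLSL300ES,
    DIALECT_GLSL330,
} glsl_dialect_t;

/* Report the dialect named by the first '#version' directive in 'src'.
   In a GLES/WebGL context, a source without any directive is GLSL ES 1.00. */
glsl_dialect_t detect_glsl_dialect(const char* src) {
    assert(src);
    const char* v = strstr(src, "#version");
    if (!v) return DIALECT_GLSL100;
    v += strlen("#version");
    while (*v == ' ' || *v == '\t') v++;   /* skip whitespace before the number */
    if (strncmp(v, "300 es", 6) == 0) return DIALECT_GLSL300ES;
    if (strncmp(v, "100", 3) == 0) return DIALECT_GLSL100;
    if (strncmp(v, "330", 3) == 0) return DIALECT_GLSL330;
    return DIALECT_UNKNOWN;
}
```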


Here is a zip with the shaders generated for glsl330:glsl300es:metal_macos and for glsl330:glsl100:glsl300es:metal_macos.

shaders.zip

floooh commented 9 months ago

Wait... so it was the glsl100 shaders? Good to know, but unexpected. TBH I was under the impression that glsl100 shaders are fully supported in WebGL2, which is why I didn't remove support for them in sokol-shdc (yet, but I guess this is a good opportunity).

Glad the issue is resolved though :) Can the ticket be closed then?

caiiiycuk commented 9 months ago

Of course. Supporting GLSL100 seems unnecessary, but could there be an error in the sokol shader tool? It appears that the generated code contains two blocks guarded by #if defined(SOKOL_GLES3), where one block uses GLSL100 code and the other uses GLSL300es code. Is that correct?

floooh commented 9 months ago

Theoretically it makes sense from the standpoint that GLSL100 shaders should also work in GLES3/WebGL2 (which for the most part they do, but as you discovered apparently not always). I'm not sure if sokol-shdc does the right thing when both glsl100 and glsl300es output is created, but in any case it makes more sense to remove glsl100 output from sokol-shdc entirely.

PS: wrote a reminder ticket for sokol-shdc here: https://github.com/floooh/sokol-tools/issues/119