godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
89.26k stars 20.23k forks source link

Texture Repeat sometimes ignored on Compatibility renderer #79315

Closed LRFLEW closed 1 year ago

LRFLEW commented 1 year ago

Godot version

4.1.stable

System information

Windows 10 22H2 - GLES3

Issue description

On occasion, the Compatibility renderer seems to ignore the Repeat option on a material and act as if it is disabled even when its enabled. It doesn't seem entirely consistent, and only seems to trigger when a scene is being (re)loaded. I haven't narrowed down what the trigger for this issue is, though.

Steps to reproduce

This issue is pretty easy to trigger in the editor. Using a simple scene with just a Camera3D and MeshInstance3D, the latter using a QuadMesh. Add a texture as a material on the mesh, and on that material, set UV1 -> Scale to something above 1 (I used 4) and ensure Sampling -> Repeat is enabled. Save the scene, close it, and re-open it to trigger the bug.

https://github.com/godotengine/godot/assets/607715/5f3c7081-ced4-491c-860c-b1d5873ab0dd

Minimal reproduction project

RepTest.zip

I've also experienced this issue in-game, but I was unable to make a minimum reproduction of it. If you want to see that, here's the project I experienced the issue on (a slightly modified version of my GMTK Jam entry with the workaround removed). To reproduce the issue, click the Play button, pause with P or ESC, and click the Menu button. You know the bug has triggered when the platform below the car becomes a solid color (the corner of the mesh with the texture is off-screen).

https://github.com/godotengine/godot/assets/607715/71e94b2c-61dd-452d-9eea-7d0029e78906

Testing.zip

Calinou commented 1 year ago

@LRFLEW Which graphics card model are you using? Are drivers up-to-date?

LRFLEW commented 1 year ago

Sorry, I should have included that in my initial report.

I experienced this with my Nvidia GTX 1080 Ti, and I'm on driver 531.79

It looks like there's been a driver update I haven't installed yet. I'll try updating it and re-testing this later. I can also try both programs on some of my other computers to see if it happens on them as well.

LRFLEW commented 1 year ago

Ok, update. I tested on a couple of computers I have:

And they all exhibit this behavior. Considering this is happening for me across GPU vendors, I don't think it's a driver-specific problem.

LRFLEW commented 1 year ago

Ok, I've looked into this more, and it's really weird. I ran the editor through RenderDoc (with the caveat that I had to stub GLManager_Windows::_nvapi_disable_threaded_optimization() to prevent a crash), which showed the GL calls for each texture. Comparing the results when the bug occurs to when it doesn't, there doesn't appear to be a difference in the number of GL calls, and walking through the editor in a debugger indicates that it shouldn't be using different arguments. However, the snapshots that RenderDoc takes with the bug (using the RepTest project I shared here) don't make a lot of sense. It claims that the last calls to glTexParameteri set the filtering mode to GL_NEAREST, but that the texture's state is using GL_LINEAR_MIPMAP_LINEAR.

Based on a combination of what I see in RenderDoc and walking through the editor in a debugger, I think I have an idea of what's happening. TextureStorage::_texture_set_data sets the "default" filter and repeat modes, then calls glTexImage2D to set the pixel data. https://github.com/godotengine/godot/blob/a33b548092433dbeddc05003b3cbd3e0991107d8/drivers/gles3/storage/texture_storage.cpp#L1293-L1300 The problem is that calling glTexImage2D seems to affect the internal state of these modes sometimes. It's not clear when or why it's doing this. However, my testing indicated that it happens on different GPUs, different GPU venders, and even different operating systems, so it may not be as simple as just a driver bug. Maybe it's trying to replicate a bug from an early OpenGL driver for compatibility reasons? I wasn't able to find any information on this looking online, but I do remember having to use glTexParameteri after glTexImage2D in my own projects. Without some documentation saying exactly when and how these texture properties change, I feel like the only way to address this issue is to consider these texture parameters to be indeterminate after calling glTexImage2D. (While I haven't seen the filtering mode being affected by this, I can only assume it affects it as well given what I've seen)

With that in mind, I think there are two reasonable options for how to address this issue:

There's also another option that I feel like shouldn't be considered:

What do you think would be the best approach here? Also, if you can find any more information on this, I'd love to hear it, because it's bugging me that I'm having so much trouble understanding the cause of this issue.

EDIT: Figured I should include the captures I made in RenderDoc in case it helps: RenderDoc.zip

LRFLEW commented 1 year ago

Something about this issue wasn't sitting right with me, so I kept looking into it. I didn't like how the output of the RenderDoc capture didn't make sense given what must have been the OpenGL calls made, and I started suspecting that the OpenGL state machine is getting corrupted by some sort of invalid input somewhere else. I tested both of the proposed options in my last comment, and while the second option did seem to work around the issue, the first option simply didn't work at all. This seems to de-confirm my assumption that it's the call to glTexImage2D that's triggering this issue.

I'm going to try to run the engine through more OpenGL debugging tools (including using a debug context) to see if I can pinpoint what's happening, but if this is an undefined behavior issue, I may not have the experience needed to track this down on my own.

As an aside, while doing more research into this, I learned about the existence of Sampler Objects, which from what I can tell are part of both the OpenGL 3.3 and OpenGL ES 3.0 standards, so they should be available for the Compatibility renderer to use. The engine can make use of the same texture with multiple materials (especially with the default textures), which can each want to use their own filtering and repeat mode. I'm wondering if it would be better to use these sampler objects, with each sampler object associated with a material instead of a texture, and avoid setting these parameters on the texture directly. This wouldn't necessarily fix the underlying issue here if it is an undefined behavior bug, but it's something I noticed.

clayjohn commented 1 year ago

Set Texture::state_filter and Texture::state_repeat to an invalid value (eg. RS::CANVAS_ITEM_TEXTURE_FILTER_MAX and RS::CANVAS_ITEM_TEXTURE_REPEAT_MAX respectively) so that when bind_uniforms_generic() first encounters the texture, it always calls glTexParameteri.

If necessary, I would go with this option. It shouldn't be necessary, but if it is enough to solve the problem, we may want to go this route.

Something about this issue wasn't sitting right with me, so I kept looking into it. I didn't like how the output of the RenderDoc capture didn't make sense given what must have been the OpenGL calls made, and I started suspecting that the OpenGL state machine is getting corrupted by some sort of invalid input somewhere else. I tested both of the proposed options in my last comment, and while the second option did seem to work around the issue, the first option simply didn't work at all. This seems to de-confirm my assumption that it's the call to glTexImage2D that's triggering this issue.

I'm going to try to run the engine through more OpenGL debugging tools (including using a debug context) to see if I can pinpoint what's happening, but if this is an undefined behavior issue, I may not have the experience needed to track this down on my own.

I think renderdoc is a bit unreliable for getting such granular information. I haven't had great luck debugging texture state issues with it.

As an aside, while doing more research into this, I learned about the existence of Sampler Objects, which from what I can tell are part of both the OpenGL 3.3 and OpenGL ES 3.0 standards, so they should be available for the Compatibility renderer to use. The engine can make use of the same texture with multiple materials (especially with the default textures), which can each want to use their own filtering and repeat mode. I'm wondering if it would be better to use these sampler objects, with each sampler object associated with a material instead of a texture, and avoid setting these parameters on the texture directly. This wouldn't necessarily fix the underlying issue here if it is an undefined behavior bug, but it's something I noticed.

I think this is super interesting, but probably something for us to stay away from. We try to stick to very common and basic things in the 3.3/ES 3.0 specifications as many older devices have poor opengl 3 driver support. So using stuff that is less common/newer tends to carry the risk of having unidentified driver bugs.

LRFLEW commented 1 year ago

I figured it out!!! It was a lot simpler than I was making it.

I decided to simply add a breakpoint in Texture::gl_set_repeat, knowing it would trigger for a lot more than just what I'm looking at, and just track everything that's calling it and when. Doing this, I quickly found these calls to gl_set_filter and gl_set_repeat in TextureStorage::_clear_render_target: https://github.com/godotengine/godot/blob/0c2144da908a8223e188d27ed1d31d8248056c78/drivers/gles3/storage/texture_storage.cpp#L1946-L1947 Notably, this is calling the function using Filter Max and Repeat Max, which doesn't represent a specific filter or repeat mode, so I assume this isn't intended to actually change the filtering and repeat mode of an OpenGL texture. (These lines of code appear to have been added in cc89321 from #78620, so @clayjohn can probably comment on this.) Because these lines aren't really meant to update an OpenGL texture, it never calls glBindTexture to set the active texture. However, calling gl_set_filter and gl_set_repeat with any new mode (even *_MAX) calls glTexParameter*, which affects whatever the currently bound texture is. The defaults for unknown arguments are the same as RS::CANVAS_ITEM_TEXTURE_FILTER_NEAREST (though with a max LOD of 1000) and RS::CANVAS_ITEM_TEXTURE_REPEAT_DISABLED for gl_set_filter and gl_set_repeat respectively.

It seems like the direct solution, then, would be to update Texture::gl_set_filter and Texture::gl_set_repeat to not perform any OpenGL calls when called with an invalid mode. I'm making a PR for that now, which should finally resolve this issue.