QW-Group / ezquake-source

main ezQuake source code base
https://www.ezquake.com/
GNU General Public License v2.0

BUG: Surfaces incorrectly rendered on MacOS M1 (?) #562

Open · meag opened this issue 3 years ago

meag commented 3 years ago

ezQuake version: ezquake 3.6-dev-alpha7 7069~21539784d

OS/device including version: macOS, M1 Mac mini, classic

Describe the bug: Triangles are missing in the middle of surfaces.

To Reproduce

  1. Single player game > start
  2. Do not change the view angle with the mouse
  3. Walk forward off the steps onto the surface in front of the normal corridor
  4. The surface has a triangle missing

Expected behavior: Surfaces render normally.

Screenshots: ezquake000

Additional context: Can reproduce with a minimal OpenGL program (attached: gltest.zip). Can fix by rotating, or by offsetting the x or z coordinates by tiny amounts (try keys 1/3/a/d when running the program).

Rendering the triangle individually or in the middle of a triangle strip makes no difference; the triangle is culled either way. Disabling face culling before rendering makes no difference either. The same commands on NVIDIA on Windows render a complete surface, as expected.

blaps commented 3 years ago

Can reproduce on MacBook Pro (13-inch, M1, 2020), macOS 11.4 (20F71).

dsvensson commented 1 month ago

Workaround merged via https://github.com/QW-Group/ezquake-source/pull/940; see also the duplicate issue https://github.com/QW-Group/ezquake-source/issues/938. It's a somewhat crude workaround that introduces some minor glitches, but at least it fixes the cheating aspect. This seems to be a hardware quirk going all the way back to the iPad GPU. A better solution would be to change the tessellation so it doesn't produce degenerate triangles in the first place, rather than just filtering them out afterwards.
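The filtering approach mentioned above (dropping degenerate triangles before they are drawn) can be sketched in C roughly as follows. This is a minimal illustration, not the code from the merged PR: it assumes a non-indexed triangle list, a hypothetical `vec3_t` vertex type, and a hypothetical threshold `DEGEN_EPSILON` on the squared cross-product length.

```c
#include <stddef.h>

/* Hypothetical threshold on the squared length of the edge cross product,
 * which equals (2 * area)^2. */
#define DEGEN_EPSILON 1e-6f

typedef struct { float x, y, z; } vec3_t;  /* hypothetical vertex type */

/* Returns nonzero if triangle (a, b, c) is degenerate, i.e. its vertices are
 * (nearly) collinear or coincident, so the triangle has ~zero area. */
static int triangle_is_degenerate(vec3_t a, vec3_t b, vec3_t c)
{
    /* Edge vectors from a. */
    float ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    float vx = c.x - a.x, vy = c.y - a.y, vz = c.z - a.z;
    /* Cross product u x v; its length is twice the triangle area. */
    float cx = uy * vz - uz * vy;
    float cy = uz * vx - ux * vz;
    float cz = ux * vy - uy * vx;
    return cx * cx + cy * cy + cz * cz <= DEGEN_EPSILON;
}

/* Compacts a triangle list (3 vertices per triangle) in place, dropping
 * degenerate triangles. Returns the number of triangles kept. */
static size_t filter_degenerate_triangles(vec3_t *verts, size_t tri_count)
{
    size_t kept = 0;
    for (size_t i = 0; i < tri_count; i++) {
        vec3_t *t = &verts[i * 3];
        if (triangle_is_degenerate(t[0], t[1], t[2]))
            continue;
        if (kept != i) {
            /* Move the surviving triangle down over a dropped one. */
            verts[kept * 3 + 0] = t[0];
            verts[kept * 3 + 1] = t[1];
            verts[kept * 3 + 2] = t[2];
        }
        kept++;
    }
    return kept;
}
```

An absolute epsilon like this depends on the map's coordinate scale; a relative test (for example, comparing against the product of the edge lengths) may be more robust.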

dsvensson commented 1 month ago

Sounds like a good return task for the great @meag

meag commented 1 month ago

The great meag? Not sure I've met him; this old, rusty meag is a bit confused, though. Is collinear the right term for the problem triangles? The example in the test program has coordinates (480, 544, -64), (672, 752, -64), (480, 568, -64), which has an area and isn't collinear? Not disagreeing with your workaround etc., just not sure how that particular case can be improved.
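A quick way to check this is to compute the area directly. Since all three vertices share z = -64, the shoelace formula in the xy plane is enough. A minimal C sketch (the function name is hypothetical):

```c
/* Shoelace formula for a triangle lying in a plane of constant z:
 * returns the unsigned area using only the x and y coordinates. */
static double triangle_area_xy(const double a[3], const double b[3], const double c[3])
{
    double twice = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]);
    return (twice < 0 ? -twice : twice) / 2.0;
}
```

For the test triangle this gives 192 * 24 / 2 = 2304 square units, so it is clearly not collinear, whereas the points (100, 100, -64), (200, 200, -64), (300, 300, -64) that come up later in this thread give exactly zero.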

dsvensson commented 1 month ago

Collinear is actually the right term: the triangle that gets removed is not the one that is ~zero-sized.

dsvensson commented 1 month ago

A conspiracy theory might be that it's some low-level or hardware off-by-one. But it's mysterious that the casualty always has two vertices outside the view. It would be interesting to write a small test that uses the other ways of building a triangle and see whether those work. It happens with plain Metal as well. And the browser's use of ANGLE makes it work, as seen on hub.quakeworld.nu.

meag commented 1 month ago

I definitely agree that it's the triangles with two vertices outside the view (especially beyond the near plane) that get culled. From memory and the original description, those vertices had to have the same z/w coordinates once transformed; if I rotated the view, the triangle reappeared.

I'll dust off the M1 and have another look at it. It would be good to get this tracked down, as it completely stumped me at the time.
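The near-plane condition above can be made concrete with the depth row of a standard OpenGL perspective matrix (the glFrustum/gluPerspective form), where a vertex is outside the near clip plane when z_clip < -w_clip. A minimal C sketch; the near/far values 4 and 4096 used in the check are assumptions for illustration, not ezQuake's actual settings, and the helper names are hypothetical:

```c
/* Clip-space z and w for an eye-space depth z_eye, using the depth row of a
 * standard OpenGL perspective projection with near plane n and far plane f.
 * In this projection x and y do not affect z or w. */
static void eye_depth_to_clip(float z_eye, float n, float f, float *z_clip, float *w_clip)
{
    float A = (f + n) / (n - f);
    float B = (2.0f * f * n) / (n - f);
    *z_clip = A * z_eye + B;
    *w_clip = -z_eye;
}

/* Nonzero if the vertex is outside the near clip plane (z_clip < -w_clip);
 * this also covers points behind the camera. */
static int behind_near_plane(float z_eye, float n, float f)
{
    float z, w;
    eye_depth_to_clip(z_eye, n, f, &z, &w);
    return z < -w;
}

/* Counts how many of a triangle's three vertices (given by eye-space depth)
 * are outside the near clip plane. */
static int count_behind_near(const float z_eye[3], float n, float f)
{
    int count = 0;
    for (int i = 0; i < 3; i++)
        count += behind_near_plane(z_eye[i], n, f);
    return count;
}
```

Because z_clip and w_clip depend only on eye-space depth here, two vertices at the same depth always share z/w, which is consistent with the observation that the culled vertices had matching z/w and that rotating the view (changing their depths) made the triangle reappear.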

dsvensson commented 1 month ago

But the scene also has to contain a collinear triangle for the bug to show, because filtering those out stops the unrelated triangle from being clipped or projected out of existence. Scaling by w also avoids the problem, but messes up the texture coordinates. The best place to reproduce it is dm3 with deathmatch enabled: use /kill until you spawn at the RL, then just press forward. Another dm3 spawn that shows it is the one past the outdoor elevators, where you would drop down through the hatch in the roof.

Here it is in FTE with wireframe enabled:

https://github.com/user-attachments/assets/1b6a373b-d4b9-4d10-9b05-7df2c028c7bf

With the current workaround, pruning collinear vertices leaves a slight sparkling dot north-east of the nail box, between the floor and the wall.

Building on macOS nowadays is:

    ./bootstrap.sh
    cmake --preset macos-arm64
    cmake --build build-macos-arm64 --config Debug

meag commented 1 month ago

Got you; I can see the problem on dm3 using an older client too. The zip file attached to the initial bug report was the example I was referring to: a single three-vertex, non-collinear triangle that won't render on the M1. The first and last vertices are behind the near plane. I have tried rotating the order they are sent in, with no difference. I could probably simplify the example further; will look again tomorrow.
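Rotating the send order cyclically (ABC, BCA, CAB) keeps the triangle's winding, and therefore its facing, unchanged, while swapping two vertices flips it; so rotation changes which vertex goes first without affecting face culling. A small C check using twice the signed area in the xy plane, assuming (as in the test triangle) that all vertices share one z; the helper name is hypothetical:

```c
/* Twice the signed area of triangle (a, b, c) in the xy plane:
 * positive for counter-clockwise winding, negative for clockwise. */
static double signed_area2_xy(const double a[2], const double b[2], const double c[2])
{
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]);
}
```

For the test triangle all three rotations give +4608, while swapping the first two vertices gives -4608.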

meag commented 1 month ago

This won't be a complete surprise, but I'm no further forward with this. Attached is an updated gltest.c program with the single triangle that doesn't render. It's interesting that OpenGL must be in some odd state that your change clears, but I'm not sure what to do in the single-triangle example to fix it.

macos-triangle-20240927.zip

dsvensson commented 1 month ago

Oh, I'm blind... I didn't see your original standalone test. Add this before your vertices and the triangle-removal "feature" is operational again:

                { 100 + offset[0], 100 + offset[1], -64 + offset[2] },
                { 200 + offset[3], 200 + offset[4], -64 + offset[5] },
                { 300 + offset[6], 300 + offset[7], -64 + offset[8] },

It's worth noting that it has to come before the problem triangle, which is why I was thinking of some hardware or low-level off-by-one, especially as there are indications online that it's reproducible on the iPad. I should probably try a few other scenarios, like whether it's always the triangle immediately after that gets wasted.
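The modified test data can be sketched as a flat GL_TRIANGLES vertex array in which the collinear triangle from the snippet comes first, followed by the problem triangle from the original test program. The array, the all-zero `offset` values and the helper names below are hypothetical; `follows_degenerate` just flags the triangles that an "is it always the triangle after?" experiment would need to watch.

```c
#include <stddef.h>

/* Hypothetical per-coordinate offsets, all zero here; the test program
 * lets keys nudge them. */
static const float offset[9] = { 0 };

/* Non-indexed GL_TRIANGLES data: triangle 0 is the collinear one from the
 * snippet, triangle 1 is the problem triangle from the test program. */
static float verts[6][3];

static void build_test_verts(void)
{
    const float data[6][3] = {
        { 100 + offset[0], 100 + offset[1], -64 + offset[2] },
        { 200 + offset[3], 200 + offset[4], -64 + offset[5] },
        { 300 + offset[6], 300 + offset[7], -64 + offset[8] },
        { 480, 544, -64 },
        { 672, 752, -64 },
        { 480, 568, -64 },
    };
    for (size_t i = 0; i < 6; i++)
        for (size_t j = 0; j < 3; j++)
            verts[i][j] = data[i][j];
}

/* Nonzero if triangle t (vertices 3t..3t+2) has ~zero area (collinear). */
static int tri_is_degenerate(float v[][3], size_t t)
{
    const float *a = v[3 * t], *b = v[3 * t + 1], *c = v[3 * t + 2];
    float ux = b[0] - a[0], uy = b[1] - a[1], uz = b[2] - a[2];
    float wx = c[0] - a[0], wy = c[1] - a[1], wz = c[2] - a[2];
    float cx = uy * wz - uz * wy, cy = uz * wx - ux * wz, cz = ux * wy - uy * wx;
    return cx * cx + cy * cy + cz * cz < 1e-6f;
}

/* Nonzero if triangle t is immediately preceded by a degenerate triangle. */
static int follows_degenerate(float v[][3], size_t t)
{
    return t > 0 && tri_is_degenerate(v, t - 1);
}
```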

So a proper workaround would be: if tessellating a polygon produces degenerate triangles, brute-force a different tessellation order and check whether any other start vertex avoids producing them, with a final fallback of inserting a vertex in the middle of the polygon and tessellating around that, which ought to be bulletproof. And since brute-forcing sucks, perhaps just tessellate around the centroid from the start.
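The centroid idea can be sketched for convex polygons (Quake faces are convex) by comparing the usual fan from vertex 0 with a fan around the vertex average, used here as a stand-in for the true area centroid. For a non-degenerate convex polygon with distinct vertices, the vertex average lies strictly inside, so no triangle (centroid, v[i], v[i+1]) can be collinear. This is a minimal sketch under those assumptions; `vec3_t` and all function names are hypothetical.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3_t;  /* hypothetical vertex type */

/* Nonzero if (a, b, c) has ~zero area (squared cross-product length test). */
static int is_degenerate(vec3_t a, vec3_t b, vec3_t c)
{
    float ux = b.x - a.x, uy = b.y - a.y, uz = b.z - a.z;
    float vx = c.x - a.x, vy = c.y - a.y, vz = c.z - a.z;
    float cx = uy * vz - uz * vy, cy = uz * vx - ux * vz, cz = ux * vy - uy * vx;
    return cx * cx + cy * cy + cz * cz < 1e-6f;
}

/* Fan from vertex 0: writes (n - 2) triangles to out (3 vertices each).
 * Returns how many of them are degenerate. */
static size_t fan_from_vertex0(const vec3_t *poly, size_t n, vec3_t *out)
{
    size_t degenerate = 0;
    for (size_t i = 1; i + 1 < n; i++) {
        vec3_t *t = &out[(i - 1) * 3];
        t[0] = poly[0]; t[1] = poly[i]; t[2] = poly[i + 1];
        degenerate += (size_t)is_degenerate(t[0], t[1], t[2]);
    }
    return degenerate;
}

/* Fan around the vertex average: writes n triangles to out.
 * Returns how many of them are degenerate. */
static size_t fan_from_centroid(const vec3_t *poly, size_t n, vec3_t *out)
{
    vec3_t c = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        c.x += poly[i].x; c.y += poly[i].y; c.z += poly[i].z;
    }
    c.x /= (float)n; c.y /= (float)n; c.z /= (float)n;

    size_t degenerate = 0;
    for (size_t i = 0; i < n; i++) {
        vec3_t *t = &out[i * 3];
        t[0] = c; t[1] = poly[i]; t[2] = poly[(i + 1) % n];
        degenerate += (size_t)is_degenerate(t[0], t[1], t[2]);
    }
    return degenerate;
}
```

For a square with an extra collinear vertex on one edge, e.g. (0,0), (2,0), (4,0), (4,4), (0,4) at z = -64, the vertex-0 fan yields one degenerate triangle and the centroid fan yields none. The trade-off is one extra vertex and n triangles instead of n - 2; the result can still be drawn as a single GL_TRIANGLE_FAN by sending the centroid first, then the polygon vertices, then the first polygon vertex again.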

But it's very bizarre that it isn't reproducible in FTE running in the browser on WebGL/ANGLE. That ought to yield the same result, unless something in that pipeline scrubs the geometry.

dsvensson commented 1 month ago

Whoops, misspoke, that's not a strip :)

dsvensson commented 1 month ago

But simply duplicating your first vertex is enough for the rest of the strip not to render, or am I not thinking correctly?
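When a GL_TRIANGLE_STRIP is expanded into individual triangles, each triangle after the first reuses the previous two vertices, and odd-numbered triangles have two vertices swapped so the strip keeps a consistent winding (APIs differ in exactly which two are swapped; the sketch below uses (i, i+2, i+1)). Duplicating the first vertex therefore adds one degenerate triangle at the front and shifts the parity of every later triangle, flipping its winding, while their positions stay the same. A minimal C sketch with hypothetical helper names:

```c
#include <stddef.h>

/* Expands a triangle strip of n vertices into (n - 2) index triples.
 * Odd triangles swap their last two vertices to keep winding consistent. */
static size_t strip_to_triangles(size_t n, unsigned out[][3])
{
    if (n < 3)
        return 0;
    for (size_t i = 0; i + 2 < n; i++) {
        out[i][0] = (unsigned)i;
        out[i][1] = (unsigned)(i % 2 == 0 ? i + 1 : i + 2);
        out[i][2] = (unsigned)(i % 2 == 0 ? i + 2 : i + 1);
    }
    return n - 2;
}

static int same_pos(const float *a, const float *b)
{
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}

/* Counts triangles in which two vertices share the same position. */
static size_t count_repeated_vertex_tris(float v[][3], unsigned tris[][3], size_t count)
{
    size_t bad = 0;
    for (size_t i = 0; i < count; i++) {
        const float *a = v[tris[i][0]], *b = v[tris[i][1]], *c = v[tris[i][2]];
        if (same_pos(a, b) || same_pos(b, c) || same_pos(a, c))
            bad++;
    }
    return bad;
}
```

With the test triangle's first vertex duplicated (A, A, B, C), the strip expands to (A, A, B), which is degenerate, and (A, C, B), the original triangle with reversed winding; with face culling disabled, the reversal alone should not hide it.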

meag commented 1 month ago

You can change the primitive to GL_TRIANGLES; it won't matter. The triangle initially renders, but if you move forward, at the point where two vertices are behind the camera position, it suddenly gets culled.

This particular case still looks like some kind of driver bug. It's really interesting that your code change improves things; it's almost as if the default state (where there are no previous vertices to work with) is where this becomes an issue, and a degenerate triangle puts it back into that state?

dsvensson commented 1 month ago

I just got a reply from Alyssa Rosenzweig, the developer of the AGX (Asahi) driver in Mesa. I asked whether your sample program could be tried on Linux on an arm64 MacBook, and the same triangle disappears there, so it is at least a hardware/firmware bug or quirk. Mesa's software rasterizer draws it correctly. It's very bizarre that pruning the collinear triangles in the engine with full map data makes it go away, but it does: I haven't found a single spot where I can still reproduce it.