godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.29k stars, 21.05k forks

Large Meshes render incorrectly (depth buffer issue) #32764

Closed npip99 closed 3 years ago

npip99 commented 5 years ago

Godot version: v3.2.alpha0.official

OS/device including version: Ubuntu 18.04 LTS

Issue description: If a mesh is exceptionally large, on the order of 100k-1M in each direction, and it is centered on the camera, then it is culled weirdly and will pop in and out of existence as you look around.

Steps to reproduce: Create a new PlaneMesh, with scale 1,000,000, centered at 0,0. Have the camera centered at 0,0, with far distance set to 90,000. When you run the scene, it'll be glitchy because the plane will be popping in and out of view.

(This was happening to me with the terrain, which understandably made the game unplayable). A fix is to have four meshes, each having a corner at 0,0, so that they're making up each quadrant of the map. So for some reason, the issue only occurs when you're at the center of the mesh, not when you're at the corner.

Minimal reproduction project: godotbug.zip

akien-mga commented 5 years ago

I can reproduce it on Linux with AMD Radeon RX Vega M, both with GLES2 and GLES3, on 3.1.1-stable and current master (3ca1296b8).

For me it seems to be quite deterministic (in this demo): there are two intervals of Y rotation angles for which the clipping happens.

lawnjelly commented 5 years ago

If you set your camera near clipping plane from 0.01 to 1, the problem doesn't seem to occur. I don't think it's the octree culling code that is misbehaving, more something due to lack of precision in camera space.

Edit, for more info have a look at this: https://developer.nvidia.com/content/depth-precision-visualized

Note that with standard z, precision is non-linear and very sensitive to the near plane setting.

There is also a reversed z mapping trick which can help with precision issues, I don't know if Godot has this available.
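To make the near plane sensitivity concrete, here is a rough Python sketch (not Godot code). It assumes the standard perspective depth mapping d(z) = (1/near - 1/z) / (1/near - 1/far) stored in a 24-bit fixed-point buffer. The function name `depth_step` and the sample distance of 1000 units are made up for illustration.

```python
import math


def depth_step(z, near, far=math.inf, bits=24):
    """Approximate eye-space distance covered by one depth-buffer step at distance z.

    Assumption: window depth is d(z) = (1/near - 1/z) / (1/near - 1/far),
    quantized into 2**bits evenly spaced values.
    """
    # The slope of d is d'(z) = 1 / (z^2 * (1/near - 1/far)), so one
    # quantization step of size 1 / 2**bits spans roughly this much distance:
    return z * z * (1.0 / near - 1.0 / far) / (2 ** bits)


# Far plane of 90,000 as in the original report; near values from this thread.
for near in (0.01, 0.05, 1.0):
    print(f"near={near}: one depth step at z=1000 spans ~{depth_step(1000.0, near, 90000.0):.4f} units")
```

Under these assumptions, moving the near plane from 0.01 to 1 shrinks the step at 1000 units from roughly 6 units to roughly 0.06, a factor of about 100, while the far plane barely matters.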

npip99 commented 5 years ago

It looks like lawnjelly has a solution if it's possible to implement reversed z. One thing I thought of is simply changing the minimum near distance. It's already bounded at 0.01 but if 0.01 causes bugs and 0.05 doesn't then the real minimum should be 0.05 until reversed z is implemented, unless reversed z is trivial to implement and you might as well fix that first (idk anything about this to make that call)

As a note, the default camera far distance is currently 100, but if 0.05 is bugless at infinitely far depths I see no reason not to make the default far plane infinite and the default near plane the minimum one that works. This is a common technique that keeps the game dev outside of the implementation details as much as possible, since it's not really "more expensive" to have the near and far planes any farther than the farthest possible distance. Unless you count frustum culling, but tbh unexpected frustum culling at the far plane just means the game dev will have to update the far draw distance anyway, since parts of the scene would be literally disappearing without any confirmation that the game dev wanted that.

Raphael2048 commented 5 years ago

In OpenGL, reversed-z needs OpenGL 4.5 or later, or ARB_clip_control support, in order to use glClipControl, and GLES doesn't support this. So it could be very complicated to achieve.

lawnjelly commented 5 years ago

> In OpenGL, reversed-z needs OpenGL 4.5 or later, or ARB_clip_control support, in order to use glClipControl, and GLES doesn't support this. So it could be very complicated to achieve.

Yep, I noticed this too in the linked doc. So it would be confusing to support multi-platform if some platforms supported this and others didn't.

@npip99 I think it is difficult to set defaults for near and far plane that will work for everyone, you usually have to set them according to your game. There are also other tricks for rendering things far into the distance, e.g. https://godotforums.org/discussion/21313/how-to-simulate-very-far-away-objects-camea-far-clipping-distance

npip99 commented 5 years ago

@lawnjelly I mean it's hard for me to discuss because I really don't know much about this, but I was referencing "An infinite far plane makes only a miniscule difference in error rates. Upchurch and Desbrun predicted a 25% reduction in absolute numerical error, but it doesn't seem to translate into a reduced rate of comparison errors.", which seems to imply that an infinite far plane is just as good. So it seems making the default infinite makes sense. But like again I don't really get the details because I'm not familiar with this topic. Reading that article helped a lot though, thanks, I fixed a lot of bugs I was having with my water and planes getting glitchy and I had no idea it was because of depth buffer issues. Maybe an explanation of this phenomenon should be in the gd docs? That should be enough to close this issue because if we can't avoid the bug by the nature of how z buffers work, then we'd have to explain it to new users.
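As a rough check of the infinite far plane point, the sketch below compares the depth value written for the same distance with a far plane of 90,000 and with an infinite one. It assumes the standard perspective mapping d(z) = (1/near - 1/z) / (1/near - 1/far); `window_depth` and the sample numbers are illustrative only, and this says nothing about the floating point error rates measured in the article.

```python
import math


def window_depth(z, near, far):
    """Depth in [0, 1] for eye-space distance z under the assumed standard mapping."""
    # With far = math.inf, 1.0 / far is exactly 0.0, which gives the
    # infinite-far-plane limit d(z) = 1 - near / z.
    return (1.0 / near - 1.0 / z) / (1.0 / near - 1.0 / far)


near = 0.05
finite = window_depth(1000.0, near, 90000.0)
infinite = window_depth(1000.0, near, math.inf)

# The two mappings differ by a relative factor of about near / far.
print(finite, infinite, abs(finite - infinite) / infinite)
```

With near at 0.05 the two depth values differ by well under one part in a million, which is consistent with the idea that the near plane, not the far plane, dominates how depth precision is distributed.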

lawnjelly commented 5 years ago

To be a little more specific about this particular issue, I can say what I think might be going on, but this is all conjecture, I haven't investigated in any detail, it is just a theory based on the behaviour, and I'm going out on a limb here. There may be an alternative much better explanation. :smile:

Generally the effect of near and far clipping planes is to determine how much z fighting goes on when triangles are rendered 'close together' in the z coordinate (in projected camera space, rather than world space), i.e. the resolution of the z buffer is the issue, and that is what the linked article is mostly concerned with.

In this particular case, rather than the resolution, I suspect that the issue may be due to the coordinates somewhere in the pipeline being way outside the expected range, and the nature of floating point calculations. I.e. the near and far plane don't just determine the final resolution, they determine the actual z values of vertices in clip and NDC space that are outside the camera range.

As I understand it the vertices in geometry in godot (and 3d apps in general) tend to go through one or more transformation matrices that get them into clip space just before they get rendered: https://learnopengl.com/Getting-started/Coordinate-Systems

I expect that this pipeline is made with the assumption that the triangles come through the system with reasonable values. Once the vertices are being transformed way outside this range, the contract is broken and weird things might be happening. You would hope the system would deal with such numbers gracefully, but ultimately the pipeline will be built for speed rather than dealing with special cases.

In practice I doubt this occurs much except in such test cases, because even with a large terrain, you would typically tessellate it. You could add a special path in rendering code (or GPUs) to deal with such things, but I expect it would be ridiculous to do so because it would probably slow down all other rendering for a case which would hardly ever happen.

Skaruts commented 4 years ago

I think I'm having the same problem (or seemingly closely related), and I think @lawnjelly may be half-right. I say half-right because, at least in my case, this only happens in GLES2 (in GLES3 everything is just fine). However, as I bumped up z-near in small steps it did seem to make the problem happen further and further away from the camera, and at some point I couldn't tell it was there anymore, so... I guess that kinda "fixes" it.

This is what it looks like on my end (these meshes aren't that big, the plane is 200x200, and the cube is 20x20x20): [image: jagged]

Calinou commented 4 years ago

Maybe GLES2 uses a 16-bit depth buffer, which makes precision issues much more visible?

Also, note that on Intel IGPs, depth buffers will be 16-bit unless you explicitly request 24-bit precision (unlike AMD and NVIDIA GPUs, where the default is to use a 24-bit depth buffer).
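A small sketch of why a 16-bit buffer would make this more visible, if that is indeed what is happening here. It quantizes the depth of two surfaces 0.01 units apart into 16 and 24 bits, assuming the standard perspective mapping and a fixed-point buffer. The near/far values (0.1 and 500) and the distances are illustrative, and `quantized_depth` is a made-up helper, not an engine function.

```python
def quantized_depth(z, near, far, bits):
    """Integer depth value stored for eye-space distance z (assumed fixed-point buffer)."""
    d = (1.0 / near - 1.0 / z) / (1.0 / near - 1.0 / far)
    return round(d * (2 ** bits - 1))


near, far = 0.1, 500.0
front, back = 50.0, 50.01  # two surfaces 0.01 units apart

for bits in (16, 24):
    a = quantized_depth(front, near, far, bits)
    b = quantized_depth(back, near, far, bits)
    print(f"{bits}-bit: front={a} back={b} distinguishable={a != b}")
```

With these numbers both surfaces land on the same 16-bit value, so the depth test cannot order them, while the 24-bit buffer still separates them by several steps.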

Skaruts commented 4 years ago

@Calinou Hmm, well, the thing is, it seems to render fine under some circumstances. I was just now thinking of this: adding a SpatialMaterial to the cube and turning on one of these flags makes it render fine against the plane: Transparent, Unshaded or No-Depth-Test. Weirdly enough, with no materials, when I select the cube it suddenly renders fine.

(Just by the way, unfortunately I didn't think to give a different color to the cube before recording the gif, so it's not noticeable there that the cube actually pops in and out of existence at times, much like the OP mentioned happening to his meshes.)

Skaruts commented 4 years ago

Blue cube has transparent turned on, red cube doesn't. Note, the red cube position is near the origin. [image: jagged1]

lawnjelly commented 4 years ago

Although Z related I'm not sure it is the same. The red cube looks almost like it isn't z sorting at all (and the order of tris is semi random). Have you got a project file for this?

Skaruts commented 4 years ago

I thought it might be the same or related because when I zoom out objects also disappear and reappear sometimes, and your suggestions about the camera settings made a difference. But maybe it's not, indeed.

Here's the project from the gif: Jagged-Edges-Test.zip

If it's not related, perhaps I'll open an issue about it then.

npip99 commented 4 years ago

Wait, I think in my project if you put some other physical object in view it also screwed with it, causing z sorting issues. I'm not sure though, maybe that didn't happen. It depends: does Skaruts's issue also have to do with large values and small near values? If not, then I think it's different.

lawnjelly commented 4 years ago

It probably needs a fresh issue, I'm not sure it is the same thing.

The project file looks very simple and I can't see any obvious problems. With a fresh issue template (with the hardware / OS etc) maybe we would spot something, maybe it's not able to create the requested frame buffer and there's no z buffer or something like that.

Also be sure to note in the issue what your editor camera settings are for near and far (View->Settings)

Skaruts commented 4 years ago

Camera settings were the default ones. I don't remember ever changing them (z-near: 0.1, z-far: 500). Though I did change them for testing earlier (and I don't know if the settings go with the project file).

Anyway, alright then, I'll make another issue. Sorry about cluttering this one.

clayjohn commented 4 years ago

I was able to confirm as well. I don't think it is a depth buffer issue. To me it looks like a frustum culling issue. Possibly having to do with the octree culling implementation. This entire area is being rewritten for 4.0, so bumping back to then.

KoBeWi commented 3 years ago

Still valid in 3.2.4 beta3. I tried in 4.0 too, but the mesh doesn't seem to appear at all...

Flarkk commented 3 years ago

Here is an MRP I've tested using different versions of Godot 3 : Test_culling_issue.zip

The setup is simple :

[image: scene setup]

The slight difference in camera positions is chosen to make the clipping issue appear :

With Godot 3.1.2 : [image]

With Godot 3.3 : [image]

Note1 : this camera placement illustrates only one specific case of this clipping issue (which also happens in many other placements I experienced in a real project, with no obvious discernible pattern at first sight).

Note2 : the values involved here are far below the overflow limit of 32-bit floats (about 3.4e38). Still, it can't be ruled out that floating point overflow / precision issues are involved in internal calculations.

Flarkk commented 3 years ago

For illustration, here is the 'real world' project mentioned in the post above. Here the planet radius is about 9.0e6 units. Note how the patches are clipped in and out as the camera moves, in a pretty erratic way. This is only a matter of scale : I tested the same scene at a 'normal' scale (planet radius between 1.0 and 10.0) and everything goes smoothly. [image: culling_issue]
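To put numbers on the precision side of this (as opposed to overflow), the sketch below measures the gap between neighbouring 32-bit floats at a given magnitude, using only the Python standard library. `float32_spacing` is an illustrative helper and is only meant for positive finite inputs.

```python
import struct


def float32_spacing(x):
    """Gap between x (rounded to float32) and the next larger float32 value.

    Intended for positive, finite x only.
    """
    # Reinterpret the float32 bit pattern as an unsigned integer; adding 1
    # to that integer yields the next representable float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    above = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return above - here


for value in (1.0, 10.0, 9.0e6):
    print(f"spacing of float32 near {value}: {float32_spacing(value)}")
```

Near 9.0e6 the neighbouring float32 values are a full 1.0 apart, versus about 1.2e-7 near 1.0. Any intermediate result at planet scale is therefore rounded to whole units, long before overflow comes into play.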

lawnjelly commented 3 years ago

This issue has become a little confusing because there are two problems which are absolutely expected to occur with large values:

The title of this issue is vague - the OP's issue was to do with the depth buffer if I remember correctly. Normally we try to address different things in different issues, so it is easier to follow. For this reason it might be an idea to rename this issue so it becomes easier to track in future.

Having behaviour break down at large values is not necessarily a bug per se, as that is expected. The only thing we may have some control over is the point at which it breaks down, so we can try and get consistent behaviour in as large a range as possible. So I'm trying to simplify your project and look at the AABBs in the debugger to find out what part breaks first...

Large world literature

As I'm sure you are aware, there is extensive literature on dealing with these problems, as they come up in most large world games. I'm no expert on this, but for the benefit of future readers, as this comes up quite a bit: https://www.google.com/search?q=rendering+large+worlds+float+error

They don't only affect rendering, they are also problematic for physics (and indeed anything that uses spatial math).

Some approaches that are often used:

I tried to write a small mention of this in the docs: https://docs.godotengine.org/en/stable/tutorials/optimization/optimizing_3d_performance.html#large-worlds

lawnjelly commented 3 years ago

@Flarkk In your project the culling is failing because Geometry::compute_convex_mesh_points is unable to compute the points of the camera frustum due to it being so large I presume. The routine uses a fixed epsilon (CMP_EPSILON), which might not be ideal with large worlds, but in this case it is failing to intersect the 3 planes of the frustum to get the frustum points at all.

I'll have a look whether we can use a routine using doubles and see if this helps. Even if this bit is fixed, it is possible there will be problems further along in the culling.

Ah no it seems like the planes coming into the routine are already garbage from CameraMatrix::get_projection_planes. I'll keep trying but it's highly likely that there is more than one area that fails with such high numbers.

Yeah this is taking too long and may be a wild goose chase, but that seems to be the point of failure in this case, the camera get_projection_planes, before it gets to the culling. In case I / someone else investigates further.

Flarkk commented 3 years ago

@lawnjelly thanks for such detailed answers.

A few short comments :

> The title of this issue is vague - the OP's issue was to do with the depth buffer if I remember correctly.

No, in my opinion the very original post from @npip99 is about a culling issue similar to mine. Indeed subsequent posts mix zbuffer issues up. (You might want to change the title again then :-p)

> The only thing we may have some control over is the point at which it breaks down, so we can try and get consistent behaviour in as large a range as possible

True !

> • rendering far away objects in a separate pass (with different scale and depth buffer settings etc)

I do render far away objects in different passes (actually stacked viewports with transparent bg and camera with the right planes settings for each). But I’m trying not to rescale objects to avoid layer management stuff with my objects and lights (all of them enter the cull stage of all cameras, and finally happen to display on only one due to the culling process)

> that seems to be the point of failure in this case, the camera get_projection_planes, before it gets to the culling. In case I / someone else investigates further.

Big thank you for this hint. I’ve tried to figure out where it broke by myself but to be honest I’m not very familiar with the engine’s code. I’ll follow the investigation up, but if you manage to do so as well, please do !

lawnjelly commented 3 years ago

> No, in my opinion the very original post from @npip99 is about a culling issue similar to mine. Indeed subsequent posts mix zbuffer issues up. (You might want to change the title again then :-p)

I'm not 100% sure actually, as it was a while since I debugged it, and I didn't go over the project with a fine-toothed comb. Changing the near plane fixed it, which at the time I thought shouldn't have had a large effect on the culling. When something is just on the zfar boundary it can act like this and just flick on and off.

However it is alternatively possible that this near clipping plane was causing some problem in the calculation of the frustum hull (given the above problem with get_projection_planes). If I get time I will look again, you could be right and the title needs to swap again hehe. I don't know at this stage though.

Flarkk commented 3 years ago

CameraMatrix::get_projection_planes looks like it has not been modified since v3.1. However my MRP above shows that the issue isn’t occurring the same way on 3.1 and 3.3. Hence I suspect there is at least another break aside from (if confirmed) CameraMatrix::get_projection_planes … or the problem is elsewhere

Flarkk commented 3 years ago

For traceability :

I’ve identified that the culling issue is caused by a fancy way of transforming normals in the Plane version of transform3d::xform().

It involves the sum of a normal vector with a positional vector, which causes floating point precision issues when the positional vector is far away (the typical consequence is the transformed normals output as Vector3(0.0, 0.0, 0.0)): https://github.com/godotengine/godot/blob/3fc39954ec3473cc022af615c5eb8b1ba271e008/core/math/transform_3d.h#L135
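The effect can be reproduced outside the engine. The sketch below is not the Godot code: it is a simplified Python imitation of the pattern described above (recover a normal as the difference between `point + normal` and `point`), with float32 rounding emulated via `struct`. The transform is taken to be the identity so that any error comes from the sum alone, and the names `f32` and `normal_via_point_sum` are made up for illustration.

```python
import struct


def f32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def normal_via_point_sum(normal, d):
    """Recover a plane normal as (point + normal) - point, using float32 rounding.

    `normal` is a unit 3-tuple and `d` the plane distance from the origin.
    An identity transform is assumed, so the two points are used as they are.
    """
    point = tuple(f32(n * d) for n in normal)  # a point on the plane
    point_dir = tuple(f32(p + n) for p, n in zip(point, normal))  # point + normal
    return tuple(f32(q - p) for p, q in zip(point, point_dir))


print(normal_via_point_sum((1.0, 0.0, 0.0), 10.0))   # plane close to the origin
print(normal_via_point_sum((1.0, 0.0, 0.0), 1.0e9))  # plane very far away
```

For the nearby plane the normal comes back as (1.0, 0.0, 0.0). For the plane at 1.0e9 the float32 spacing is 64, so adding a unit normal to the point changes nothing and the recovered normal collapses to (0.0, 0.0, 0.0), matching the zero normals mentioned above.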

This occurs on all 3.x and 4.x versions.

This function is used during the culling stage with clipping planes as an input, causing erroneous culling when clipping distances are large.

The issue is fixed as a side benefit of https://github.com/godotengine/godot/pull/50637 and https://github.com/godotengine/godot/pull/50551 which rewrite most of xform()

pouleyKetchoupp commented 3 years ago

Fixed in 4.0 (https://github.com/godotengine/godot/pull/51355) and 3.4 (https://github.com/godotengine/godot/pull/50637).