godotengine / godot-proposals

Godot Improvement Proposals (GIPs)

Add support for reverse-z depth buffer #3539

Closed roalyr closed 5 months ago

roalyr commented 2 years ago

Reverse-z buffer solution proposed

https://github.com/Khasehemwy/godot/commit/0a1554078ada94b7b10e5dcff681090fe1ad421d

Related issues

Issues related to extended z-far that I have encountered: https://github.com/godotengine/godot/issues/86275 https://github.com/roalyr/godot-for-3d-open-worlds/issues/15

Describe the project you are working on

I am working on a space game, GDTLancer, which is designed to have large-scale objects at large distances (of the same magnitude as the maximum camera far plane distance) that can be reached and interacted with, such as stellar bodies, megastructures and other constructions that are visible from far away. This means the camera far plane is set at a great distance (up to 1e15-1e18 units, in my case).

Describe the problem or limitation you are having in your project

Large-scale spaces and the rendering thereof are limited by floating-point precision issues. While jitter can be tackled with a floating origin, z-fighting on faraway objects requires a solution that addresses the way things are rendered. Since the above-mentioned objects are meant to be interacted with, a double-viewport approach is not viable.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

The initial solution was to implement a logarithmic depth buffer, since in the GLES backend we had to deal with the [-1, +1] depth range. Now that 4.x is out we have moved to the Vulkan backend, where a better solution can be implemented, namely a reverse-z depth buffer: https://developer.nvidia.com/content/depth-precision-visualized

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

A workaround solution for a logarithmic depth buffer that was used before (it breaks optimization, depth-related effects and shadows). A gdshader solution for a logarithmic depth buffer is as follows:

```
// Add this before your vertex shader.
// Edit "Fcoef" to adjust for desirable view distance. Lesser number means further distance limit.
uniform float Fcoef = 0.001;
varying float gl_Position_z;

// Add this to your vertex shader.
void vertex() {
    // View-space position of the vertex; z is negative in front of the camera.
    vec4 gl_Position = MODELVIEW_MATRIX * vec4(VERTEX, 1.0);
    gl_Position_z = gl_Position.z;
}

// Add this to your fragment shader.
void fragment() {
    DEPTH = log2(max(1e-6, 1.0 - gl_Position_z)) * Fcoef;
}
```

If this enhancement will not be used often, can it be worked around with a few lines of script?

There is no way to work around this issue in a simple way, without compromising depth-related effects and shadow casting.

Is there a reason why this should be core and not an add-on in the asset library?

This is a feature that was already requested and will be requested again in future, because it allows to implement large open-world games in Godot. Reverse z depth buffer is beneficial for all kinds of 3D projects and comes at no drawbacks for performance.

roalyr commented 2 years ago

This is a solution that makes it work so long as you tweak every single spatial material in the game that depends on depth buffer. Works for GLES3, doesn't work for GLES2 (no gl_FragDepth exposed in GLES2). Short video demonstration of logarithmic depth and absence of z-fighting (ignore clipping in the end, the meshes shown in the video are flat ~100 units in thickness and 10000 units in size).

// For logarithmic depth buffer.
const float c = 0.001; // Tested at 500 M units.
varying vec4 gl_Position;

void vertex()
{
    // For logarithmic depth buffer.
    gl_Position = MODELVIEW_MATRIX*vec4(VERTEX, 1.0);
}

void fragment()
{
    // Logarithmic depth buffer.
    DEPTH = log2(max(1e-6, 1.0 -gl_Position.z)) * c;

    // Remaining frag code below ...
}

Depth draw can be anything, depth testing must be enabled.

It works, but it breaks shaders that rely on reading depth and decoding it into a linear value (see the .glsl files in the source code that reference depth). The following shaders should be inspected for misbehavior: subsurf_scattering, ssao_minify, ssao_blur, ssr, effects_blur, cube_to_dp. It also breaks distance fade in the spatial material, so that algorithm should be reviewed as well.

Decoding

A possible decoding formula for the above-mentioned shaders could be:

linear_depth = pow(2, (depth_log + 1.0) / c) - 1.0;

instead of:

linear_depth = 2.0 * z_near * z_far / (z_far + z_near - depth * (z_far - z_near));

But I didn't test it and it has to be checked (the signs and 1.0s in the formula might fit DirectX depth instead of OpenGL, or something like that).
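As a sanity check for the decoding question, here is a minimal sketch in Python (not gdshader, so it can be run outside the engine) that mirrors the fragment shader line `DEPTH = log2(max(1e-6, 1.0 - gl_Position.z)) * c` and inverts it algebraically. It assumes the stored value is read back unchanged in the 0 to 1 range; under that assumption the inverse has no `+ 1.0` term, so the untested formula above would correspond to a different depth range convention. The function names are made up for illustration.

```python
import math

C = 0.001  # same constant as `c` in the shader above


def encode_log_depth(view_z: float, c: float = C) -> float:
    """Mirror of the shader: DEPTH = log2(max(1e-6, 1.0 - view_z)) * c.

    view_z is the view-space z coordinate (negative in front of the camera).
    """
    return math.log2(max(1e-6, 1.0 - view_z)) * c


def decode_log_depth(depth: float, c: float = C) -> float:
    """Algebraic inverse: positive distance in front of the camera."""
    return 2.0 ** (depth / c) - 1.0


for distance in (0.01, 1.0, 1000.0, 5.0e8):
    depth = encode_log_depth(-distance)
    print(f"distance={distance:g}  DEPTH={depth:.6f}  decoded={decode_log_depth(depth):g}")
```

Whether the engine's effect shaders see the depth in this exact form is not verified here, so treat the inverse as a starting point for testing rather than a drop-in replacement.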

A possible Godot 3.x, 4.x implementation

As I see it, each shader and source code file should be tweaked with condition (an IF statement for some setting flag) that switches to either normal or logarithmic depth buffer in every instance that requires it. It requires gl_FragDepth to be exposed and written, so it can be implemented for GLES3 driver quite... trivially (as long as one is careful to catch all instances and knows how to implement switches properly that can be linked to a project options entry). Not sure about Vulkan, but if it makes use of .glsl files then it should be doable there too?

roalyr commented 2 years ago

It may also be reasonable to allow a greater camera_far maximum value (say, 10M? 100M?) in such a case, depending on the size of the depth texture. In some of the links above it was mentioned that 24 bits are enough for a planetary-scale logarithmic buffer, and 32 bits for cosmic scale (I think).

Alas it is hard for me to judge how many things depend on camera_far and whether that will break anything else.

Why? Well, that way you can start making real space simulators like Orbiter, with everything truly up to scale.

It is but a fanciful idea, but maybe it could be a subject of a test, by, say, making a sub-option when enabling logarithmic depth globally, to adjust maximum camera_far value cap.

Zylann commented 2 years ago

a double-viewport approach is not viable.

I have a similar project with large scales and interactive planets, and this is something I tried. It actually worked without too much effort, and interaction was not a problem. One transparent viewport renders close stuff without background, while the other renders the rest and the background. Both were then composed into one. This did not require duplicating the whole world; I only had to assign the same world to both viewports. Only one viewport actually had the nodes and stuff in it (similar to how split-screen is achieved). The issues I encountered with this approach were mainly about depth-based post-processing (atmospheres): some information (depth buffers) gets lost in the final compositor, so each viewport needed to apply the effects, doubling their cost. There were also a few issues with the "frontier" between the two clips, where a seam would sometimes become visible with MSAA, fog or transparent stuff. I also did not measure performance, but I assume it was lower. I turned off this technique for now.

Either way, I'm still interested in having an easy way to render far away stuff because I still have Z-precision issues when rendering this stuff.

roalyr commented 2 years ago

Unfortunately, double-viewport is not very viable on low-performance hardware.

roalyr commented 2 years ago

I have re-compiled Godot with a 100M units far plane limit, and all seems to work fine. Yes, one may want to implement some script-based culling, but it is working well.

Alas, light processor must be adjusted for extreme Far-Near ranges due to https://github.com/godotengine/godot/issues/55070#issuecomment-976630823 (see workaround demo)

roalyr commented 2 years ago

In the shader code here, I have changed this:

const float c = 0.001;

This way it will work with any reasonable camera far value (tested with 500 M units).
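To get a feel for how much of the depth range this constant uses, here is a small Python sketch of the same formula. It only evaluates the arithmetic; the 24-bit normalized depth buffer in the second half is an assumption, as is the helper name `log_depth`.

```python
import math

C = 0.001  # shader constant `c`


def log_depth(distance: float, c: float = C) -> float:
    """DEPTH written by the shader for a point `distance` units in front of the camera."""
    return math.log2(max(1e-6, 1.0 + distance)) * c


# DEPTH must stay below 1.0 to avoid being clipped at the far end of the range.
for distance in (1.0, 1.0e3, 5.0e8, 1.0e15):
    print(f"{distance:8.1e} units -> DEPTH = {log_depth(distance):.6f}")

# Assumption: a 24-bit normalized depth buffer, where one step is 1 / (2^24 - 1).
# One buffer step then corresponds to a fixed *relative* change in distance.
STEP_24 = 1.0 / (2**24 - 1)
relative_step = 2.0 ** (STEP_24 / C) - 1.0
print(f"one 24-bit step = {relative_step:.3e} of the distance")
```

With c = 0.001 even 1e15 units maps to a DEPTH of roughly 0.05, so the far end of the range is never reached. A larger c would spend more of the buffer on the distances actually in use, at the cost of a nearer distance limit.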

reduz commented 2 years ago

Some notes about this:

Zylann commented 2 years ago

Even if you did this, if you have objects very far away they will run into floating point precision errors as-is. We would need to add better support for double precision in the vertex shader all around.

I thought if objects were very far away they would actually not be bothered by imprecision with small values because they are far? (only Z-fighting of close surfaces on the far object is a problem for me, but it can be solved easily with a LOD merging these faces into one)

roalyr commented 2 years ago

Vertex depth suffers from a lack of proper interpolation, which causes artifacts on large faces. So far, the fragment shader implementation seems to be the most reliable. Tested at least within 0.01 - 1e15 units (roughly, couldn't test any more because shit happened).

As far as I know there's no reliable way to do proper fast vertex depth, and sources pointed out that a possible solution might involve adaptive tessellation, which, in fact, may not be all that good.

Single precision fragment shader depth does its job quite well. I couldn't test and see where double precision might become necessary as of yet.

roalyr commented 2 years ago

Here is my latest implementation (3.x): https://github.com/roalyr/godot-for-3d-open-worlds

roalyr commented 2 years ago

Will drop it here in case anyone can assist: https://github.com/roalyr/godot-for-3d-open-worlds/issues/6

KeyboardDanni commented 1 year ago

Why not simply reverse the Z? It should avoid the drawbacks associated with logarithmic Z while still improving Z-fighting considerably: https://developer.nvidia.com/content/depth-precision-visualized

I removed the log2 from the fragment code above and ended up with the following:

DEPTH = (max(1e-6, 1.0 -gl_Position.z)) * c;

This seems to work well for what I need. I'm not making a space game, but I still see Z-fighting from less than 300m away without this method. As much as I want to use this, anything with the default shader will appear in the back, and I can't find a way to change said default for when an object doesn't have a material applied.

Calinou commented 1 year ago

I suppose reverse Z may be problematic for games that have models close to the viewer (such as first-person weapon models not rendered in a separate viewport).

KeyboardDanni commented 1 year ago

I'd think that for most cases, there's already more than enough Z buffer precision for objects close to the camera with regular Z that reversing Z wouldn't introduce significant issues.

It should probably be an option though, especially since it would likely break existing custom shaders.

KeyboardDanni commented 1 year ago

So I decided to try this out and it looks like there is indeed some precision loss for close objects. Here is a surface that is placed 0.001m above the other, and rotated 0.1 degrees. Curious to see if increasing depth buffer precision can counteract this.

With normal Z: NormalZClosePrecision

With reversed Z: ReverseZClosePrecision

KeyboardDanni commented 1 year ago

Another resource on reversed Z: https://thxforthefish.com/posts/reverse_z/

A couple precision-related questions I have:

roalyr commented 1 year ago

I have worked around the FPS drop in case of Log depth by making some shaders use vertex lighting. I've looked into reversed Z, but the implementation seems rather tricky, and I am not sure it will be enough for my needs.

Maybe a better way to implement and optimize it all exists for Vulkan, since it has a depth range of 0.0 - 1.0, and maybe with reversed-z and more precision it could be a good compromise solution.

But Log depth is still a very valid option, despite its drawbacks. If possible, it would be very handy to have depth texture encoding variants as project rendering option, but that would require having proper decoders for all the FX shaders and shadow casting.

Zylann commented 1 year ago

Is it alternatively possible to have a 32-bit depth buffer, instead of 24-bit+stencil packed into the same buffer? My game doesn't need billions of kilometers, but is still large enough that distant objects are flickering a lot, so a bit more precision (not too much) would be welcome. I've noticed 32-bit depth was an option in the enums of the renderer, but I haven't figured out how it can be enabled. I made a bold attempt which resulted in a black screen (likely I wasn't doing it properly).

Calinou commented 1 year ago

Is it alternatively possible to have 32-bit depth buffer, instead of 24-bit+stencil packed into the same buffer?

Last time I checked, 32-bit depth buffer is poorly supported on some GPUs. I'd check integrated graphics and mobile especially, which tend to default to a 16-bit depth buffer unless forced to use a 24-bit depth buffer.

Zylann commented 1 year ago

Well, I would not make such a game for mobile or low-end devices anyway (large scale isn't the only reason), so there is that.

KeyboardDanni commented 1 year ago

I think serious consideration should be made toward having an option to prefer a 32-bit depth buffer, at the very least.

Regarding hardware support:

In fact, I can't think of a reason why we shouldn't make the depth buffer 32-bit by default on desktop. If support goes back as far as Direct3D 9, one has to wonder why anyone would stick with 24-bit, given that there are depth precision issues only about 200m away, which is smaller than a lot of games even from the 6th gen era of consoles.

This problem is the one remaining wart (aside from shader compilation) that affects general 3D usage in Godot, and it is so easy to fix for most use cases. I don't see why we don't just use 32-bit.

KeyboardDanni commented 1 year ago

Also, here is DXVK falling back to 32-bit depth buffer over 24-bit because 24-bit is less supported on some AMD configurations: https://github.com/doitsujin/dxvk/blob/master/src/dxgi/dxgi_format.cpp#L852

Zireael07 commented 1 year ago

90% hardware support

Is this true for Android too? Or is this just laptop/desktop PCs?

KeyboardDanni commented 1 year ago

It's the overall total for all listed platforms (Windows, macOS, Linux, Android, iOS). For D32_SFLOAT (i.e. 32-bit with no stencil), Windows, macOS, Linux, and iOS have 99-100% support. Android is a lot less clear-cut at 82%. Those numbers in more detail:

Windows:

macOS:

Linux:

iOS:

Android:

From this data, it looks like 32-bit depth buffer should be default on desktop, whereas mobile might be better off defaulting to 24-bit with 32-bit fallback.

Ansraer commented 1 year ago

Hey @cosmicchipsocket, did you ever get around to trying reversed z with a 32-bit depth buffer? I personally would really prefer rev. z over a log depth buffer. IMO it is an elegant solution that nicely distributes the available precision between the near and far plane while still being easily understandable for inexperienced shader authors.

clayjohn commented 1 year ago

Defaulting to 32 bit depth on the clustered renderer makes sense to me. I don't know what the original rationale was to use only 24 bit depth.

Currently it is configured to fall back to 32 bit depth in systems that don't support 24 bit.

Calinou commented 1 year ago

I don't know what the original rationale was to use only 24 bit depth.

Doesn't using a 32-bit depth buffer lock you out of stencil bits entirely? IIRC, the typical configuration in game engines is to use a 24-bit depth buffer so you can have an 8-bit stencil buffer.

We don't use stencil yet, but it may be exposed to custom shaders in the future. That said, it seems reasonable to me that if you want to use stencil, you'll have to adjust a project setting to change the balance between depth and stencil bits. However, this will make setting up those custom shaders slightly more complex (for users going through the asset library and downloading shaders that make use of stencil).

clayjohn commented 1 year ago

@Calinou On desktop support for D32_SFLOAT_S8_UINT is pretty much as good as D32_SFLOAT as shown in the comment above

KeyboardDanni commented 1 year ago

I imagine that D24_UNORM_S8_UINT is so that you can fit both the depth and stencil into 32 bits. But according to this forum thread, AMD hardware post-GCN puts depth and stencil on separate planes, so I'd assume there would be no performance advantage to using 24-bit, even if it were supported: https://www.gamedev.net/forums/topic/691579-24bit-depthbuffer-is-a-sub-optimal-format/

Unsure about nVidia hardware regarding 24-bit/32-bit performance, but overall D32_SFLOAT_S8_UINT seems to be a pretty safe default for desktop, in all respects. Not to mention, I don't think the reverse Z trick really does anything at 24-bit because it's an int format, whereas 32-bit is float.

We don't use stencil yet, but it may be exposed to custom shaders in the future.

According to this setup code, stencil is disabled when MSAA is enabled, presumably because the two are incompatible? Might be worth documenting for those who want to use stencil.

Calinou commented 1 year ago

Looking at where this needs to be changed:

Note: AMD on Vulkan doesn't support 24-bit depth buffer (it always uses 32-bit instead). Make sure to test this on NVIDIA or Intel to see any difference.

MoritzMaxeiner commented 1 year ago

For anyone interested, I've hacked together a prototype for godot 4.x's forward clustered renderer: https://github.com/MoritzMaxeiner/godot/commit/c10940f499ddbb45f18f5a73660a0d4cddc84ea1

Many thanks to roalyr and gkjohnson for their respective previous work.

For anyone who wants to try it out, I've so far tested only with

scons platform=linuxbsd use_llvm=yes linker=lld dev_build=no target=editor precision=double

, which yields the following scenery (the sphere is roughly the size of our moon):

prototype_logzbuf

As I'm a complete newbie with Godot, there's probably a lot I missed, but it seems to work ok for my personal use case.

Ansraer commented 1 year ago

@MoritzMaxeiner That looks great, but I think you could do even better. While log z kind of solves the problem it is not a perfect solution. To avoid artifacts you will have to manually write the depth, which automatically disables many optimizations such as fast z or early depth test. Trust me when I say that you really don't want that.

99% of the time what you want instead is to use reverse z. All you have to do is flip z in the projection matrix, change the depth clear value and then invert all the depth comparison functions. The inverted depth gives us higher precision the closer we get to the far plane, while float gives us higher precision close to the origin. Combined, this results in a fairly even spread of precision from the near plane to the far plane. And since there is no need to write the depth value manually, we can still use depth-based optimizations. I am fairly certain that some of the links posted above have nice graphs and better explanations than what I just wrote on my phone.
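The precision argument can be checked numerically. The Python sketch below counts how many closely spaced distant surfaces keep a unique stored depth value under the conventional and the reversed mapping, for a 32-bit float buffer and for a 24-bit normalized integer buffer (the latter relates to the remark earlier in the thread that reversing probably gains nothing at 24-bit). The near/far values and surface spacing are hypothetical, and the depth is computed in double precision and only rounded on storage, which is a simplification of what a GPU does.

```python
import struct

NEAR, FAR = 0.05, 4000.0  # hypothetical camera planes, in world units


def standard_depth(z):
    """Conventional [0, 1] depth: 0.0 at the near plane, 1.0 at the far plane."""
    return FAR * (z - NEAR) / (z * (FAR - NEAR))


def reversed_depth(z):
    """Reversed [0, 1] depth: 1.0 at the near plane, 0.0 at the far plane."""
    return NEAR * (FAR - z) / (z * (FAR - NEAR))


def as_float32(value):
    """Round to the nearest 32-bit float, as a D32_SFLOAT buffer would store it."""
    return struct.unpack("f", struct.pack("f", value))[0]


def as_unorm24(value):
    """Quantize to a 24-bit normalized integer, as D24_UNORM would store it."""
    return round(value * (2**24 - 1))


# 1001 surfaces spaced 0.001 units apart, about 1000 units from the camera.
DISTANCES = [1000.0 + i * 0.001 for i in range(1001)]


def distinct_values(depth_fn, store_fn):
    """Count how many of the surfaces still get a unique stored depth."""
    return len({store_fn(depth_fn(z)) for z in DISTANCES})


results = {
    (depth_fn.__name__, store_fn.__name__): distinct_values(depth_fn, store_fn)
    for depth_fn in (standard_depth, reversed_depth)
    for store_fn in (as_float32, as_unorm24)
}

for key, count in results.items():
    print(key, count)
```

With these numbers, only the reversed mapping stored as float32 keeps all 1001 surfaces distinct; the other three combinations collapse them into at most two values, which is what shows up as z-fighting.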

Would be great to know if this solution could solve your problem as well (I can't see why not), I would really like to avoid adding log z to godot if at all possible.

EDIT: in opengl you will also have to set clip manually (since it defaults to [-1,1] iirc) using an extension for this to work. But tbh I am not sure if reverse depth is even needed in the compat renderer. 🤷‍♂️

KeyboardDanni commented 1 year ago

@MoritzMaxeiner I don't see any overlapping geometry in this shot, so it's hard to tell what effect this has on Z-buffer precision.

MoritzMaxeiner commented 1 year ago

@Ansraer First, thank you for your feedback, someone could almost certainly improve upon this :) I just wanted a quick but sufficiently accurate way to interactively simulate what the sky would look like to a human observer on the ground of different celestial objects in a custom solar system, including celestial movements. That means I need to reasonably accurately render objects from about 0.1m (lying on the ground) to a couple of billion km, at least. I saw there's a working log zbuf implementation for Godot 3.x, but I also happened to need double floating point precision, so I essentially just did the "mechanical" porting.

With regards to the performance penalty: It would be nice to avoid as a matter of principle, but pragmatically speaking, in my personal use case, I don't care about it. I appreciate your explanation of how I would go about making a reverse z work, but that's all way too much work for my personal use case, especially since I would have to start touching even more of godot's internals for that. To be clear, I'm not advocating this implementation should be used as is in godot, it's just for reference and in case anyone might find it useful.

On the matter of what I personally would like to see in godot: It would be nice to have it as a project option, so you can choose between "classic / default z", "reverse z", and "logarithmic z", ideally the switch would be static without runtime overhead. But since log z does all I need and the downsides don't concern me for my use case (at least for now), I'm quite happy.

@cosmicchipsocket Well, if you want a direct comparison, here it is. Up is the default, bottom is log. But you can just build it and try it out yourself, it's quite simple:

zbuf_linear zbuf_log

roalyr commented 1 year ago

Just in case, here is my 3.x fork with log depth and some editor tweaks: https://github.com/roalyr/godot-for-3d-open-worlds

roalyr commented 1 year ago

Also, in case we see optional kinds of depth buffers in the future, make sure that the editor is properly tweaked to accommodate the allowed ranges (like here, for instance: https://github.com/roalyr/godot-for-3d-open-worlds/blob/3.x/editor/plugins/spatial_editor_plugin.cpp).

It is also worth exploring and setting up proper z-far limits for each case.

Calinou commented 1 year ago

I made a 4.0 test project for people interested in working on improving depth buffer precision: test_zbuffer_precision.zip

roalyr commented 1 year ago

I'm in!

violetfrost commented 1 year ago

For anyone interested, I've hacked together a prototype for godot 4.x's forward clustered renderer: MoritzMaxeiner/godot@c10940f

Thank you, @MoritzMaxeiner! I modified your code a bit to make it work with Godot 4.0.2 - you basically saved my entire project. The little prototype might see the light of day at some point thanks to you!

nickpolet commented 1 year ago

@violetfrost Any chance of putting the changes out there on a branch? Was about to embark on this and was wondering if seeing what changes you made would make this an easier journey.

violetfrost commented 1 year ago

Hey @nickpolet! (and everyone else watching)

My apologies for the severe delay in posting this - I've been rather busy this month and haven't had much time to work on hobby projects.

That said, I (very poorly) ported @MoritzMaxeiner's wonderful work to Godot 4.1 - I'm something of a novice when it comes to GLSL, so I almost definitely butchered the part about layout locations, but it boots, so that's what's important.

Hope this is helpful to anyone who's out there browsing! In the not-so-distant future I also want to convert the default Godot shaders to make use of the LogDepth so we can have fancy effects like SSAO back.

https://github.com/violetfrost/godot/tree/4.1

forestrf commented 11 months ago

On Intel 2nd generation (and on other CPUs, I don't know which ones specifically) plus on some mobile devices and consoles, D32_SFLOAT_S8_UINT disables Z-culling while D24_UNORM_S8_UINT doesn't.

Calinou commented 11 months ago

On my end, I tried to replace D24_UNORM_S8_UINT with D32_SFLOAT_S8_UINT on Linux + NVIDIA GeForce RTX 4090, but didn't notice any visual improvement in a scene with 2 PlaneMeshes crossing each other (with a slight angular offset on one of them, so that it doesn't Z-fight across the entire plane).

On Intel 2nd generation (and on other CPUs, I don't know which ones specifically) plus on some mobile devices and consoles, D32_SFLOAT_S8_UINT disables Z-culling while D24_UNORM_S8_UINT doesn't.

Godot 4.x doesn't support Intel Sandy Bridge IGPs or very old mobile devices, as these lack OpenGL 3.3/OpenGL ES 3.0 support. Godot 4.x requires OpenGL 3.3, OpenGL ES 3.0 or Vulkan 1.0 to run.

On the console side, we don't expect Godot to be ported to systems older than PS4/Xbox One/Switch (for officially licensed ports, that is).

forestrf commented 11 months ago

Godot 4.x doesn't support Intel Sandy Bridge IGPs or very old mobile devices, as these lack OpenGL 3.3/OpenGL ES 3.0 support. Godot 4.x requires OpenGL 3.3, OpenGL ES 3.0 or Vulkan 1.0 to run.

On the console side, we don't expect Godot to be ported to systems older than PS4/Xbox One/Switch (for officially licensed ports, that is).

Intel second gen is supported by Godot, maybe unintentionally, when using Linux, as the Mesa drivers support OpenGL 3.3 on that chip. I tested it myself. Nintendo Switch is an example of a console which needs depth to be 24 bits or lower when using stencil to keep z-culling enabled.

So at least the Switch needs that depth+stencil format. We need to know what other supported gpus also need it. I've spent an hour trying to find at least a list of gpus or something that shows when 24 bits or less are needed for the hierarchical z buffer metadata to work and not disable z-culling but found nothing.

roalyr commented 10 months ago

Did anyone here try a reverse depth buffer yet? Is it applicable for Vulkan backend?

roalyr commented 10 months ago

For anyone interested, I've hacked together a prototype for godot 4.x's forward clustered renderer: https://github.com/MoritzMaxeiner/godot/commit/c10940f499ddbb45f18f5a73660a0d4cddc84ea1

Many thanks to roalyr and gkjohnson for their respective previous work.

For anyone who wants to try it out, I've so far tested only with

scons platform=linuxbsd use_llvm=yes linker=lld dev_build=no target=editor precision=double

, which yields the following scenery (the sphere is roughly the size of our moon):

prototype_logzbuf

As I'm a complete newbie with Godot, there's probably a lot I missed, but it seems to work ok for my personal use case.

I am considering making a 4.x flavor of my Godot fork that uses log depth, and compiling Linux/Windows/Android binaries and templates for it for ease of use. I will reference your changes, and see if there is anything else that should be tweaked (like in 3.x, where I had to tweak the far-z plane definition in order to prevent a flickering glitch).

One thing I was wondering, though, is whether you have tested anything related to a reverse linear depth buffer or not.

roalyr commented 10 months ago

Another issue is to intercept and properly implement all the depth-related stuff like shadows. If anyone can point out where shadows make use of depth buffer and decode it linearly, so that it could be changed to logarithmic transformation instead - please let me know (also applies to other depth-related things). I am not really competent to say that it will work as intended (or at all), but that's what I see as the main issue.

Khasehemwy commented 8 months ago

Great question! In addition, is reversed-z supported on forward+ now?

roalyr commented 8 months ago

AFAIK no one implemented anything in source code yet.

But you can implement Log Depth in .gdshader right in the project, so, probably, testing reverse normal buffer can be done like that too?

clayjohn commented 8 months ago

We should implement reverse-z as @Ansraer describes above, this is pretty easy to do in the RD renderers, but the OpenGL renderer won't be able to benefit. For the RD renderer, it shouldn't be too difficult to implement, but it will require changes to a few systems:

  1. Should switch to 32 bit precision by default (if we haven't already)
  2. Need to reverse the projection matrix (using the "correction matrix" which already converts to 0-1 range)
  3. Need to clear depth to 0 instead of 1
  4. Need to flip the behaviour of the depth tests (LESS becomes GREATER and vice versa)
  5. Need to adjust the shader compiler to handle reverse z without breaking compatibility. To do so, we need to add compatibility functions when a user reads from the depth texture, or when the user writes to DEPTH manually.

For the compatibility backend, we can just leave it as non-reversed-z because the extension to benefit from reversed-z isn't well supported and since what we expose to users is non-reversed-z there is no point in reversing it for consistency.
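For illustration, steps 2 to 5 above can be mimicked with a one-pixel software depth test. This is a Python sketch under stated assumptions, not renderer code: every function name is hypothetical, the reversed mapping stands in for the flipped projection matrix, and `user_visible_depth` stands in for the compatibility function the shader compiler would insert.

```python
import operator


def standard_depth(z, near, far):
    """Conventional [0, 1] depth: 0.0 at near, 1.0 at far."""
    return far * (z - near) / (z * (far - near))


def reversed_depth(z, near, far):
    """Step 2: reversed projection, 1.0 at near, 0.0 at far."""
    return near * (far - z) / (z * (far - near))


def render_pixel(fragment_distances, near, far, reverse_z):
    """Run a depth test over the fragments covering one pixel.

    Returns (index of the visible fragment, stored depth value).
    """
    if reverse_z:
        stored = 0.0            # step 3: clear depth to 0 instead of 1
        passes = operator.gt    # step 4: LESS becomes GREATER
        to_depth = reversed_depth
    else:
        stored = 1.0
        passes = operator.lt
        to_depth = standard_depth
    visible = None
    for index, z in enumerate(fragment_distances):
        depth = to_depth(z, near, far)
        if passes(depth, stored):
            stored, visible = depth, index
    return visible, stored


def user_visible_depth(stored, reverse_z):
    """Step 5: user shaders keep seeing the non-reversed convention."""
    return 1.0 - stored if reverse_z else stored


fragments = [50.0, 2.0, 10.0]  # distances of three overlapping fragments
print(render_pixel(fragments, 0.05, 4000.0, reverse_z=False))
print(render_pixel(fragments, 0.05, 4000.0, reverse_z=True))
```

Both conventions pick the same nearest fragment. Only the stored value differs, which is why the compatibility conversion is needed wherever a user shader reads the depth texture or writes DEPTH.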