
Implement on-demand shadow maps in the Vulkan renderer #3073

Closed mrjustaguy closed 2 years ago

mrjustaguy commented 3 years ago

Describe the project you are working on

Light Rendering

Describe the problem or limitation you are having in your project

Shadow maps are hugely inefficient (they render several screens' worth of pixels, most of which don't actually contribute to the final image - for an explanation, see (1)) and suffer from many issues. Here's a short list of some of them:

1) Shadow acne
2) Peter-panning
3) Applicable only over limited distances (directional shadows have a max distance; other lights start getting shadow acne when the casting light is far from the shadow)
4) Pixelization (not a high enough resolution means not enough data)
5) Performance (with higher shadow map resolutions)

These issues are why shadows are a royal pain to deal with (directional especially; spot and omni lights aren't nearly as hard), as methods that mend one issue amplify the others.

Here are a few examples:

- Increasing shadow bias to reduce shadow acne leads to peter-panning
- Decreasing shadow bias to reduce peter-panning leads to shadow acne
- Increasing shadow draw distance (directional lights) reduces perceived shadow resolution
- Increasing shadow resolution to improve perceived shadow resolution worsens performance
- Decreasing shadow resolution to improve performance reduces perceived shadow resolution and increases shadow acne

This makes shadows a constant balancing act, which is by no means easy to manage.

Rendering shadow maps can be made smarter, removing the need to render more pixels than are on screen, as described below.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Render shadow maps on demand (needs GPU compute), meaning the following happens:

1) Render the player camera's depth map
2) Transform the depth map coordinates into world coordinates (requires GPU compute)
3) Render shadow map pixels for all the resulting world coordinates (probably requires GPU compute too)
4) Apply shadows to all the pixels
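A minimal sketch of step 2, in NumPy rather than an actual compute shader; the conventions assumed here (Vulkan-style [0, 1] depth, column vectors, an `inv_view_proj` matrix precomputed as the inverse of projection times view) are illustrative, not Godot's actual internals:

```python
import numpy as np

def depth_to_world(px, py, depth, width, height, inv_view_proj):
    """Unproject one pixel and its sampled depth to a world-space position."""
    # Pixel centre -> normalized device coordinates: x/y in [-1, 1],
    # z is the raw [0, 1] depth sample (Vulkan convention assumed).
    ndc = np.array([
        (px + 0.5) / width * 2.0 - 1.0,
        (py + 0.5) / height * 2.0 - 1.0,
        depth,
        1.0,
    ])
    world = inv_view_proj @ ndc   # undo projection @ view
    return world[:3] / world[3]   # perspective divide
```

The result is only as accurate as the depth sample itself, which is the approximation issue noted under possible issues below.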

Result: shadows as far as the eye can see, no shadow acne, no peter-panning, and a minimal GPU pixel fill-rate hit.

Possible issues: transforming the depth buffer to world coordinates doesn't give exact world coordinates, only an approximation. That's fine close up, where Z-buffer precision is dense, but it might lead to flickering shadows far away - though they'd probably be flickering so far from the camera that they wouldn't actually be observable.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

The steps are explained above; here I'll give detail on a few things to keep in mind when implementing:

1) The depth buffer to world coordinates translation only gives a rough world position, as explained under possible issues
2) Omni lights are not as simple as spot and directional lights, and would (probably) be significantly harder to implement this way
3) Soft shadows would be created by sampling shadows around the relevant world coordinate, from the perspective of the light. Say the numpad 5 is the relevant world coordinate: sampling would be done around 5 (at 2, 4, 6 and 8 for 4 extra samples), the softness of the shadow would be determined by how far from the center coordinate the sample coordinates are taken, and each sample would weigh its result (in shadow or not) into the final shadow value. A sketch of this sampling pattern follows the list.
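A minimal sketch of that numpad sampling scheme, assuming equal sample weights; the `in_shadow` callback is a hypothetical stand-in for whatever occlusion query the renderer would actually provide:

```python
import numpy as np

def soft_shadow(world_pos, right, up, radius, in_shadow):
    """Average occlusion over the centre sample plus 4 neighbours.

    right/up are world-space axes spanning the light's shadow plane;
    a larger radius gives a softer shadow. Returns 0.0 (fully lit)
    to 1.0 (fully shadowed).
    """
    # Numpad key 5 is the centre; 2, 4, 6, 8 are the 4 extra samples.
    offsets = [(0, 0), (0, -1), (-1, 0), (1, 0), (0, 1)]
    hits = 0
    for dx, dy in offsets:
        sample = world_pos + radius * (dx * right + dy * up)
        hits += bool(in_shadow(sample))  # each sample weighs into the result
    return hits / len(offsets)

# Example: a point just above a hypothetical occluder boundary.
shade = soft_shadow(np.array([0.0, 1.0, 0.0]),
                    np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0]),
                    0.1, lambda p: p[0] > 0.05)
```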

(1) On a 1920x1080 display, a soft shadow would cost about 10 million shadow pixels, versus about 2 million for hard shadows. This still compares favorably with the roughly 16.8 million shadow pixels of a 4096x4096 directional shadow map and, all things considered, is vastly more efficient than the traditional method, with better results.
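Worked out explicitly (the soft case assumes the 4 extra samples from point 3 above, so 5 samples per pixel):

```python
screen = 1920 * 1080    # 2,073,600  -> ~2M shadow pixels for hard shadows
soft = screen * 5       # 10,368,000 -> ~10M shadow pixels with 4 extra samples
cascade = 4096 * 4096   # 16,777,216 -> ~16.8M shadow pixels for the 4096 map
```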

If this enhancement will not be used often, can it be worked around with a few lines of script?

One could implement this with the new RenderingDevice API, but it wouldn't be very efficient.

Is there a reason why this should be core and not an add-on in the asset library?

Rendering is core, and shadows are an extremely common feature of 3D games.

Calinou commented 3 years ago

I believe baked lightmaps are still the best solution to this problem if you want to avoid shadow acne, peter-panning and limited shadow distances. AAA games still use lightmaps today :slightly_smiling_face:

For dynamic object shadows, see https://github.com/godotengine/godot-proposals/issues/2354 as usual.

As for the proposal made here, I think temporal stability is key, and I'm not sure this proposal makes it achievable. We don't want shadows that oscillate when the camera moves or rotates, as those look really bad in most situations (worse than blurry shadows, I'd say).

mrjustaguy commented 3 years ago

Temporal stability is important; it has been considered, and this method very likely wouldn't have major issues with temporal instability, especially on close-up objects, which take up most of the screen.

The only thing that can have an impact on its temporal stability is the accuracy and consistency of the depth map to world coordinates transformation. However, this is only a concern for pixels far away from the camera, because the depth map is significantly more accurate close up and would thus translate far better to the actual world coordinates of a pixel - which I stated under possible problems.

This method would be superior to baked lightmaps in several ways:

1) No need to bake
2) Baked lightmaps take a fair bit of storage and bandwidth (it isn't horrible, but eh)
3) Shadows just work, anywhere and everywhere
4) Can be used in procedurally generated levels easily
5) Can be used on large dynamic objects

Now, I know that 1, 4 and 5 are basically all down to baking, but even if one could bake at runtime like SDFGI does for GI, 5 would still be an issue: it (probably) wouldn't allow such an object to have good shadows.

logzero commented 3 years ago

Is there an example implementation of this proposal?

I am especially curious about point 3; it sounds like ray tracing to me.

Are ray-traced shadows maybe what you're actually asking for?

mrjustaguy commented 3 years ago

The result is similar to ray tracing (in that it would provide shadows for every pixel for every light, without prominent dumb shadow mapping issues like shadow acne and peter-panning), but it would use the same method current shadow maps use to render. There is one KEY difference: instead of just drawing a ton of pixels (generating occlusion data) for a general area, the renderer would know which places need the occlusion data, and would thus save on pixel drawing, with every pixel on screen ending up with shadow data for every light that affects it.

Think of it like this:

Current method - dumb - just generates occlusion data. Because it doesn't know what the screen sees, it has to generate a ton of occlusion data, most of which is not actually needed on screen, while failing to create occlusion data with enough granularity for other parts of the screen. If you think of it in terms of the 3D world versus the shadow map, you need higher resolution shadow maps to generate shadow data for each unit of 3D space, to reduce the information gap between two given points in that space. See https://learnopengl.com/Advanced-Lighting/Shadows/Shadow-Mapping under shadow acne: the 2D diagram of what's happening explains it quite well; the staircase is a result of the lack of granularity. With the proposed method, the granularity of the output is such that the squares are so tiny they can be considered a straight line, as opposed to the staircase the traditional method provides.

Proposed method - smart - generates occlusion data ONLY in places that are visible on screen, and all of it is used to produce the end result (the image on screen). The occlusion data would be rasterized much like with the current method, but the accuracy of where in the world that data applies is GREATLY increased. What this method does is provide every 3D point with shadow data exactly when that point is rendered to screen - so only a screen's resolution worth of 3D points needs shadow information, and only that many pixels are drawn (plus, for soft shadows, that many again per extra sample), with the coverage of shadow data extending as far as the screen cares, effectively to infinity (so, no gaps in the data).

Ray tracing - smart & expensive - generates occlusion data only in places that are visible on screen, but instead of rasterizing it the way normal shadow maps do, it shoots rays into the world and generates occlusion data by checking whether each ray hits an occluder (which is expensive to compute, unlike the comparatively cheap traditional raster drawing that shadow maps use).

The rendering of the shadow maps would only differ in one way: instead of rendering them as a "screen" (continuous lines of pixels), it would selectively render pixels where needed. Even that is a gross oversimplification, as this method would also render a screen's worth of pixels; what changes is which position in the world each pixel in the shadow map corresponds to.

This wouldn't need RTX hardware, just compute to transform depth coordinates to world coordinates, and possibly compute for the selective part of the smart shadow map rendering (it would probably need changes to the inputs the renderer uses, specifically adding the "where to render" part).

TLDR - the proposed method is just about cleverly deciding where in the world to rasterize the shadow maps we currently use, with no RTX, to get a stupidly high amount of useful shadow data out of the limited amount that can be rasterized - compared to the current method's high volume of data that is, at the end of the day, mostly useless - and getting those improvements without brute force (i.e. without rasterizing some stupidly large number by stupidly large number grid of pixels, only to make no actual use of most of them in the image).

mrjustaguy commented 3 years ago

Here are the computation costs that have been considered with this method:

1) Camera depth map - very cheap, done by default anyhow
2) Translating the depth map to world coordinates - relatively cheap, but not done currently, so an added cost over the current method
3) Working out which pixels the shadow map needs to rasterize - unclear, probably in the cheap to basically free range (as this depends on how one would tell what needs to be computed)
4) Rasterizing the shadow map - cheaper than current methods for directional lights (1920x1080 vs 2048x2048 or higher). However, soft shadows multiply the cost by the number of samples, and omni and spot lights get more expensive the more of the screen they cover, since each would generate as many pixels as it affects, times the number of samples for soft shadows. That could trade blows with current omni and spot lights: say 10 lights, all with 5 samples, all covering a full 1920x1080 screen - that's about 103.7M shadow pixels versus roughly 268.4M for a traditional 16k x 16k total shadow map area, so bumping the screen resolution up to 4K would put it at about half the performance of the traditional shadow maps. Keep in mind that something like half-resolution shadow mapping could be done, as with SSAO, though this could cause visible artifacts. The sketch below works these figures out.
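The worst-case comparison in point 4, worked out (note 16384 x 16384 = 268,435,456, i.e. ~268.4M):

```python
screen_1080p = 1920 * 1080          # 2,073,600 pixels
on_demand = 10 * 5 * screen_1080p   # 103,680,000 -> ~103.7M shadow pixels
atlas_16k = 16384 * 16384           # 268,435,456 -> ~268.4M shadow pixels

screen_4k = 3840 * 2160             # 8,294,400 pixels
on_demand_4k = 10 * 5 * screen_4k   # 414,720,000 -> ~1.5x the 16k atlas
```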

Conclusion - the performance of current techniques is constant (you're splitting the shadow map among all lights and then drawing to the map), while this method would fluctuate depending on the number of lights affecting the screen (weighted by how much of the screen each one affects). This doesn't take into account possible optimizations to the new method, such as only allowing a limited number of lights to use it for any given pixel, with the rest reverting to the traditional method (omni and spot first; this new method improves directional shadows by far the most). Directional lights should probably use the new method by default (they always affect the entire screen anyhow, and it would just be better in practically every way), while for omni and spot lights it should be an option for cases where traditional methods give specific lights trouble (usually very large lights, which benefit most from this method).

mrjustaguy commented 3 years ago

OK, I've got an idea for explaining this proposal, and why it's not RT, in a much more concise manner, @logzero - so if you don't want to get lost in my large explanation mess above, here goes:

Traditionally, shadow maps draw a depth map from the shadow camera, and that's it: the shadow camera decides which point in the 3D world each pixel is assigned. This proposal wants the view camera to assign those 3D world points to the shadow camera for rendering, instead of it deciding them by itself. The rest is just the good old shadow map doing its rasterization duties, basically unchanged.

Gone will be the days of very limited shadow distances, shadow acne, peter-panning and low-res shadows. Performance will be comparable (for directional shadows at least) to what it is now; however, unlike now, shadow costs would scale with screen resolution, which can have both positive and negative performance implications - which is why the traditional method should still be available for some scenarios.

clayjohn commented 3 years ago

I'm not sure I am fully understanding your proposed method here. It sounds like you are proposing that we use the depth buffer to select which pixels need to be rendered in a shadowmap and then render the shadowmaps in screen space.

I think this proposal misses 2 very important considerations which appear to be bundled under consideration 3 above.

  1. What happens to objects outside of the camera view (i.e. those that are not captured by the depth buffer)?
  2. How do you tell which pixels need to be rasterized? (presumably this is done by marching through the depth buffer towards the light in world coordinates, in which case it is just Screen Space Shadowing AKA Contact Shadows)

> Working out which pixels the shadow map needs to rasterize - unclear, probably in the cheap to basically free range (as this is actually dependent on how one would tell what needs to be computed)

The traditional way of working out which pixels the shadowmap needs to rasterize is to just rasterize all objects that are within the light's frustum and see what happens. From the perspective of a given pixel in the camera view, the only way to check whether it is obstructed from the light is to perform a check against all possible occluders (this is how ray tracing works). In this proposal, it is unclear how you would know an object is supposed to be in shadow without looping through all objects in a compute shader and performing ray-object intersections for every one of them.

I may have misunderstood what you describe above, because to me it sounds like you are suggesting a Screen Space Ray March for each light to determine shadows. Screen Space Shadows can be quite effective, but they are slow, and they can only calculate shadows from geometry within the camera view. Accordingly, they are best used to supplement traditional rasterized shadows.

The last message makes me think that the proposal is more about skewing the shadowmaps to provide more resolution in areas that need it. I'm not sure how you could combine that with rasterizing the geometry to the shadowmap; it would likely require a very different geometry pipeline than what is supported in current-gen hardware. But if this is the approach you are suggesting, we can discuss what the possible barriers/issues would be and see if we can come up with some solutions.

mrjustaguy commented 3 years ago

Ok so I'll go with Directional Lights as an example as they're the easiest to explain.

The shadow map is essentially a 2D canvas onto which occlusion data is generated for everything from the camera out to whatever the max distance is set to. Let's say the camera is static for a moment, and the only variable is max distance. If you look at the shadow map, changing the max distance results in a zoom in or out, depending on whether the change is negative or positive. Increasing the distance zooms out, which reduces the effective resolution of the shadow map, because from the player camera's point of view you are increasing the number of world points each shadow map pixel is responsible for providing data to. The numbers below illustrate the effect.
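In numbers - with a single split and an orthographic box about as wide as the max distance as illustrative assumptions:

```python
def units_per_texel(coverage_width, resolution):
    """World units each shadow map texel must cover at a given zoom level."""
    return coverage_width / resolution

print(units_per_texel(100.0, 4096))   # ~0.024 units/texel
print(units_per_texel(4000.0, 4096))  # ~0.98 units/texel -> 40x coarser
```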

Now, this is a limitation because the shadow map is rasterized, which doesn't perfectly translate the vector data it's dealing with.

To combat that problem, instead of having the shadow map rasterization take place without knowing which points actually need the data - i.e. feeding it only the camera position and distance (zoom level) - you feed the shadow map rasterizer all the 3D points the camera sends to it (obtained from the camera's depth pass), projected onto the shadow map's canvas, with X,Y being where on the 2D plane those 3D points land, and tell the shadow map to rasterize those points on the 2D canvas.

Oversimplified example: camera pixel (346,10) gives world coordinates that, when projected onto the shadow canvas, land at position (2752.156, 1805.002). The shadow map rasterizer does a depth map render for exactly that position, and stores the result at (346,10).

Normally the shadow map would rasterize (2752,1805), and that data would only be valid for that exact point, yet it would be spread across all the world points that fall in the range (2751.5, 1804.5) to (2752.5, 1805.5) - which causes a translation issue, because the data might only be valid for that one point (2752,1805).
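A minimal sketch of that projection, assuming an orthographic directional light described by hypothetical `light_view`/`light_proj` matrices:

```python
import numpy as np

def world_to_shadow_canvas(world_pos, light_view, light_proj, shadow_res):
    """Project a world point to fractional (sub-texel) shadow map coordinates."""
    p = light_proj @ light_view @ np.append(world_pos, 1.0)
    ndc = p[:3] / p[3]                      # w == 1 for an orthographic light
    u = (ndc[0] * 0.5 + 0.5) * shadow_res   # [-1, 1] -> [0, shadow_res]
    v = (ndc[1] * 0.5 + 0.5) * shadow_res
    # e.g. (2752.156, 1805.002): the exact sub-texel position this fragment
    # needs, rather than the texel centre (2752, 1805) a normal map provides.
    return u, v
```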

So really, the idea with 3 is to work out a method of telling the GPU which points to compute on the shadow map's canvas, via a screen-sized information buffer mapping 1:1 to screen pixels. The shadow map rasterizer would rasterize much like it does now, the only difference being that it would be targeted at rendering the very specific points it is told to.

I hope I've managed to explain this sufficiently well, and not create an unruly mess of words.

clayjohn commented 3 years ago

Thanks for the explanation! It was very helpful. Unfortunately I don't think what you are proposing is possible.

A rasterizer works by taking an object and splatting it onto a canvas. Repeating this for all objects gives you a picture of the scene. In the case of a shadow map, you figure out the depth at a given pixel by rasterizing all objects and then seeing what happens. There is no way (with a rasterizer) to choose a specific pixel and rasterize it by itself - you have to loop through objects.

This is the fundamental difference between rasterization and ray tracing. With ray tracing you loop through each pixel and ask "what object can I see from this pixel?" while with a rasterizer you draw all objects at once and then get the full image. Accordingly, for ray tracing, you can save a lot of performance by being selective with which pixels are used, but with rasterization, you don't get to do that.
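That loop-order difference, as a Python-syntax sketch; `pixels_covered_by` and `ray_hits` are hypothetical stand-ins for the hardware's rasterization coverage and ray-intersection work:

```python
def rasterize_shadow_map(objects, shadow_map):
    # Rasterization: the outer loop is over objects, and pixels simply
    # fall out of it. There is no hook for "only shade pixel (u, v)".
    for obj in objects:
        for (u, v), depth in pixels_covered_by(obj):
            shadow_map[(u, v)] = min(shadow_map.get((u, v), float("inf")), depth)

def ray_traced_shadows(visible_points, objects, light_dir):
    # Ray tracing: the outer loop is over points, so it can be restricted
    # to exactly the set of points that are visible on screen.
    return [any(ray_hits(obj, p, light_dir) for obj in objects)
            for p in visible_points]
```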

Finally, the slow part of rasterizing a shadowmap is looping through the objects and sending their data to the GPU to be drawn. The actual pixel-by-pixel portion of rasterizing (i.e. the fragment shader) is so cheap that you wouldn't obtain a speed benefit by selectively turning it off and on.

Your idea would be a really interesting optimization for a raytracing based renderer as it would allow you to only cast shadow rays for visible portions of the scene (that is, assuming you don't have any reflections), but I don't think it would add any performance benefit for a raster-based renderer.

mrjustaguy commented 3 years ago

The idea was more about dealing with the issues of traditional shadow mapping, and yeah, I sort of saw the issue coming of the rasterizer drawing what it wants (not being targetable, as this method would need). That's why I expected compute might be needed for the rasterization step: to add that variable (the exact shadow map sub-pixel targets) and then do the standard rasterization math for all of the 2 million (or whatever the screen resolution is) points per frame.

mrjustaguy commented 3 years ago

OK, as far as directional shadows go, I've got an idea for how these shadows might be implementable entirely in compute (shadow maps wouldn't be rasterized the way they are now).

Once the depth pass is done and the fragment world positions are obtained, the fragments' world positions are projected onto the directional shadow's viewport plane, transforming them into X,Y points on said plane.

Do a bounding box test for all objects in the scene; for each test, check all world points against the 2D projection of the box. (This is an optimization: in theory one could just project every object triangle in the scene onto the shadow plane and, for each triangle, poll all the fragment positions to see if any are inside it, but this would add a ton of pointless calculations.)

If a fragment is inside a triangle, test whether the fragment's world point is above or below the triangle. If below, remove the fragment from further processing, as it is shadowed; otherwise keep testing whether the point is below any other triangle. If all the triangles have been passed and it isn't in shadow, apply the light.

TLDR: for the directional shadow (which would benefit most from this in terms of quality improvement), the pipeline would go something like this (without the optimizations that could significantly improve speed):

Get the active camera's depth map -> determine world positions for all of its fragments (pixels) -> create a copy of the world positions projected onto the directional shadow map's plane (Vector2) -> iterate over every triangle in the world -> project the current triangle onto the shadow map's plane -> iterate over every projected world position and see if it's inside the current projected triangle (2D) -> if a projected world position is inside, check whether its world position is above or below the triangle's plane (3D) -> if below, remove the world position from the list -> when out of triangles, all fragments remaining in the list get lit by the directional light

Note: this method would require a new approach to soft shadows. Also, in the stage that tests whether the world position is above or below a triangle's plane, make sure the plane is not facing away from the shadow camera, to prevent meeting the shadowing requirement both from the light source's direction and from the opposite direction. That test should actually be done before projecting the triangle onto the shadow map's plane, which would reduce the number of triangles in the later pipeline stages. A sketch of the per-fragment test follows.
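A minimal sketch of the per-fragment test described above, with NumPy standing in for the compute shader (a barycentric point-in-triangle check in 2D, then a plane-side check in 3D; `light_dir` is the direction the light travels):

```python
import numpy as np

def fragment_shadowed(frag_world, frag_2d, tri_world, tri_2d, light_dir):
    """True if this fragment is occluded by the given triangle."""
    # Per the note above, cull triangles facing away from the light; in a
    # real pipeline this happens before any projection work.
    n = np.cross(tri_world[1] - tri_world[0], tri_world[2] - tri_world[0])
    if np.dot(n, light_dir) > 0.0:
        return False

    # 2D: barycentric test of the projected fragment against the
    # projected triangle on the shadow plane.
    a, b, c = tri_2d
    v0, v1, v2 = b - a, c - a, frag_2d - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    if abs(denom) < 1e-12:
        return False                      # triangle is edge-on when projected
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    if v < 0.0 or w < 0.0 or v + w > 1.0:
        return False                      # outside the projected triangle

    # 3D: the kept triangles face the light, so the light lies on the
    # positive side of the triangle's plane; a fragment on the negative
    # side sits below the triangle, i.e. in shadow.
    return np.dot(n, frag_world - tri_world[0]) < 0.0
```

A real implementation would batch this over all fragments in a compute shader and use the bounding box pre-pass described above to cull most fragment/triangle pairs.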

This method probably only makes sense for directional shadows: omni and spot lights don't have an "infinite" range over which they are applied and don't have as many issues with their shadows, while directional shadows are limited in how far they can be drawn.

mrjustaguy commented 2 years ago

Hmm, I think what I've actually been trying to propose is something similar to Virtual Shadow Maps in UE5 https://docs.unrealengine.com/5.0/en-US/ReleaseNotes/

Ofc, the way I've been thinking about getting a similar effect is different

mrjustaguy commented 2 years ago

Scratch that last post - this is basically what the original proposal was aiming at, just with a few enhancements (soft shadows): https://developer.nvidia.com/hybrid-frustum-traced-shadows-0

The only downside is that I cannot figure out AMD/Intel support for this feature...

mrjustaguy commented 2 years ago

Closing. With https://github.com/godotengine/godot/pull/60178 and the Consistent Shadow Blur between Splits PR, shadow rendering quality is up significantly, to the point that even a 4k shadow map can cover a 4 km distance and still provide really good shadow quality for the most part (albeit with split blending disabled). Higher resolution shadow maps improve things significantly from there, and increasing the distance further doesn't change the quality much when you're talking about such large distances.

If anything, the only shadow quality improvements still needed for high quality dynamic shadows at high distances are https://github.com/godotengine/godot-proposals/issues/3908 (sphere splits) and https://github.com/godotengine/godot-proposals/issues/4387 (a solution to lights causing shadow acne at grazing angles), both of which likely wouldn't come at a significant performance cost, which this proposal probably would.

Aside from the one instance of shadow acne that has been observed at this point, all the original issues the proposal aimed to solve have been dealt with:

1) Shadow acne only appears with a significantly too-low bias, or in the case mentioned above
2) Bias values that don't produce easily observable peter-panning don't have shadow acne issues
3) The distances at which directional shadows can be used without a significant quality cost have been greatly improved by the above PRs
4) Pixelization was much more observable with the now-solved blur and normal bias issues on distant splits
5) Higher resolution shadow maps are no longer needed to get good quality shadows across such distances, again thanks to those PRs