godotengine / godot-proposals

Godot Improvement Proposals (GIPs)
MIT License

Improvements to performance on directional shadow maps #6948

Open BastiaanOlij opened 1 year ago

BastiaanOlij commented 1 year ago

Describe the project you are working on

For the past few weeks I've been pulling the directional shadow map implementation apart for the Vulkan renderer to try and see where we can make improvements. This proposal attempts to bring some of the ideas we've already tried, some existing suggestions and some new suggestions together so we can further discuss where we should put our efforts.

Describe the problem or limitation you are having in your project

There are a number of issues with directional shadow maps, in terms of both quality and performance.

On the subject of quality it is important to note that there is nothing wrong with the approach Godot currently takes for directional lights. This is mostly a matter of perfecting settings and understanding that settings for good looking shadows differ widely depending on scene composition. There is no magical default that works.

The focus here will then be on performance and how we can minimize the overhead of updating directional shadow maps as these often require frequent updates.

To understand the approach Godot takes to rendering directional shadow maps, Jonathan Blow's blog posts on stable cascaded shadow maps are a good read: http://the-witness.net/news/2010/03/graphics-tech-shadow-maps-part-1/

In order to further investigate and visualise the use of cascaded shadow maps two PRs were implemented:

For our test scene we can see that our cascade distances are nicely set up: image

But looking at our frustums we can see that at this view angle we not only have limited coverage in the shadow maps, we're rendering a lot of geometry that is never sampled within the view frustums: image

Moving the camera around we can see that at certain angles the coverage does increase, but we are still left with large areas of the shadow maps that are never used.

Now we could "solve" this by changing the projection we use to render the shadow maps to have better coverage but nearly all techniques will either lead to further visual artifacts or to other visual side effects as the player moves around the level.

In fact, the current approach is about as optimal as we can get it.

The challenge will be to reduce what we render.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Stepped update of the cascades

This has been detailed in a previous proposal and has been implemented in https://github.com/godotengine/godot/pull/76291. This is a good idea in theory, but in practice it has a number of drawbacks, which resulted in this PR being put on hold. We think we will be able to resolve these drawbacks by rendering static and dynamic objects separately; however, for directional lights this may be a dead end.

Splitting static and dynamic objects

So, as mentioned above, let's look at this technique. In a nutshell, this change will result in two shadow map textures being maintained.

One contains only static objects (objects that do not move). Generally speaking the majority of a scene will be static and this allows us, provided our light is also stationary, to render all the static geometry to this shadow map only once (or at least at low frequency).

The other shadow map is updated every frame in which dynamic objects have moved within the clipping volume of the light. We start by copying the static shadow map into the dynamic shadow map and then render all the dynamic objects into this. This is then used when applying shadow to our render result. The obvious gain is that much less geometry is rendered each frame.
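As a rough illustration of the copy-then-render step described above, here is a minimal CPU-side sketch. It is not how the renderer does it: the depth maps are plain nested lists, casters are hypothetical texel rectangles at a constant depth, and the names `render_casters` and `compose_dynamic_map` are made up for this example.

```python
FAR = 1.0  # depth of an empty texel (assumed convention: smaller = closer to the light)


def new_depth_map(width, height):
    """Create an empty depth map filled with the far value."""
    return [[FAR] * width for _ in range(height)]


def render_casters(depth, casters):
    """Rasterize hypothetical rectangular casters into a depth map in place.

    Each caster is (x0, y0, x1, y1, z): a texel rectangle at constant depth z.
    The usual depth test applies: the value closest to the light wins.
    """
    height, width = len(depth), len(depth[0])
    for x0, y0, x1, y1, z in casters:
        for y in range(max(y0, 0), min(y1, height)):
            for x in range(max(x0, 0), min(x1, width)):
                depth[y][x] = min(depth[y][x], z)
    return depth


def compose_dynamic_map(static_map, dynamic_casters):
    """Copy the static map, then render only the dynamic casters on top."""
    dynamic_map = [row[:] for row in static_map]  # static map stays untouched
    return render_casters(dynamic_map, dynamic_casters)


# Rendered once (or rarely): the static geometry.
static_map = render_casters(new_depth_map(4, 4), [(0, 0, 4, 2, 0.5)])
# Rendered whenever dynamic objects move: copy + dynamic geometry only.
combined = compose_dynamic_map(static_map, [(1, 1, 3, 3, 0.3)])
```

The point of the sketch is only that the per-frame work is the copy plus the dynamic casters, while the static map can be reused as long as it stays valid.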

This is already on the roadmap; however, the operative phrase is "provided our light is also stationary". Now while a directional light is stationary, its shadow maps are dependent on the position of the camera. As the player moves, even the static shadow maps need frequent updating, and the overhead of performing two passes may outweigh the gains.

There may be a gain when using this in combination with the aforementioned stepping approach however this would require adding a reprojection of the static shadow map if we didn't update that shadow map in the current frame but are

Limiting the light shadowmap frustum

This is potentially the easiest win. Without changing the dimensions of the shadow maps, we can limit our render area and adjust the light's projection matrix (and thus clipping volume) to only render the general area the view frustum covers.

So we could end up rendering just: frustum renderarea
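To make the idea a bit more concrete, the sketch below computes a limited render area from view frustum corners that have already been projected into shadow map UV space. The function name, the UV convention (0 to 1 across the map) and the sample corner values are assumptions for illustration; adjusting the light's projection matrix to match is not shown.

```python
import math


def limited_render_area(corners_uv, map_size):
    """Bounding texel rectangle of projected frustum corners, clamped to the map.

    corners_uv: iterable of (u, v) points in shadow map UV space.
    Returns (x, y, width, height) in texels.
    """
    us = [u for u, _ in corners_uv]
    vs = [v for _, v in corners_uv]
    x0 = max(0, math.floor(min(us) * map_size))
    y0 = max(0, math.floor(min(vs) * map_size))
    x1 = min(map_size, math.ceil(max(us) * map_size))
    y1 = min(map_size, math.ceil(max(vs) * map_size))
    return (x0, y0, max(0, x1 - x0), max(0, y1 - y0))


# Hypothetical projected corners for one cascade of a 4096 texel map.
corners = [(0.25, 0.5), (0.75, 0.5), (0.375, 0.875), (0.625, 0.875)]
x, y, w, h = limited_render_area(corners, 4096)
covered = (w * h) / (4096 * 4096)  # fraction of the map that still gets rendered
```

With these made-up corners only a bit under a fifth of the map would be rendered; how much is saved in a real scene depends entirely on the view and light angles.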

Multiview shadowmaps

Just for illustration I enhanced the frustum drawing logic to draw all frustums in the last cascade (might actually update the PR with this): image

No matter how we turn the camera or angle our light, our complete view frustum will always fit within our 4th cascade.

Edit: this is not entirely true for view frustums with a FOV less than 65 degrees; however, by being smart with culling we can create a single draw list with all culled objects and remove duplicates.

While this will put a requirement on hardware supporting multiview, and will require us to change the shadowmap logic to use layers, this opens the door to use the full frustum to cull what is rendered and only do a single pass instead of 4 passes, especially considering the last pass would have hit all objects to begin with. The overhead saved by not processing 4 passes will likely outweigh the overhead introduced by multiview discarding triangles in the lower cascades.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

n/a

If this enhancement will not be used often, can it be worked around with a few lines of script?

This is core to the rendering pipeline

Is there a reason why this should be core and not an add-on in the asset library?

This is core to the rendering pipeline

myaaaaaaaaa commented 1 year ago

Limiting the light shadowmap frustum

This is potentially the easiest win. Without changing the dimensions of the shadow maps, we can resize our render area and adjust the lights projection matrix (and thus clipping volume) to only render the general area the view frustum covers.

This is how the "Perspective shadow map" family of techniques essentially works:

slide_7

(Source)

They have superior results to Orthogonal shadows (referred to as "SSM" below), but are behind even PSSM2:

image

(Source)

Note however that the classic PSM only involves a single split. Given how PSM has been shown to be a direct upgrade over the single-split Orthogonal shadows, it may be worth looking into perspective warping all of PSSM's splits:

slide_11

(Source)

BastiaanOlij commented 1 year ago

@myaaaaaaaaa the problem with perspective shadow maps is that as you rotate the view or move around you can see the shadows deform. This is why Godot has chosen to go with stable shadow map cascades. While we sacrifice some resolution, and thus improperly set up cascades will result in more jagged shadow edges, a properly set up set of cascades provides a very stable shadow environment ideal for games.

mrjustaguy commented 1 year ago

Static & Dynamic Lights make a Ton of sense for Directional Lights, when paired with Stepped Rendering for Static Directional Shadows. Most Geometry in a Game is Static, and Stepped DS Rendering has Proven itself to work Great with Static Geometry, but absolutely Blows up with Dynamic Geometry.

Separation of the two could yield additional benefits, such as having different Configuration for Dynamic Object Shadows, which Typically have vastly different needs to the rest of the scene, such as much higher resolution requirements but much smaller distances.

This could yield Two types of Performance Benefits:

1) Static Geometry (90% of scene geometry) is Rendered 1 Cascade a frame using Stepped Rendering
2) Dynamic Geometry (10% of scene geometry) is Rendered with 1, 2 or 4 cascades a frame depending on config, possibly separated settings, allowing for just orthogonal or 2 split shadows to work great for dynamic objects, fading out when far enough, while still having the Static geometry cast "real time" shadows (as in not lightmap baked)

The Stepped Rendering PR has shown itself to nearly cut the Shadow Map Rendering time in Half in most of my Tests, and that's rendering 2 Cascades per frame. Rendering 1 Cascade Per frame + 1-4 for a substantially smaller subset of geometry would likely cut the rendering of scenes down further (based on the results, with some overhead, possibly down to 1/3 the current rendering time), while not producing any observable artifacts at 60 Hz.
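One way to picture the update schedule being discussed is a simple round-robin over the static cascades, with every dynamic cascade refreshed each frame. This is only a sketch of that idea; the `cascades_to_update` helper and its round-robin order are assumptions, not what the PR actually does.

```python
def cascades_to_update(frame, cascade_count=4, static_per_frame=1):
    """Pick which cascades to refresh on a given frame.

    Static cascades are stepped round-robin, `static_per_frame` at a time.
    Dynamic cascades are all refreshed every frame.
    """
    start = (frame * static_per_frame) % cascade_count
    static = [(start + i) % cascade_count for i in range(static_per_frame)]
    dynamic = list(range(cascade_count))
    return static, dynamic


# With 1 static cascade per frame, each cascade is refreshed once every 4 frames.
schedule = [cascades_to_update(frame)[0] for frame in range(4)]
```

Setting `static_per_frame=2` gives the "2 cascades per frame" behaviour mentioned above, where each static cascade is refreshed every other frame.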

myaaaaaaaaa commented 1 year ago

myaaaaaaaaa the problem with perspective shadow maps is that as you rotate the view or move around you can see the shadows deform.

LiSPSMs and TSMs improve on perspective shadow maps by using alternative frustum projection matrices, which provide results that some may find more acceptable.

See the below video (truncated due to filesize) for a quick comparison between Orthogonal (SSM), PSM, and TSM:

final2.webm

See the below link for the TSM paper and full videos:

https://www.comp.nus.edu.sg/~tants/tsm.html

BastiaanOlij commented 1 year ago

@mrjustaguy the problem is that the camera still moves, and therefore so does the center point of the shadow map. That means that unless the player stands still, we will need to re-render all static geometry on each shadow map update, negating much of the benefit of splitting static and dynamic rendering of the shadow map.

That said, if we increase the distance over which we snap the center of the shadow map so the static map can be re-used longer, we could mitigate this.
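A small sketch of why a larger snap distance helps: the static map only has to be re-rendered when the snapped center changes. The helper names, the 2D light-space positions and the step sizes are all made up for illustration.

```python
import math


def snap_center(center, snap_step):
    """Snap a light-space center point down to a grid of size snap_step."""
    return tuple(math.floor(c / snap_step) * snap_step for c in center)


def count_static_rerenders(camera_positions, snap_step):
    """Count how often the snapped center changes along a camera path."""
    rerenders = 0
    previous = None
    for position in camera_positions:
        snapped = snap_center(position, snap_step)
        if previous is not None and snapped != previous:
            rerenders += 1
        previous = snapped
    return rerenders


# Hypothetical path: the camera moves 0.5 units per frame for 20 frames.
path = [(i * 0.5, 0.0) for i in range(20)]
fine = count_static_rerenders(path, 0.5)    # small snap step
coarse = count_static_rerenders(path, 4.0)  # larger snap step
```

On this path the small step forces a static re-render on nearly every frame, while the larger step only needs two. Presumably the trade-off is that the shadow map has to cover extra margin around the view frustum, which costs some effective resolution.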

BastiaanOlij commented 1 year ago

@myaaaaaaaaa looks like that video was made with settings chosen to prove a point. I've not seen SSM look that bad in a small scene like that. Looking at my own test project, it's a far larger environment, and just the first cascade covers a larger area while keeping good quality.

That said, adding PSM and/or TSM support should definitely be considered at some point. The big issue IMHO with shadows is that the various techniques all have scenarios where their pros outweigh their cons and they are obviously the better choice. But then you use them in another scenario and suddenly the weakness in a technique becomes apparent.

mrjustaguy commented 1 year ago

@BastiaanOlij In theory yes, in Practice no, as seen in https://github.com/godotengine/godot/pull/76291: even when moving the camera real hard (rotation and translation), if the objects are static the thing is issue free. The only issue with that PR and its proposal is the fact that Dynamic objects aren't taken into account properly, which isn't a problem anymore once the two are split. And if we split the two, there is no need to update 2 cascades at a time: 1 will probably suffice for statics, while all have to be updated for dynamic.

In fact splitting the two and going with 1 static and up to 4 dynamic cascades per frame is going to be an even bigger boon to complex scenes compared to the OG plan of 2 normal cascades per frame, as a good rule of thumb is that over 90% of scene geometry is static in 99% of games

Only downside is more memory usage as you've got The shadow map x2, but eh, even with 16k shadow map that's like only a GB for the DS so..
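For a rough sense of scale, the estimate below works out the memory for a square shadow map kept twice (static + dynamic). The bytes per texel are an assumption: 2 for a 16-bit depth format and 4 for a 32-bit one; which one applies depends on the renderer's settings.

```python
GIB = 1024 ** 3


def shadow_map_bytes(size, bytes_per_texel, copies=1):
    """Memory for `copies` square depth maps of `size` x `size` texels."""
    return size * size * bytes_per_texel * copies


# 16k map, static + dynamic copy.
two_maps_16bit = shadow_map_bytes(16384, 2, copies=2) / GIB
two_maps_32bit = shadow_map_bytes(16384, 4, copies=2) / GIB
```

Under these assumptions the doubled 16k map comes to 1 GiB with 16-bit depth and 2 GiB with 32-bit depth.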

BastiaanOlij commented 1 year ago

@mrjustaguy it's hard to predict what will count more, doubling the passes every other frame, or just doing everything in a single pass every frame. It would be a clear benefit if we didn't have to constantly rerender the static shadow maps because the player position moves.

Also it will be 2 static and 4 dynamic per frame.

mrjustaguy commented 1 year ago

Not really hard to predict, Most of the Cost is Triangle Processing, which the OG PR is all about reducing, this would just be able to reduce it further if you go with 1 static split and 4 dynamic per frame

I mean take an example scene with 1m shadow Triangles. If 90% of those are static and 10% are Dynamic, here's how The number of Triangles Processed each frame goes (ignoring the culling as all scenarios would cull so that can be ignored for this example):

1) Current Behavior - 1m triangles x4 (once for each split) = 4m triangles every frame (reference)
2) Current PR - 1m triangles x2 (2 splits a frame) = 2m triangles every frame (note Dynamic object issues)
3) Proposed Static/Dynamic+Stepping - 900k triangles x1 or x2 (depending on how many static splits a frame you go with) + 100k triangles x4 = 900k or 1.8m static and 400k dynamic triangles every frame = 1.3m or 2.2m triangles every frame (depending on static split count)

With the more Aggressive setup, you're running 32.5% of the triangles that you would be in the reference setup. Now Yes this does ignore the Doubling of VRAM usage and running essentially 2 shadow maps at the same time, but given how little a Shadow map costs when it's not processing many triangles (even if it is fully covering stuff) it'd just be doubling that tiny amount of base cost of it.
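The arithmetic above can be reproduced with a few lines. This uses the same assumptions as the example (1m shadow triangles, 90% static, culling ignored); the function name is just for illustration.

```python
def triangles_per_frame(total, static_percent, static_splits, dynamic_splits):
    """Triangles submitted to shadow passes each frame, ignoring culling."""
    static_tris = total * static_percent // 100
    dynamic_tris = total - static_tris
    return static_tris * static_splits + dynamic_tris * dynamic_splits


TOTAL = 1_000_000

reference = TOTAL * 4                                  # all 4 splits every frame
stepped_pr = TOTAL * 2                                 # 2 splits every frame
aggressive = triangles_per_frame(TOTAL, 90, 1, 4)      # 1 static split + 4 dynamic
conservative = triangles_per_frame(TOTAL, 90, 2, 4)    # 2 static splits + 4 dynamic

aggressive_ratio = aggressive / reference
conservative_ratio = conservative / reference
```

The aggressive setup lands at 1.3m triangles (32.5% of the reference) and the 2 static split setup at 2.2m (55%), which is slightly more than the current PR's 2m.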

A nice real world way to test this would be to have 2 Directional Lights, one casting shadows for Dynamic objects only (4 split) and one for Static objects only (but set to ortho/2 split) however I think the culling ain't working for such a test to be setup right now as afaik it's just ignoring the cull masks rn

Edit - Do Note that This is for Desktop. On Mobile the scenes are simpler, and I know that mobile devices may have issues handling stuff that is basically free for Desktop users, so the overhead of rendering more passes and the higher VRAM requirements could indeed make this unbeneficial for Mobile.

Edit 2: I just re-read the proposal, and It's unclear to me as to how Multiview would change the math above as I don't know how it works

myaaaaaaaaa commented 1 year ago

One other possible optimization is the use of occlusion queries to perform "occlusion soft-culling", or reducing the LOD of mostly-occluded objects. See the following PR that implements this for the main render pass, which should be able to serve as a foundation for a hypothetical shadow pass implementation: https://github.com/godotengine/godot/pull/76297

https://github.com/godotengine/godot/assets/103326468/84a1520a-fd69-47d8-835c-967dc9b65f45

Left: "Hard-culling" Right: "Soft-culling"

Occlusion queries have the well-known downside where newly unoccluded objects tend to spontaneously pop into existence due to receiving the occlusion results several frames late. However, with soft-culling, the occlusion query artifacts instead manifest as slightly delayed LOD changes, which should be much less noticeable, especially when done in the shadow pass.
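As a sketch of the soft-culling idea (not taken from the linked PR), the function below maps the visible fraction reported by an earlier occlusion query to a LOD index. The linear mapping and the name `soft_cull_lod` are assumptions; the key property is that even a fully occluded object still gets drawn at its lowest LOD, so a stale query result cannot make it pop out of existence.

```python
def soft_cull_lod(visible_fraction, lod_count):
    """Choose a LOD index from a (possibly stale) occlusion query result.

    visible_fraction: share of the object's samples that passed the query, 0..1.
    lod_count: number of LODs, where 0 is the most detailed.
    """
    clamped = min(max(visible_fraction, 0.0), 1.0)
    # More occlusion -> higher LOD index (less detail), never culled outright.
    return min(lod_count - 1, int((1.0 - clamped) * lod_count))


lods = [soft_cull_lod(f, 4) for f in (1.0, 0.75, 0.5, 0.0)]
```

A late query result then only delays a LOD switch by a few frames rather than hiding or revealing the whole object.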

Additionally, there exists the VK_EXT_conditional_rendering extension which allows the GPU->CPU synchronization latency of occlusion queries to be avoided completely. This means that on supported hardware, traditional occlusion hard-culling can also be performed in addition to soft-culling, allowing for even more performance benefits at no visual cost.

See a relevant tweet from the developer of Wicked Engine regarding the combination of VK_EXT_conditional_rendering with occlusion queries for shadow passes: https://twitter.com/turanszkij/status/1442244290592313345

Calinou commented 1 year ago

Traditionally, this incurred a GPU->CPU synchronization overhead, but this can be avoided on hardware that supports VK_EXT_conditional_rendering, allowing for occlusion culling to be performed for the shadow pass as well rather than being limited to the main render pass.

Support for the extension is quite low:

image

It's not clear though which vendors actually support it, since when I click on the details, I see reports for all AMD, NVIDIA and Intel. The extension has existed since 2018, but I don't know since when graphics drivers have implemented it – though I don't think that many people are running outdated drivers.

mrjustaguy commented 1 year ago

GTX 1050 Ti with current drivers supports it according to gpu-z (2016 GPU, Pascal)

image

myaaaaaaaaa commented 1 year ago

Support for the extension is quite low:

It's not clear though which vendors actually support it, since when I click on the details, I see reports for all AMD, NVIDIA and Intel. The extension has existed since 2018, but I don't know since when graphics drivers have implemented it – though I don't think that many people are running outdated drivers.

This seems to be primarily caused by Android:

image

Note that these adoption rates are higher than Variable Rate Shading, which has already been integrated into Godot, and is also an extension that can be seamlessly disabled on unsupported devices:

image

fire commented 1 year ago

Did anyone here try implementing https://web.archive.org/web/20101208212121/http://visual-computing.intel-research.net/art/publications/sdsm/ and can explain the results they got? Trying to see if this is a rabbit hole we've gone down.

Unhinged-Dev commented 5 months ago

they really should implement Trapezoidal Shadow Mapping or Exponential Shadow Mapping, the quality increase alone should be enough to justify it, and these methods seem to be more performant, at least in comparison to their sheer quality image

Calinou commented 5 months ago

they really should implement Trapezoidal Shadow Mapping or Exponential Shadow Mapping, the quality increase alone should be enough to justify it, and these methods seem to be more performant, at least in comparison to their sheer quality

This was already proposed in https://github.com/godotengine/godot-proposals/issues/599, but I don't think the benefits outweigh the downsides. Alternative shadow rendering techniques from rendering papers often have poorly documented downsides that you only encounter once you start using them in production 🙂

ESM was also supported in Godot 2.x, and it had a notoriously "washed out" appearance. This was particularly obvious for small shadow casters that are close to the surface receiving the shadow (something VSMs also struggle with).

If you want better directional shadow map quality, https://github.com/godotengine/godot-proposals/issues/3908 is likely the way to go as it's a battle-tested solution.

Unhinged-Dev commented 5 months ago

i dont get the downsides, like they seem like pretty robust methods to me, but i understand the point on the poor documentation

i havent heard of that godot 2.x shadow issue, and searching it hasnt really showed me much, honestly godot 2 was pretty raggedy anyways right? kinda to be expected, im sure with the resources godot has nowadays it would be possible to have a good ESM system (in my opinion at least)

im looking at that pr, looks promising to me, how would i be able to use it? also, if i wanted to implement TSM or ESM, where in the godot source should i go to start?

thanks for the response, truly helpful

Calinou commented 5 months ago

im looking at that pr, looks promising to me, how would i be able to use it?

https://github.com/godotengine/godot-proposals/issues/3908 isn't implemented yet; it's only a proposal. There is a branch linked in the proposal, but it's not in a working state.

also, if i wanted to implement TSM or ESM, where in the godot source should i go to start?

Start here: Internal rendering architecture

However, expect this to be nontrivial. If you plan on submitting this work upstream, you should open a proposal first before working on a pull request.

Unhinged-Dev commented 5 months ago

alr g