bevyengine / bevy

A refreshingly simple data-driven game engine built in Rust
https://bevyengine.org
Apache License 2.0

Renderer Rework: Initial Merge Tracking Issue #2535

Closed · cart closed this issue 2 years ago

cart commented 3 years ago

The Bevy Renderer Rework is starting to stabilize and it is time to start planning how to upstream it! The intent of this issue is to track the work required to merge the renderer rework as soon as possible. This isn't a "renderer feature wishlist", but feel free to discuss what you think should be on this list!

This is an issue to track work that still needs to be done. If there is a name in parentheses, that work has been claimed by someone.

For some history on this effort, check out:

Here is a list of open PRs against the pipelined-rendering branch: https://github.com/bevyengine/bevy/pulls?q=is%3Apr+is%3Aopen+base%3Apipelined-rendering

Missing Required Features

The new renderer must have (approximate) feature parity with the old renderer.

Missing Nice-To-Have Features

These aren't required for a merge, but would be very nice to have.

Discussions To Have Before Merging

Steps to Merge

superdump commented 3 years ago

Here's the PR for bevy_gltf2: https://github.com/bevyengine/bevy/pull/2537

superdump commented 3 years ago

I’ve got MSAA working but it’s hacky because we need one of:

superdump commented 3 years ago

Here's a work-in-progress PR for MSAA that I would appreciate some guidance on: https://github.com/bevyengine/bevy/pull/2541

I was trying to support dynamically switching MSAA sample count at run time, which is why it looks more complicated than it perhaps needs to be. I hit the wall mentioned in my last message when switching between 1 sample and multiple samples or vice versa.

However, I wanted to see if I could at least support switching between 2/4/8, but it turned out my Radeon Pro 5500M only supports 1 sample or 4x MSAA samples, so I couldn't test that.

I thought that as I had done most of the work for dynamically adjusting the MSAA sample count, I'd leave it in the PR and see what feedback I got.

superdump commented 3 years ago

And a small PR to log the adapter and wgpu backend being used at initialisation time. It's useful to see. https://github.com/bevyengine/bevy/pull/2542

superdump commented 3 years ago

I'll do ClearColor as well once there is feedback on MSAA, as it's a small change that touches the same code.

superdump commented 3 years ago

Infinite reverse right-handed projection PR here: https://github.com/bevyengine/bevy/pull/2543

It has a known bug of all geometry being black when looking along -Z that I am still debugging. Help welcome.
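
For reference, an infinite reverse-Z right-handed perspective projection can be written with glam along these lines (a sketch of the underlying matrix, not the PR's code): the near plane maps to depth 1.0 and depth tends toward 0.0 at infinity, which improves depth precision.

use glam::Mat4;

// Illustrative sketch. `fov_y` is in radians, `near` in world units.
// NDC depth = near / -z_view: 1.0 at the near plane, tending to 0.0 at infinity.
fn perspective_infinite_reverse_rh(fov_y: f32, aspect: f32, near: f32) -> Mat4 {
    let f = 1.0 / (0.5 * fov_y).tan();
    Mat4::from_cols_array(&[
        f / aspect, 0.0, 0.0, 0.0, // column 0
        0.0, f, 0.0, 0.0,          // column 1
        0.0, 0.0, 0.0, -1.0,       // column 2: w_clip = -z_view
        0.0, 0.0, near, 0.0,       // column 3: z_clip = near (constant)
    ])
}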

superdump commented 3 years ago
* Update BufferVec to use Queue (like UniformVec does)

What does this mean? I don't see anything about UniformVec that uses a Queue type.

cart commented 3 years ago

Great work @superdump. I'll dig in tomorrow / answer questions. Just a heads up that CI on pipelined-rendering is broken. Almost done fixing it in #2538

superdump commented 3 years ago
* the ability to configure what node slots are available dynamically

I added support for removing nodes, node edges, slot edges, and subgraphs. I've only tested use of the APIs for removal of slot edges but now MSAA can be toggled on the fly and I added that into the msaa_pipelined example - press 'm' to try it out.
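
For anyone following along, a runtime toggle along the lines of the msaa_pipelined example might look roughly like this (a sketch against the Msaa resource of the time; not the example's exact code):

use bevy::prelude::*;

// Sketch of a runtime MSAA toggle system; Msaa is assumed to be the resource
// with a `samples` field used by the pipelined renderer at this point.
fn toggle_msaa(input: Res<Input<KeyCode>>, mut msaa: ResMut<Msaa>) {
    if input.just_pressed(KeyCode::M) {
        // Flip between 1 and 4 samples (4x is the most widely supported count).
        msaa.samples = if msaa.samples == 4 { 1 } else { 4 };
    }
}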

aevyrie commented 3 years ago

Is there anything I can do to help move the primitive shapes RFC forward? I'd be happy to contribute towards implementation PRs once you've reviewed it.

cart commented 3 years ago

@aevyrie haha sadly not really (other than continuing to ping me so I don't forget and trying to get more reviews on it). I'll try to get to it in the next day or so as we do want to get the ball rolling on visibility.

aevyrie commented 3 years ago

Sounds good 🙂. I'd like to help with visibility/culling if you don't mind including me in those discussions.

cart commented 3 years ago

I'd like to help with visibility/culling if you don't mind including me in those discussions.

I'd love to include you in those discussions :)

notverymoe commented 3 years ago

Hi there, fresh face, just wanted to volunteer for some of the writing/updating documentation and porting examples - if you're looking for helpers on that

superdump commented 3 years ago

Hi there, fresh face, just wanted to volunteer for some of the writing/updating documentation and porting examples - if you're looking for helpers on that

Just dive in! :) Mention here what you’re working on and go for it.

notverymoe commented 3 years ago

Alright, I've made a small start just going through the changes. Next I'll begin porting the examples, probably a good way to get familiar.

Edit: Converted most of the 3d examples, excl. render_to_texture. Only broken examples so far are those that rely on things already noted as unimplemented.

Edit 2: Bit too eager apparently 😅 As cart pointed out it's probably best to hold off on that. Doco it is then! Things will change as other tasks complete, but I can at least identify sections and rewrite some sections now.

Edit 3: Still here and keen, just had an up-tick in my day job's workload for a bit there

cart commented 3 years ago

Just a heads up that most of the examples won't need porting once we remove the old renderer and replace it with the new one. I don't think we should migrate examples until we've removed the old renderer (which we should wait to do until we have feature parity with the old renderer).

cart commented 3 years ago

@superdump

  • Update BufferVec to use Queue (like UniformVec does)

Oops! I understand why this is confusing: I haven't merged those changes yet. They're available on my custom-shaders branch though: https://github.com/cart/bevy/blob/custom-shaders/pipelined/bevy_render2/src/render_resource/uniform_vec.rs. Sorry for the confusion. I crossed some wires :smile:

kirawi commented 3 years ago

Will this be in 0.6?

cart commented 3 years ago

That is the goal!

Davier commented 3 years ago

I'm starting to work on porting bevy_ui

zicklag commented 3 years ago

Just ran into the need to set the clear color with the new renderer. Would it be helpful if I added a way to do that? Also, would we want to use the same ClearColor resource strategy as we did with the old renderer?

Edit: Sorry, just saw the TODO in the list:

Clear Color resource

I'll take that! :smile:

zicklag commented 3 years ago

I'll also work on "SubApp ZST labels instead of integers", but I could use some guidance on where to put the implementation: https://github.com/bevyengine/bevy/discussions/2629.

Weibye commented 3 years ago

This should have the label C-Tracking-Issue (?)

cart commented 3 years ago

Just ran into the need to set the clear color with the new renderer. Would it be helpful if I added a way to do that? Also, would we want to use the same ClearColor resource strategy as we did with the old renderer? I'll take that! :smile:

Cool cool! Yeah I think we should use the "current" clear color approach. Eventually for "environmental" things I think we might need something that plays nicer with scenes, but for now I think we should go with the simple solution that works.
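
For reference, the "current" ClearColor approach referred to here is just a resource the user overrides at app build time, roughly like this (a sketch using the 0.5-era App::build and DefaultPlugins names):

use bevy::prelude::*;

fn main() {
    App::build()
        // The clear color is a plain resource set at startup (or mutated later).
        .insert_resource(ClearColor(Color::rgb(0.1, 0.1, 0.1)))
        .add_plugins(DefaultPlugins)
        .run();
}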

I'll also work on "SubApp ZST labels instead of integers", but I could use some guidance on where to put the implementation: #2629.

Sounds good!

superdump commented 3 years ago

MSAA PR updated: https://github.com/bevyengine/bevy/pull/2541

Added support for configurable shadow map sizes here: https://github.com/bevyengine/bevy/pull/2700

superdump commented 3 years ago

As discussed on Discord a couple of weeks ago, this moves the default near plane closer, from 1 unit to 0.1 units. I feel like it will be very common to consider bevy units as being real-world metres, and not being able to see anything that is closer than 1m feels weird to me. :) https://github.com/bevyengine/bevy/pull/2703
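
For anyone who prefers the old behaviour, the near plane can still be overridden per camera, roughly like this (a sketch against the 0.5-era camera bundle; field names may differ in the pipelined crates):

use bevy::prelude::*;

fn spawn_camera(mut commands: Commands) {
    commands.spawn_bundle(PerspectiveCameraBundle {
        perspective_projection: PerspectiveProjection {
            near: 1.0, // override the new 0.1 default back to the old 1.0
            ..Default::default()
        },
        ..Default::default()
    });
}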

superdump commented 3 years ago

Also as discussed on Discord, there was some confusion about the units of the PointLight intensity member. I dug into the code and figured out that, on main and before the PR, it is practically 'luminous intensity' in lumens per steradian, which is a bit of a weird unit to reason about. The Filament document had instead intended it to be 'luminous power' in lumens, which is what household light bulbs are rated in at the point of sale these days. Looking into pbr.wgsl, there was some confusion about where a factor of 4 pi had gone - it was hidden among some other formulae, but the relationship is: luminous intensity for a point light = luminous power / (4 pi). So I made a PR to fix this: it adds the division by 4 pi and documents the intensity value as being in units of lumens, including a table from Wikipedia mapping various power ratings of household light bulbs of different types to lumen values as an aid. https://github.com/bevyengine/bevy/pull/2704
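
In other words, the fix boils down to a conversion along these lines (a sketch, not the PR's exact code):

use std::f32::consts::PI;

// Convert luminous power in lumens (how bulbs are rated at the point of sale)
// to luminous intensity in lumens per steradian, as used in the shading math.
fn luminous_power_to_intensity(lumens: f32) -> f32 {
    lumens / (4.0 * PI)
}
// e.g. an "800 lumen" bulb (roughly a 60 W incandescent) is about 63.7 lm/sr.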

superdump commented 3 years ago

Should we add transparency to the to do list? I suppose as it seems to be a bit broken in 0.5, it is perhaps ok that it be broken in 0.6 as well. But otherwise I guess we would want to look into doing an opaque pass and a transparent pass and possibly implementing weighted order-independent transparency for the transparent pass.

superdump commented 3 years ago

Infinite reverse right-handed projection PR is now updated and fixed! \o/ https://github.com/bevyengine/bevy/pull/2543

superdump commented 3 years ago

Can we add a link to https://github.com/bevyengine/bevy/pulls?q=is%3Apr+is%3Aopen+base%3Apipelined-rendering to the original post? It lists the PRs against the pipelined-rendering branch of this repository.

cart commented 3 years ago

So, I made a PR to fix this, added in the division by 4 pi, and documented the intensity value as being in units of lumens,

Yup just read the section of the Filament doc you linked to. The fix in your PR looks good to me (just merged it).

Should we add transparency to the to do list? I suppose as it seems to be a bit broken in 0.5, it is perhaps ok that it be broken in 0.6 as well. But otherwise I guess we would want to look into doing an opaque pass and a transparent pass and possibly implementing weighted order-independent transparency for the transparent pass.

I agree that ultimately we should try to do order-independent transparency, but I don't want to hold back 0.6 for it. If we can get it in before release ... awesome. But I think depth-sorting is a reasonable approach in the interim. We've gotta start pushing 0.6 out the door or it will never happen :)

Infinite reverse right-handed projection PR is now updated and fixed!

Brilliant! I'll take a look.

Can we add a link to https://github.com/bevyengine/bevy/pulls?q=is%3Apr+is%3Aopen+base%3Apipelined-rendering to the original post? It lists the PRs against the pipelined-rendering branch of this repository.

Good call!

superdump commented 3 years ago

I made a PR to add support for marking that meshes should not cast or should not receive shadows: https://github.com/bevyengine/bevy/pull/2726

Also one to use a separate vertex shader for shadow passes, as they only need to calculate the clip position - not the world position, UVs, nor normal that pbr.wgsl needs. This then allows reordering the bind groups of the pbr shader to be as we would like: view, material, mesh. https://github.com/bevyengine/bevy/pull/2727
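
Usage-wise this is just marker components on the mesh entity; a minimal sketch (NotShadowCaster is the component discussed here, while NotShadowReceiver and the stand-in definitions below are assumptions for illustration - see the PR for the actual types):

use bevy::prelude::*;

// Stand-in definitions for illustration; the real components live in the PR.
struct NotShadowCaster;
struct NotShadowReceiver;

fn spawn_unshadowed(mut commands: Commands) {
    // An entity that should neither cast nor receive shadows just gets markers
    // inserted alongside its mesh/material components (elided here).
    commands
        .spawn()
        .insert(NotShadowCaster)
        .insert(NotShadowReceiver);
}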

superdump commented 3 years ago

I naively hacked in support for glTF tangents and normal maps - basically using the same code as we have in the current renderer and doing whatever was needed to handle the additional vertex attribute, so there is a separate pipeline layout and pipeline for when a normal map is present. But it works. When I commented out the loading of the tangent vertex attribute and normal map texture in the glTF loader, it used the regular no-normal-map pipeline. I'll clean it up soon and make a PR. It's not going to be so pretty, but it's functional.

superdump commented 3 years ago

Here's a first shot at a PR for tangent vertex attributes and normal maps, including loading them from glTF models: https://github.com/bevyengine/bevy/pull/2741

cart commented 3 years ago

Just updated the task list to better represent the current state of things.

superdump commented 3 years ago

I've implemented a good chunk of the 'over' operator alpha blending for proper opaque and transparent passes, as well as proper support for alpha masking with a cutoff, according to the glTF spec. However, to finish it off fully we need support for the double-sided property, which when true means back face culling is disabled. This needs pipeline specialisation. As part of this work it also made sense to do a depth pre-pass, because alpha masking needs a special depth pass with a fragment shader that samples the base colour texture to identify whether the fragment is opaque or not. So I did that too.
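
For context, the three alpha modes defined by the glTF spec map onto something like the following (a sketch of the shape of such an API, not the branch's exact types):

// The three alpha modes defined by the glTF spec, as an enum:
#[derive(Clone, Copy, Debug, PartialEq)]
enum AlphaMode {
    // Alpha is ignored; rendered in the opaque pass.
    Opaque,
    // Fragments with alpha below the cutoff are discarded (in the depth
    // prepass and the main pass); the rest are treated as opaque.
    Mask { cutoff: f32 },
    // Rendered in a transparent pass using the `over` blend operator.
    Blend,
}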

I've also been working on visibility, or at least frustum culling. I have the basic culling working but I need to put some more thought into how to handle the lists of visible entities on views neatly and efficiently. I'll need to finish this off on top of @cart's custom-shaders branch as that moves to storing dynamic uniform offsets on entities, which is kind of necessary for flexibility reasons (everything currently just iterates over an enumerated Vec of extracted meshes and the dynamic uniform offsets are tied to this enumeration, so it's a bit rigid for now).

superdump commented 3 years ago

Visibility culling has progressed nicely today on top of custom-shaders, now known as modular-rendering. I've got camera and light frustum culling working in simulation world systems, and I've copied over RenderLayers and VisibleEntities, as well as Visible (renamed to Visibility) and a similar ComputedVisibility component.

Visibility is used for user control of whether an entity is visible or not. The culling system checks that and the render layers, then does frustum culling; if the entity is not culled, it is added to that view's visible entities and its computed visibility is set to true.

Then, when extracting, we only extract entities whose computed visibility is true, as that means they are visible to one or more frusta. Visible entities are also extracted onto the view entity and are iterated over when queuing draws.

The NotShadowCaster component is taken into account, so only casters inside the frustum are queued for shadow passes. And I did a bit of optimisation and cut the cost of the frustum vs OBB intersection test by 50%.
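
A simplified, self-contained sketch of that per-entity check (all types here are stand-ins for illustration, not Bevy's actual components):

// Stand-in types, not Bevy's actual components.
struct Visibility { is_visible: bool }
struct ComputedVisibility { is_visible: bool }
struct RenderLayers(u32);
struct Aabb;
struct Frustum;

impl RenderLayers {
    fn intersects(&self, other: &RenderLayers) -> bool { self.0 & other.0 != 0 }
}
impl Frustum {
    // The real test checks the entity's OBB against the frustum planes.
    fn intersects_obb(&self, _aabb: &Aabb) -> bool { true }
}

// Per entity, per view: user toggle, then render layers, then frustum test.
// Entities that pass all three are pushed into the view's visible entities
// and have their computed visibility set, which gates extraction.
fn cull_entity(
    frustum: &Frustum,
    view_layers: &RenderLayers,
    visibility: &Visibility,
    entity_layers: &RenderLayers,
    aabb: &Aabb,
    computed: &mut ComputedVisibility,
) -> bool {
    let visible = visibility.is_visible
        && view_layers.intersects(entity_layers)
        && frustum.intersects_obb(aabb);
    computed.is_visible = visible;
    visible
}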

I think it’s close to a PR once modular-rendering is merged. :)

Future work would be to tackle calculation of the directional light shadow region and fit the frustum to it. Then implement cascade shadow maps. :)

superdump commented 3 years ago

Back when I added the tracing spans that wrap stages and systems for profiling purposes, I tried tracing-flame, which produces flamegraphs. Those are good in theory for an aggregate view of how much time is spent in which systems, but in practice they are only proportional - you only see that a system took 10%, not that it took N ms - which I don't find very useful. tracing-chrome worked well for timeline-type instrumentation. The Tracy live-visualisation application crashed a lot on macOS and I didn't manage to fix whatever was wrong. However, from some testing today it seems that Tracy now works quite well on macOS: it can show a live-updating timeline, mean system execution durations and more, and you can save the captures for later offline inspection or use a headless capture tool instead of the live visualisation tool. I think I'll make a PR against main to add it as a profiling option because it's really good!

[screenshot: Tracy capture]

cart commented 3 years ago

Just put out #2831, which significantly improves the modularity / ergonomics of render logic using "ECS driven rendering".

superdump commented 3 years ago

Continuing on from visibility culling, and prompted a bit by someone who was trying to render many point lights in a scene, I’ve implemented storing our point lights in a storage buffer. This touched on the need for something like UniformVec but with a BufferUsage of STORAGE instead of UNIFORM. The short version is that I’ll probably rework UniformVec at least, and at most BufferVec (which is one of the bullet points above), UniformVec and DynamicUniformVec depending on what we land on after a bit of discussion. I think I still need to remove the upper limit to the number of point lights in some places though. After I’ve resolved that, I’ll implement culling of point lights against the view frustum which will allow for many more lights to be in a larger scene though performance will be impacted if they are too densely positioned so that too many fit in the view frustum. I think this also steps toward clustered forward rendering though I need to read more about how that works exactly.

Oh, and I know I've been saying I've been working on lots of things - don't worry if you're concerned about not seeing PRs yet. I'll make a PR for visibility once @cart's modular rendering PR is merged, as it was much easier to implement on top of that. After that I'll rework the transparency / opaque / depth prepass stuff on top.

While I wait for pipeline specialisation (for custom vertex attributes, as needed for tangents for normal maps, and custom pipeline configuration, as needed for double-sided materials and MSAA), I'll work on shadow filtering, light culling, automatic fitting of the directional light orthographic projection, etc. Continuing on the culling path seems to lead toward clustered forward rendering and cascaded shadow maps. Continuing on shadow filtering, I need to fix Percentage Closer Soft Shadows, which was not too far from being functional. And then I need to update and fix Screen Space Ambient Occlusion and my physical sky plugin, perhaps extending it to render to a cube map, which leads into image-based lighting for environment maps and such. Lots of fun stuff ahead! :) And of course if others get there first, that's great! I'm just continuing on because it's so damn fun! :D

zicklag commented 3 years ago

I'm not sure if this is something to think about yet, but it may be worth noting that WebGL2 won't support using storage buffers in shaders. wgpu will have a way to detect whether or not storage buffers are supported and allow our code to make different decisions based on that, but the renderer would still have to have different shader entry points and rendering paths.
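
For reference, one way code could detect this at runtime via wgpu's limits (a sketch; the actual plumbing in bevy_render2 may differ):

// On WebGL2 (downlevel), wgpu reports zero storage buffers per shader stage,
// which gives renderer code a signal to pick a uniform-buffer-only path.
fn storage_buffers_supported(device: &wgpu::Device) -> bool {
    device.limits().max_storage_buffers_per_shader_stage > 0
}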

cart commented 3 years ago

Hmm, that is absolutely something we'll need to think about. I'd like the "bevy defaults" to work everywhere (including WebGL2). We have been discussing a move to storage buffers for our lighting shaders and I honestly forgot that SSBOs weren't an option on WebGL2. If WebGL2 can't support them, we will need to start thinking of ways to handle that without hamstringing the rest of the renderer.

superdump commented 3 years ago

Here's the PR for updating BufferVec to use RenderQueue like UniformVec: #2847

Also, I've been reading about clustered forward rendering and I understand in more detail how it works now. As a brief introduction: clustered forward rendering splits the view frustum into subfrusta (a.k.a. clusters or froxels) and builds a per-cluster list of the point lights relevant to that cluster. This allows fragments falling into a cluster to only evaluate the lights that make a significant contribution to the shading of the fragment, rather than all lights in the scene. This in turn allows many more lights in the scene in an efficient way.

In the article, they propose having: a global array of lights; an array of cluster light index lists, which are indices into the global array of lights, where consecutive runs of indices are those relevant to an individual cluster (the runs aren't identifiable from this array alone - keep reading and you'll understand); and an array containing, per cluster, an index into the cluster light index lists array and a count of the lights affecting that cluster. The question I was then considering was: how will this work with the clustered forward algorithm as proposed in the article linked above?

GpuPointLights is currently a total of 116 bytes but I think it should be 128 bytes when aligned. Given a maximum uniform binding size of 16384 bytes, that means we can have a total of 128 point lights in one uniform buffer. That's the global array of lights. Given there are 128 point lights that could be relevant to a cluster, we need log2(128) = 7 bits per index. For convenience we could think of these as u8 packed in fours into u32, which gives us 16384 indices in 4096 u32s. Using a u8 should make the masking and shifting quite nice and fast too. That's the cluster light index lists. The article proposes 16x9x24 clusters in x, y, z, respectively. That is 3456 clusters, and per cluster we need one index into the cluster light index lists array, and a count of the number of lights in the cluster. The index needs to address 16384 values which is 14 bits. The count can be no more than the 128 point lights, which is 7 bits. So we need at most 21 bits, which we can say fits in a u32 for convenience and then within 16384 bytes, we can fit up to 4096 u32 values which is more than the 3456 clusters.

So in (pseudo?) wgsl it could look like this:

[[block]]
struct PointLights {
    data: array<PointLight, 128>;
};

[[block]]
struct ClusterLightIndexLists {
    data: array<u32, 4096>;
};

[[block]]
struct ClusterOffsetsAndLengths {
    data: array<u32, 4096>;
};

[[group(0), binding(6)]]
var<uniform> point_lights: PointLights;
[[group(0), binding(7)]]
var<uniform> cluster_light_index_lists: ClusterLightIndexLists;
[[group(0), binding(8)]]
var<uniform> cluster_offsets_and_lengths: ClusterOffsetsAndLengths;
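
On the CPU side, the packing could look roughly like this (illustrative Rust with an arbitrary byte-aligned layout, not an actual implementation):

// Pack a 14-bit offset into the index list plus a 7-bit light count into one
// u32, using a byte boundary to keep shader-side unpacking simple.
fn pack_offset_and_count(offset: u32, count: u32) -> u32 {
    (offset << 8) | (count & 0xFF)
}

// Fetch the i-th 8-bit light index from the packed u32 array
// (four indices per u32, least significant byte first).
fn light_index(cluster_light_index_lists: &[u32], i: u32) -> u32 {
    (cluster_light_index_lists[(i / 4) as usize] >> ((i % 4) * 8)) & 0xFF
}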

So, I think that instead of just culling lights to the frustum, I may try implementing clustered forward rendering this week using only uniform buffers (for the sake of supporting WebGL2), as the number of clusters and lights feels quite reasonable.

Possible stumbling blocks could be the performance of actually testing all the lights against all the clusters to generate the index lists, though it's highly parallelisable work, and the gathering of the results in the end is cheap. If that proves to be too slow, a different cluster configuration (the 16x9x24) could be tested.

When we get a shader preprocessor, we could support storage buffers for much larger, dynamic array sizes, and compute shaders for much better performance. Compute would also let us leverage the depth prepass (which will come in the transparency PR, as previously mentioned) to identify the active clusters - basically, look at the depth of each fragment in the depth buffer to identify which cluster it falls into - though that would also have to account for transparent meshes... they would at least have to be in clusters closer to the camera.

superdump commented 3 years ago

I've rebased my depth prepass / alpha modes branch on top of my visibility culling branch. The Amazon Lumberyard Bistro scene (with hacked-in lights and broken transparency) before: [screenshot, 2021-09-16] and after: [screenshot, 2021-09-22]

I also made a PR to fix the panics when resizing the window or when trying to go fullscreen: https://github.com/bevyengine/bevy/pull/2858

superdump commented 3 years ago

As #2831 is now merged, I have opened a pull request for frustum culling: https://github.com/bevyengine/bevy/pull/2861

cart commented 3 years ago

Just merged the modular rendering pr. I also updated the checklist in the OP. I'll be working on "pipeline specialization" and "making shaders assets again" next. I'm also reviewing the various pipelined-rendering PRs that others have submitted.

superdump commented 3 years ago

I just got clustered-forward rendering working on macOS. 16x9x24 clusters. No shadow mapping. 128 lights. Green tint is no/few lights affecting the cluster, red tint is 16+ lights affecting the cluster, with a smoothstep between. ~55 fps on a MacBook Pro 16 with i9 9980HK and Radeon Pro 5500M (~mobile GTX 1050 Ti). This is only using uniform buffers and no compute so should work fine on WebGL2. :) https://user-images.githubusercontent.com/302146/135138603-12bcd1ef-b636-453b-84ad-95b00d76aed6.mp4

EDIT: I just noticed it's buggy. Some of the lights are not lit. :D I'll fix it...

superdump commented 3 years ago

Here's the fixed version by the way: https://twitter.com/swainrob/status/1443263405457133580?s=21 Still ironing out some more basic bugs that didn't have visual impact. Then I'll do a little profiling on it. Maybe it's worth optimising the assignment of lights to clusters, either by not naively looping over all clusters vs all lights and instead iterating only over the clusters that a light could affect given its position and radius, or even by leveraging a bounding volume hierarchy of the lights and testing the cluster AABBs against it. I haven't seen what takes time yet.

Note as well that shadow mapping falls over fast with all the culling of meshes to lights, and 6 passes per point light. We’ve discussed a few ideas how to deal with that like only updating a fixed number of shadow maps per frame, and scoring the maps according to things like their distance from the camera (further away is lower priority), and frames since last update (older is higher priority.) I thought that one would want to first identify the lights that had moved and the lights that had geometry within their range move (i.e. either the light or occluders in the light’s range moved) but that is more complicated and difficult to implement with good performance.
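
One possible shape for the scoring heuristic mentioned above (purely illustrative; the formula and weighting are placeholders, nothing like this is implemented):

// Higher score = update this light's shadow map sooner: far-away lights score
// lower, stale maps score higher. Each frame the N highest-scoring maps would
// be re-rendered.
fn shadow_update_score(distance_to_camera: f32, frames_since_update: u32) -> f32 {
    (frames_since_update as f32) / (1.0 + distance_to_camera)
}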

However, at this point I don’t want to go too deep on any one thing because there are too many low-hanging fruit so I think I’ll revisit the other path I mentioned - automatically fitting the directional light orthographic projection to the view frustum and relevant geometry and maybe then implement cascade shadow maps so shadows in outdoor scenes are decent. :)

superdump commented 3 years ago

I did some more on clustered-forward and now support 256 lights using only uniforms as well as doing a basic optimisation to only compare lights against clusters they could affect based on the light position and radius. That seems to make the assignment of lights to clusters take very little time for reasonable light ranges.

See a video of it in action here: https://youtu.be/uAegfg11NdM

I want to implement a light falloff range threshold that reduces the light range to the distance beyond which the light no longer makes a significant contribution, so we don't set large ranges and hurt performance for no visible difference. I also tried to get it running on wasm with WebGL but nothing displayed. Not sure why.
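
The falloff threshold idea amounts to solving the inverse-square law for distance (a sketch; a real implementation would also smooth the attenuation near the range limit):

// Distance at which an inverse-square light's contribution drops below
// `threshold` (threshold in the same units as intensity over distance squared).
fn effective_range(luminous_intensity: f32, threshold: f32) -> f32 {
    (luminous_intensity / threshold).sqrt()
}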

I need to figure out what to do about shadow maps as the number of point lights grows, as they kill performance for now. That is non-trivial from what I've found so far, so maybe only the closest / brightest n point lights with shadow mapping enabled get shadows for now. I don't know. Ideas and suggestions welcome.

I've got basic cascaded shadow mapping running. Now cleaning it up and fixing things with it. I feel like revisiting all the shadow filtering I did may be next in line, unless cart merges the specialization branch, in which case I'll want to get MSAA and normal mapping in first.

My current branch chain has frustum culling, alpha modes and depth prepass, toggling of shadow mapping per light, clustered forward rendering, and now cascaded shadow maps. My PCF, PCSS, and SSAO code will get updated, fixed, and finished off too. :)