A very much Work In Progress renderer that wraps the D3D12 API.
The primary goal is to provide example implementations of various computer graphics techniques such as clustered shading, PBR shading, etc., and the secondary goal is to do so efficiently using a simple, thin abstraction around D3D12.
It's an extension of my other repo, BAE, but here I'm implementing the rendering abstraction myself rather than using BGFX (which is great -- I just wanted to learn more about building lower-level abstractions).
The project uses premake to generate project files, but I've only tested creating a Visual Studio 2019 solution. To create the solution, open up PowerShell and run:
$ make generate
This will create the solution in the `.build` folder.
Annoyingly, the way that premake adds the NuGet package for the WinPixEventRuntime is a bit broken, so you'll likely get errors telling you that the package is missing. It uses `build\native\WinPixEvent` instead of the actual path, which is `build\WinPixEvent`. If you're using MSYS or another terminal that supports `sed`, you can just run the following after the generation step:
$ make fix
Alternatively, just run `make` to both generate and fix at once (if you can use `sed`).
For now, I'm including all the non-NuGet dependencies inside `external`; their licenses are included in each folder of `external/include`. Here's a complete list:
- `boost_context`, which FiberTaskingLib needs

Not all of these are being fully utilized -- for instance, FTL was used for some experiments, but most of the code is not multi-threaded (yet).
Currently, the following implementations have been created, in part as a way of dogfooding my rendering abstraction:
This is largely a port of the work I previously documented in a blog post that used BGFX. The implementation is based on Karis 2013 and uses compute shaders to produce the pre-filtered irradiance and radiance cube maps described in that paper.
Additionally, it features corrections for multiple scattering based on Fdez-Agüera 2019 that improve conservation of energy for metals at higher roughness values.
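For reference, the energy-compensation term from Fdez-Agüera 2019 is compact enough to sketch. This is a minimal, CPU-side version evaluated per color channel for clarity (in practice it lives in the shader), and the names are illustrative rather than the repo's actual interface: `f_ab` is the (scale, bias) pair sampled from the split-sum BRDF LUT at (NdotV, roughness), and `radiance`/`irradiance` come from the pre-filtered cube maps.

```cpp
#include <array>

// Sketch of Fdez-Agüera 2019's multiple-scattering compensation, per channel.
// The diffuse/albedo term from the paper is omitted for brevity.
std::array<float, 3> ibl_with_multiscattering(
    const std::array<float, 3>& F0,  // specular reflectance at normal incidence
    float f_ab_x, float f_ab_y,      // BRDF LUT scale and bias
    const std::array<float, 3>& radiance,
    const std::array<float, 3>& irradiance)
{
    std::array<float, 3> out{};
    for (int i = 0; i < 3; ++i) {
        // Single-scattering response from the split-sum approximation.
        float FssEss = F0[i] * f_ab_x + f_ab_y;
        // Fraction of energy not accounted for by a single scattering event.
        float Ems = 1.0f - (f_ab_x + f_ab_y);
        // Closed-form average Fresnel from the paper.
        float F_avg = F0[i] + (1.0f - F0[i]) / 21.0f;
        // Geometric series summing all further scattering events.
        float FmsEms = Ems * FssEss * F_avg / (1.0f - F_avg * Ems);
        out[i] = FssEss * radiance[i] + FmsEms * irradiance[i];
    }
    return out;
}
```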
Most of the details are in the two blog posts (part 1 and part 2), but to summarize: the view-culling system transforms model-space AABBs to view space and performs a separating-axis-theorem check against the view frustum to determine whether each object is in view. ISPC is used to vectorize the system, testing 8 AABBs at once.
For 10,000 AABBs, the system can produce a visibility list in about 0.3 ms using a single core on my i5-6600K.
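As a rough illustration of the visibility test, here's a scalar sketch that only uses the six frustum planes as candidate separating axes (a full SAT considers additional axes, and the actual implementation is ISPC code that tests 8 boxes per call). All types and names here are illustrative:

```cpp
struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };     // view-space bounds
struct Plane { Vec3 n; float d; };  // dot(n, p) + d >= 0 means inside

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True if the plane's normal acts as a separating axis for this box.
static bool plane_separates(const Plane& p, const AABB& b) {
    // Pick the box corner furthest along the plane normal (the "p-vertex");
    // if even that corner is behind the plane, the whole box is outside.
    Vec3 v{
        p.n.x >= 0.0f ? b.max.x : b.min.x,
        p.n.y >= 0.0f ? b.max.y : b.min.y,
        p.n.z >= 0.0f ? b.max.z : b.min.z,
    };
    return dot(p.n, v) + p.d < 0.0f;
}

bool is_visible(const Plane frustum[6], const AABB& view_space_aabb) {
    for (int i = 0; i < 6; ++i) {
        if (plane_separates(frustum[i], view_space_aabb)) return false;
    }
    return true;
}
```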
Here's a graphic showing how this also reduces wasted vertex work on the GPU: without culling, the GPU ends up executing vertex-shader work that produces no corresponding fragment/pixel-shader output:
With culling, the wasted work is reduced greatly:
Largely based on Olsson et al. 2012, this implementation uses compute shaders to populate a buffer of per-cluster index lists, which are then used to index into the global light lists.
The current implementation supports binning both point lights and spot lights using clusters defined as AABBs in view space. The intersection test used for the spot lights is based on Bart Wronski's excellent blog post, and it works great with Olsson's view-space z-partitioning scheme, which attempts to keep clusters as cubical as possible.
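For illustration, Wronski's cone-vs-sphere test is small enough to sketch here. The idea is to wrap each cluster AABB in its bounding sphere (a tight fit when clusters are near-cubical, which is exactly what the partitioning scheme aims for) and test that sphere against the spot light's cone. The types and names below are illustrative:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };
struct Cone { Vec3 tip; Vec3 dir; float height; float angle; }; // dir normalized

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

bool cone_intersects_sphere(const Cone& cone, const Sphere& s) {
    Vec3 v{ s.center.x - cone.tip.x, s.center.y - cone.tip.y, s.center.z - cone.tip.z };
    float v_len_sq = dot(v, v);
    float v1_len = dot(v, cone.dir); // distance of sphere center along cone axis
    // Distance from the sphere center to the nearest point on the cone,
    // measured perpendicular to the cone's slant edge.
    float distance_closest_point =
        std::cos(cone.angle) * std::sqrt(v_len_sq - v1_len * v1_len) -
        v1_len * std::sin(cone.angle);
    bool angle_cull = distance_closest_point > s.radius; // outside the slant
    bool front_cull = v1_len > s.radius + cone.height;   // past the cap
    bool back_cull  = v1_len < -s.radius;                // behind the tip
    return !(angle_cull || front_cull || back_cull);
}
```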
One small change is that I actually implemented a piece-wise partitioning scheme: clusters between the near plane and a user-defined "mid-plane" use linear partitioning, while those beyond the mid-plane use the Olsson partitioning scheme. This is because the Olsson scheme produces very small clusters close to the camera, and using a linear scheme there (while providing a worse fit for the bounding-sphere tests) yields about a 30% memory reduction.
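A sketch of what such a piece-wise mapping from slice index to view-space depth can look like (parameter names are illustrative, not the actual implementation):

```cpp
#include <cmath>
#include <cstdint>

// Slices up to `linear_slices` are spaced linearly between the near plane and
// the mid-plane; the rest follow Olsson et al.'s exponential spacing.
float slice_to_view_depth(
    uint32_t slice,          // z-slice index in [0, total_slices]
    uint32_t linear_slices,  // slices between the near plane and the mid-plane
    uint32_t total_slices,
    float z_near, float z_mid, float z_far)
{
    if (slice <= linear_slices) {
        // Linear spacing avoids the sliver-thin clusters the exponential
        // scheme would otherwise produce right next to the camera.
        return z_near + (z_mid - z_near) * float(slice) / float(linear_slices);
    }
    // Exponential (Olsson-style) spacing from the mid-plane to the far plane.
    float t = float(slice - linear_slices) / float(total_slices - linear_slices);
    return z_mid * std::pow(z_far / z_mid, t);
}
```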
One thing to note is that I used 32-bit `uint`s to store light indices, which is total overkill for the number of lights I reasonably expect to support. I could easily get away with 16 bits for the index-list entries, BUT my laptop's 960M lacks the feature support for 16-bit arithmetic 😢
zec::gfx
The D3D12 abstraction, as mentioned, is pretty thin and still WIP. It's also a bit opinionated in that I haven't made any effort to support traditional resource binding, opting instead for the "bindless" model. This makes things much simpler; for instance, the lib doesn't let you create non-shader-visible CBV_SRV_UAV heaps (there's simply no need, since we just bind the entire heap).
While it currently only supports D3D12, care has been taken to ensure the D3D API objects don't "leak": the API hands out opaque handles to clients, which are then used to record commands.
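To give a flavor of what that looks like from the client's side, here's a hypothetical sketch -- these are not the actual `zec::gfx` types or functions. Creation returns an opaque handle, and since the model is bindless, "binding" a texture reduces to fetching its index in the shader-visible descriptor heap and passing that index to shaders:

```cpp
#include <cstdint>

struct TextureDesc; // filled out by the client

// Opaque handle: no ID3D12Resource or descriptor-heap types leak to clients.
struct TextureHandle { uint32_t idx = UINT32_MAX; };

namespace gfx {
    // The ID3D12Resource lives inside the library; clients only see the handle.
    TextureHandle create_texture(const TextureDesc& desc);
    // Index into the single shader-visible heap; shaders use it to index a
    // descriptor array rather than relying on per-draw bindings.
    uint32_t get_shader_readable_index(TextureHandle texture);
}
```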
Future plans include:
It also supports an (optional) "Render Pass" abstraction that manages pass resource creation and transitions between passes. For instance, we might have a depth pass that "outputs" a depth buffer, which other passes mark as an "input" to be sampled from pixel/fragment shaders -- the render-pass system automatically inserts a resource transition at the beginning of the first pass that consumes the input.
Additionally, passes can submit work on the asynchronous compute queue; the abstraction will submit that work separately and even insert cross-queue barriers when subsequent passes use the outputs. However, I haven't tested this part of the system robustly yet, and I know it currently breaks when resources are transitioned using a command list created from the compute queue.
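Purely as an illustration of the shape of such a system (none of these names are the real `zec::render_pass_system` API), a pass description might carry its inputs, outputs, and target queue, from which the system derives transitions and cross-queue barriers:

```cpp
#include <cstdint>
#include <vector>

enum class ResourceUsage : uint32_t { DepthWrite, RenderTarget, ShaderRead, ComputeWrite };

struct CommandContextHandle { uint32_t idx = UINT32_MAX; };

struct PassResource {
    const char* name;    // e.g. "depth_buffer"
    ResourceUsage usage; // how this pass accesses the resource
};

struct RenderPassDesc {
    const char* name;
    bool on_compute_queue;             // submit on the async compute queue?
    std::vector<PassResource> inputs;  // transitions inserted before execution
    std::vector<PassResource> outputs; // resources this pass writes or creates
    void (*execute)(CommandContextHandle cmd);
};
```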
Future plans include:
The projects in this folder are currently broken, as I've changed the `zec::gfx` and `zec::render_pass_system` APIs since writing them. I'm not sure whether I'll bother fixing them, since they were mostly written as proofs of concept and meant to be throwaway code.