Opened by devshgraphicsprogramming, 1 year ago.
Specular aliasing arises from the shape and direction of the peak of $f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) |\omega_{\text{o}} \cdot \mathbf n|$ actually varying over the footprint of a pixel, and this can happen in multiple ways:
This happens when, for example, $\alpha$ or $\eta$ is fetched from a texture.
Imagine any of the following:
There's a paper called Geometrical Specular AA which aims to tell you how much you should "crank up" the $\alpha$ based on the Principal Curvature (which you can compute from screenspace derivatives of normal vectors converted to derivatives w.r.t. the surface parametrization), but it assumes a Beckmann NDF is being used.
There's a newer one from 2022 which develops a similar framework for GGX.
If you don't use Schussler et al. 2017, then this makes both of the $\omega$ vary, but differently (not analytically, nor in a way which can be approximated locally with a few derivatives) than in the above case.
But if you do use it, then there's a perturbed normal parameter which performs a very similar role as perturbing the actual normal used for determining the tangent space for the $\omega$ parameters.
An infinite ray (line) in 3D is uniquely parametrized by 4, not 6, variables.
This is because it doesn't matter how the ray origin is moved along its direction, or if the direction is inverted, you'll get the same ray.
Hence you can parametrize the ray by:
The whole problem comes from the fact that we can't accurately approximate the integral of the lighting equation over the footprint of a pixel filter with just a few samples
I(\mathbf u, \lambda) = \int_{\{\mathbf s \in \mathbb{R}^2 | W(\mathbf s)>0\}} W(\mathbf s) L_{0}(raygen_{\mathbf x}(\mathbf u + \mathbf s),raygen_{\omega_{\text{i}}}(\mathbf u + \mathbf s), \lambda) \operatorname d \mathbf s
this is basically the measurement equation, where $W(\mathbf s)$ is the given ray's contribution to the sampled image pixel value $I(\mathbf u,\lambda)$.
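As a sanity check, the measurement equation above can be estimated with brute-force Monte Carlo over the filter footprint. This is a minimal sketch; the Gaussian filter `W`, the stand-in radiance function (playing the role of $L_0$ composed with raygen), and the sample count are all made-up assumptions:

```python
import math
import random

def W(sx, sy, sigma=0.5):
    """Truncated Gaussian pixel reconstruction filter W(s)."""
    if abs(sx) > 1.5 or abs(sy) > 1.5:
        return 0.0
    return math.exp(-(sx * sx + sy * sy) / (2.0 * sigma * sigma))

def radiance(px, py):
    """Stand-in for L0(raygen(u + s)): a smooth made-up scene function."""
    return 0.5 + 0.5 * math.sin(px) * math.cos(py)

def estimate_pixel(u, v, n_samples=64, rng=random.Random(42)):
    """Monte Carlo estimate of I(u) over the filter support [-1.5, 1.5]^2."""
    area = 3.0 * 3.0  # measure of the uniform sampling domain
    acc = 0.0
    for _ in range(n_samples):
        sx = rng.uniform(-1.5, 1.5)
        sy = rng.uniform(-1.5, 1.5)
        acc += W(sx, sy) * radiance(u + sx, v + sy)
    return acc * area / n_samples
```

The point of the whole issue is that this estimator needs many samples before the filtered peak of a specular term stops flickering, which is exactly what the prefiltering below tries to avoid.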
So let us define a new filtered BxDF over some distribution of incoming ray intersection points and angles $V(\mathbf v)$. This is not the exact same thing as the above, it's a split sum approximation (like all the single-tap prefiltered Image Based Lighting in recent PBR rasterization-based engines):
\tilde{f}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\{\mathbf v \in \mathbb{R}^2 \times \Omega | V(\mathbf v)>0\}} V(\mathbf v) f(\mathbf x + \mathbf r(\mathbf v), \omega_{\text{i}}+\omega(\mathbf v), \omega_{\text{o}}, \lambda) \frac{|\omega_{\text{o}}\cdot\mathbf n(\mathbf r(\mathbf v))|}{|\omega_{\text{o}}\cdot\mathbf n_{0}|} \operatorname d \mathbf v
this basically gives us a weighted average (more accurately, a convolution) of all the BxDF values as they would be evaluated for our "incoming ray mix". It is this weighted average of the BxDF that we substitute into the lighting equation at each bounce, hence the split sum approximation; now inside the integral there are two separate sums being multiplied together instead of a single one.
For derivative maps or curvature, the normal used for the lighting equation depends on the ray intersection position, which itself is a distribution for the split sum.
Assuming you can actually importance sample $V$ efficiently (even if not ideally), then importance sampling this $\tilde{f}$ monster is actually quite straightforward, you can do it via composing warps:
g_{\tilde{f}}(\mathbf x, \omega_{\text{i}}, \lambda, \xi_0, \xi_1) = g_{f}(\mathbf x + \mathbf r(g_{V}(\xi_0)), \omega_{\text{i}} + \omega(g_{V}(\xi_0)), \lambda, \xi_1)
note that you need 2 extra dimensions from the sample.
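A toy sketch of the composed-warp sampler: two uniforms go through a hypothetical `g_V` to produce positional/angular offsets, which then perturb the inputs of a stand-in base sampler `g_f` (a cosine-weighted hemisphere here). All concrete warps are illustrative assumptions, not the real thing:

```python
import math

def g_V(xi):
    """Warp two uniforms into a point on a uniform disk of radius 0.1,
    used for both the positional offset r(v) and the angular offset omega(v)."""
    u, v = xi
    r = 0.1 * math.sqrt(u)
    phi = 2.0 * math.pi * v
    return (r * math.cos(phi), r * math.sin(phi))

def g_f(x, w_i, xi):
    """Stand-in base BxDF sampler g_f: cosine-weighted hemisphere direction
    (ignores x and w_i, as a Lambertian sampler would)."""
    u, v = xi
    r = math.sqrt(u)
    phi = 2.0 * math.pi * v
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u)))

def g_f_tilde(x, w_i, xi0, xi1):
    """Composed warp g_f(x + r(g_V(xi0)), w_i + omega(g_V(xi0)), xi1).
    Note the two extra sample dimensions consumed by xi0."""
    d = g_V(xi0)
    x_p = (x[0] + d[0], x[1] + d[1])                   # r(v): positional offset
    w_p = (w_i[0] + 0.5 * d[0], w_i[1] + 0.5 * d[1])   # omega(v): toy angular offset
    return g_f(x_p, w_p, xi1)
```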
The problem is still that you've not reduced the dimensionality of the Light Transport integral, it's actually increased! And your convergence rate is the same order of magnitude or worse than the original equation involving $W(\mathbf s)$.
Note how the above sampling strategy can produce the same $\omega_{\text{o}}$ in multiple ways, making it so that the actual $q_{\tilde{f}}$ is impossible to compute, as you neither know the value of $\tilde{f}$ nor $p_{\tilde{f}}$, and nothing is guaranteed to factor out.
Note that $\tilde{f}$ is but a convolution (or rather a correlation) of our original $f$ with $V$ along the position and observation angle dimensions; this is super relevant later for discussing Covariance Tracing.
What model, rather than the original BxDF itself (except for a Linearly Transformed Cosine), is a better fit to approximate the filtered BxDF?
Here we explicitly state model, because obviously we need to fit new parameters.
As a simple example of this approach, let's consider a Diffuse $f$ where by construction we have:
\frac{\operatorname d f}{\operatorname d \omega_{\text{i}} } = 0
so we only really depend on the intersection points of $V$ with our surface.
A common approach to approximating the filtered Diffuse BRDF is to filter its input parameter $\rho$, in theory like this:
\tilde{\rho}(\mathbf x) = \int_{\{\mathbf v \in \mathbb{R}^2 | V(\mathbf v)>0\}} V(\mathbf v) \rho(\mathbf x + \mathbf r(\mathbf v)) \operatorname d \mathbf v
but obviously the set of intersections for rays $\mathbf v$ with $V(\mathbf v)>0$ on the surface we're shading will form a complex, not easily integrable domain. So we make some additional assumptions:
... and I've just explained Anisotropic Texture Filtering (rhomboid) and EWA Filtering (Anisotropic 2D Normal Distribution).
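The steps above can be sketched as an EWA-style filter: a weighted average of texels under an anisotropic Gaussian whose 2x2 UV-space covariance defines the elliptical footprint. The procedural checker texture, grid resolution and cutoff radius below are all assumptions for illustration:

```python
import math

def texture(u, v):
    """Procedural checker standing in for a texel fetch."""
    return 1.0 if (int(math.floor(u)) + int(math.floor(v))) % 2 == 0 else 0.0

def ewa_filter(cu, cv, cov, radius=3.0, steps=17):
    """Weighted average of texels under a Gaussian with covariance
    ((a, b), (b, c)) in UV space, evaluated on a small grid around (cu, cv)."""
    a, b = cov[0]
    b2, c = cov[1]
    det = a * c - b * b2
    # inverse covariance for the Gaussian (Mahalanobis) exponent
    ia, ib, ic = c / det, -b / det, a / det
    acc = wsum = 0.0
    ext = radius * math.sqrt(max(a, c))  # conservative search extent
    for i in range(steps):
        for j in range(steps):
            du = -ext + 2.0 * ext * i / (steps - 1)
            dv = -ext + 2.0 * ext * j / (steps - 1)
            q = ia * du * du + 2.0 * ib * du * dv + ic * dv * dv
            if q < radius * radius:  # inside the ellipse cutoff
                w = math.exp(-0.5 * q)
                acc += w * texture(cu + du, cv + dv)
                wsum += w
    return acc / wsum if wsum > 0.0 else texture(cu, cv)
```

With a covariance much smaller than a texel the filter degenerates to a point sample, and with a wide one it converges toward the checker's mean, which is the minification behaviour we want.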
As we've established we have multiple causes of specular aliasing, here's a list with general approaches to solve them.
The first thing we need is an approximation to $V$ but one that does not need to necessarily be importance sampled, however it needs to be:
There's not much we can do here in terms of filtering when tracing classical Binary Surface Reps in a BVH with a single hero ray.
Covariance Tracing has some ideas, but according to my reading of the paper you'd need voxels for that.
Would be extremely useful for prefiltering shadows (visibility) for the emitters we hit.
The solution here is to take the distribution of outgoing rays from previous path vertex and transform that into a distribution of incoming rays for the next vertex (the surface hit by the "hero" ray).
Assumptions and approximations:
What we need to know:
After this step we get a distribution of incoming rays $V_i$ in the plane of the surface, it can be used to filter the parameters of $f$ that come from a texture.
The surface is actually curved and we can correct for that given the principal curvatures if we can differentiate the surface's shading normals w.r.t. some surface parametrization (barycentrics are always present).
Assumptions and approximations:
What we need to know:
This gives us $V_g$ the incoming distribution corrected for curvature.
The really cool thing up until now is that no matter how you represent $V$, all the steps above don't, and shouldn't, care about what the BxDF is.
Assumptions:
What we need to know:
We now get a new set of BxDF parameters and therefore a new BxDF $f_t$, not a new viewing ray distribution.
Sidenote: It would be cool to factor the distribution of viewing directions (transformed into UV space) into the derivation of the distribution of visible perturbed normals as Schussler 2017 makes certain normals more visible than others due to shadowing and masking. In general, without Schussler, you would not be taking the viewing direction into account, because each pixel (bunch of perturbed microfacets) of the normal map has the same area both projected onto the macrosurface and onto any observing ray direction.
This is why the perturbed normal distribution transform should come after curvature, but also because it alters our BxDF parameters which in turn alter how we compute our convolution of outgoing ray distributions by the BxDF.
We now need to derive yet another modified BxDF $\tilde{f}_v$ from:
Now, why would we want to do that?
It's to make sure that both our:
You should be able to see that we don't suffer from double/redundant correction anywhere, as it's the ray distribution transforms that deal with the aliasing arising due to geometry, and the BxDF re-fitting that deals with meso-surfaces and convolution by the original BxDF.
Assumptions:
What we need to know:
Since we assume that the parameters of $\tilde{f}_t$ are locally constant, we don't forward $\mathbf x$ to it, and define:
\tilde{f}_v(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\mathbf r(\mathbf v_{\text{i}})=\mathbf x \land V_g(\mathbf v_{\text{i}})>0} V_g(\mathbf v_{\text{i}}) \tilde{f}_t(\omega_{\text{i}}+\omega(\mathbf v_{\text{i}}), \omega_{\text{o}}, \lambda) \operatorname d \mathbf v_i
We then need to represent our outgoing ray distribution $V_o$ properly, which based on previous estimates is:
V_o(\mathbf v_o) = \tilde{f}_v(\mathbf r(\mathbf v_o), \omega_{\text{i}}, \omega(\mathbf v_o), \lambda)
Finally, one might want to account for the fact that importance sampling might be drawing multiple samples on the same surface, and "narrow down" the distribution proportionally to the sample count as an estimate of the sampling rate of the outgoing directions. However, one should not take into account the path depth; the loss of sample rate due to ray spread on deeper paths is already taken into account by all the above!
We need a way to represent $V$ in a way that lets us perform all or some of the mitigation techniques outlined in the previous comment.
So far I know of only two.
Ray Differentials start with the derivative of the ray directions between neighbouring samples. If using a sample sequence of, let's say, $n^2$ samples which is also stratified (each cell in an $n \times n$ grid contains one sample), like for example a (t,m,s)-net, then the derivative of the ray at the start of the path is:
\frac{1}{n} \frac{\operatorname D raygen(u,v) }{\operatorname D (u,v)}
the $\frac{1}{n}$ accounts for the sampling density (average distance between samples).
The ray-origin derivative is obviously null when using a pin-hole camera as there is a singular focal point for all rays.
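Here's a minimal sketch of that initial differential for an assumed pinhole camera: central finite differences of `raygen` w.r.t. screen $(u,v)$, scaled by $\frac{1}{n}$; the origin derivative is omitted since, as noted, it is zero for a pinhole. Camera parameters are made up:

```python
import math

def raygen_dir(u, v, fov=math.pi / 2.0, aspect=1.0):
    """Pinhole camera: screen (u, v) in [0,1]^2 to a unit ray direction."""
    t = math.tan(fov / 2.0)
    x = (2.0 * u - 1.0) * t * aspect
    y = (1.0 - 2.0 * v) * t
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

def ray_differential(u, v, n, eps=1e-4):
    """(1/n) * D raygen / D(u, v) by central finite differences,
    where n is the per-axis stratification count of the sampler."""
    def diff(f0, f1):
        return tuple((a - b) / (2.0 * eps) / n for a, b in zip(f0, f1))
    ddu = diff(raygen_dir(u + eps, v), raygen_dir(u - eps, v))
    ddv = diff(raygen_dir(u, v + eps), raygen_dir(u, v - eps))
    return ddu, ddv
```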
This is how you perform the following operations
You can simply construct a frustum based on the positional and directional derivatives and intersect this with the surface.
Simply negate the directional derivatives, rays will still travel through the same parametrized origins.
We need a function to express a ray in the shading normal's tangent space, we then use the Chain Rule to derive first order derivatives in that tangent space.
Find out the footprint in tangent and UV space, filter textures appropriately.
So far I don't know of any method to make the viewing direction play a role, but this would only be needed for Fresnel which rarely ever varies spatially.
The only hope of achieving anything here, is to figure out transformations of parameters (usually just $\alpha$) based only on the directional derivatives of the incoming rays in tangent space.
Then when importance sampling, compute the derivative to use as the new directional derivative with an approximation like this:
\frac{1}{\sqrt{n}} \frac{\operatorname D g}{\operatorname D \xi}+ \frac{\operatorname D g}{\operatorname D \omega_{\text{i}}}
but in a way that does not cancel out opposing directional derivatives.
This adds the spread due to the importance sampling's roughness and the original incoming distribution; it's nice because:
This method also sucks because it's completely dependent on your sample generator:
The problem with Ray Differentials is that they prescribe a discrete, solid (binary) frustum shape that the rays occupy.
The problem with that is, as you take a bundle of rays within a frustum (which can be very thin) and reflect or refract it with a rough BxDF, they will spread over a large range of angles, away from a perfectly reflected frustum, and not every angle will have the same intensity!
A Ray Differential assumes all rays within the frustum have the same importance/intensity.
In the end, what really happens in renderers is that rather than actually working out the differentials analytically by differentiating the complex materials, the ray gets some metadata to keep track of finite differences in screenspace X and Y:
In Mitsuba and PBRT you essentially get 2 "ghost" rays which are not taken into account when traversing the BVH but are:
This is absolutely horrible because:
One way to combat both issues above is to also consider a $\Delta \xi_i = \frac{1}{\sqrt{n}}$ in your importance sampling functions; note that this makes your path generation 3x more expensive than it was.
If we take the alternative and account for all aliasing cases separately, we make our importance sampling functions compute their derivatives in $\xi$ while they are generating the "hero" sample. But this is a very fragile approximation which is likely to break for the reasons outlined above.
The ideal would be to fit a frustum to the outgoing direction Projected BxDF value distribution, but the only way I can see how to do that is to use Linearly Transformed Cosines and fit the frustum to that.
Covariance Tracing deals with 4D (if you ignore time) distributions (spreads) of ray origins and directions; they're modelled as a 4x4 covariance matrix, which means fuzzy Gaussian distributions.
Every time you do something with the ray distribution, like:
there is a 4x4 matrix to multiply the original distribution with. It's actually less data than a full 4x4, because a covariance matrix is symmetric, so we only need to concern ourselves with its upper triangle.
Then you just project when you need to get the 2x2 covariance matrix of the ray distribution as it intersects a surface's tangent plane which will be w.r.t surface's UV coordinates.
This gives you an oriented Gaussian filter's elliptical footprint, which is exactly the thing you need for EWA texture minification, which is considered superior to Anisotropic Filtering (which we'll still use).
For Anisotropic Filtering you can just extract the ellipse's minor and major axes multiplied by some fixed constant (however many $\sigma$ we want to fit in a box) and use that for sampling.
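A sketch of that axis extraction, using the closed-form eigendecomposition of a symmetric 2x2 matrix; the $\sigma$ multiplier `n_sigma` is an assumed tuning constant:

```python
import math

def ellipse_axes(cov, n_sigma=2.0):
    """Major/minor axes of the Gaussian footprint ellipse for a symmetric
    2x2 covariance ((a, b), (b, c)), scaled by n_sigma * sqrt(eigenvalue)."""
    a, b = cov[0][0], cov[0][1]
    c = cov[1][1]
    tr = a + c
    disc = math.sqrt(max(0.0, (a - c) * (a - c) / 4.0 + b * b))
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc  # eigenvalues, l1 >= l2
    if abs(b) > 1e-12:
        v1 = (l1 - c, b)  # eigenvector for l1
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    n1 = math.hypot(*v1)
    v1 = (v1[0] / n1, v1[1] / n1)
    v2 = (-v1[1], v1[0])  # perpendicular minor-axis direction
    s1 = n_sigma * math.sqrt(max(0.0, l1))
    s2 = n_sigma * math.sqrt(max(0.0, l2))
    return (v1[0] * s1, v1[1] * s1), (v2[0] * s2, v2[1] * s2)
```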
Recall our original convolution over a projected disk:
\tilde{f}_v(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\mathbf r(\mathbf v_{\text{i}})=\mathbf x \land V_g(\mathbf v_{\text{i}})>0} V_g(\mathbf v_{\text{i}}) \tilde{f}_t(\omega_{\text{i}}+\omega(\mathbf v_{\text{i}}), \omega_{\text{o}}, \lambda) \operatorname d \mathbf v_i
We can express it as a convolution over a plane perpendicular to the importance sampled outgoing "hero" ray $\omega_{\text{o}}$, assuming a tiny spread of the ray distribution:
\tilde{f}_v(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\perp \omega_{\text{o}}} V_g(\mathbf v_{\text{i}}(\mathbf u)) \tilde{f}_t(\omega_{\text{i}}+\omega(\mathbf u), \omega_{\text{o}}, \lambda) |\frac{\operatorname D \mathbf v_{\text{i}}}{\operatorname D \mathbf u}| \operatorname d \mathbf u
Now the absolute cool thing that Covariance Tracing does, is that it expresses the above in terms of a Fourier Transform.
\mathcal F(\tilde{f}_v) = \mathcal F(V_g)\cdot\mathcal F(\tilde{f}_t |\frac{\operatorname D \mathbf v_{\text{i}}}{\operatorname D \mathbf u}|)
at this point, we should note that all the equations convolve over the incoming-direction variable.
Now the funny thing is that this formulation of $\tilde{f}_v$ actually tells us how the light reflected from a constant $\omega_{\text{o}}$ varies with $\omega_{\text{i}}$, which is not what we're looking for. But since BRDFs are supposed to be symmetric, we can swap the $\omega$ parameters to get our answer.
This is where the strategic choice of using covariance as the approximation of $V$ pays dividends: you can go back and forth between primal and frequency space with Gaussian Distributions very easily. Also, we can use convolution and correlation interchangeably because the representation of $V$ is zero-centered and symmetric.
This means that in order to find a $V_o$ that fits $\tilde{f}_v$ in primal space, one does not need to invert $\mathcal F(\tilde{f}_v)$. The fit can be performed in frequency space, and it simply involves adding covariance matrices together!
\Sigma_{V_o} = \Sigma_{V_g}+\Sigma_{\tilde{f}_t}
but because the covariance matrix of the BRDF is not invertible in 4D, a slightly different formulation is used:
\Sigma_{V_o} = (\Sigma_{V_g}^{-1}+\Sigma_{\tilde{f}_t}^{+})^{-1}
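A sketch of that combination on projected 2x2 covariances (pure Python, to keep it self-contained); the pseudo-inverse inverts only the non-zero eigenvalues, which is what handles a rank-deficient BxDF covariance. The concrete matrices fed to it are illustrative:

```python
import math

def inv2(m, eps=1e-12):
    """Inverse of a 2x2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    assert abs(det) > eps, "matrix not invertible"
    return ((d / det, -b / det), (-c / det, a / det))

def pinv2_sym(m, eps=1e-9):
    """Moore-Penrose pseudo-inverse of a symmetric 2x2 matrix: eigendecompose,
    invert the non-zero eigenvalues, zero out the degenerate ones."""
    (a, b), (_, c) = m
    half = (a + c) / 2.0
    disc = math.sqrt(max(0.0, (a - c) ** 2 / 4.0 + b * b))
    l1, l2 = half + disc, half - disc
    v = (l1 - c, b) if abs(b) > eps else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    n = math.hypot(*v)
    v = (v[0] / n, v[1] / n)
    w = (-v[1], v[0])
    il1 = 1.0 / l1 if abs(l1) > eps else 0.0
    il2 = 1.0 / l2 if abs(l2) > eps else 0.0
    return tuple(
        tuple(il1 * v[i] * v[j] + il2 * w[i] * w[j] for j in range(2))
        for i in range(2)
    )

def combine(sigma_v, sigma_f):
    """Sigma_Vo = (Sigma_Vg^-1 + Sigma_f^+)^-1."""
    i = inv2(sigma_v)
    p = pinv2_sym(sigma_f)
    s = tuple(tuple(i[r][c] + p[r][c] for c in range(2)) for r in range(2))
    return inv2(s)
```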
The spreading of our Covariance Matrix won't:
Therefore we'd need to deduce how to bump up the anisotropic $\alpha$ given a covariance matrix in the local surface coordinate frame, essentially computing $\tilde{f}_t$.
Obviously YOU COMPUTE THIS SEPARATELY FOR THE REFLECTION AND REFRACTION cases; you cannot put two vastly separated unimodal distributions (which together form a bimodal one) into a single unimodal distribution (which is what a Gaussian given by a covariance matrix is) and hope to get anything remotely useful. This is obviously a problem for your standard dielectric, which has two peaks in the plot of outgoing directions, one for reflection and one for refraction.
So since we cannot approximate multi-modal distributions with a Gaussian derived from a single Hessian, we have a problem defining a single Hessian to use for our ray distribution transformations. We could either:
Whatever the case, it seems like Durand 2005 is a much needed read first.
Since both Covariance Tracing and an alternative version of Ray Differentials would both use the Jacobian
\frac{\operatorname D \tilde{f}_t}{\operatorname D \omega_o}
to estimate $V_g$, we could flip the parameter fitting on its head and ask what parameters $\tilde{f}_v$ needs for its Jacobian to stay close to the convolution of the Jacobian of $\tilde{f}_t$ with $V_g$.
As one knows, mip-maps of bump-maps, derivative-maps and normal-maps tend to average the values in the footprint to a single value. This produces a smooth surface under minification which is not correct.
One needs to filter the original NDF's $\alpha$ parameter alongside the perturbed normal $\mathbf n_p$ for a given pixel footprint $P$ with a density $W$ to closely match the underlying distribution of normals in the actual normalmap.
Because the input Derivative Map's X and Y channel histograms do not need to be equal (think metal grating), the filtered NDF can be anisotropic even if every perturbed normal had an isotropic NDF applied to it. This is yet another argument why all NDFs should be implemented as anisotropic and not optimized for the isotropic case.
Therefore filtering techniques that spit out covariant roughnesses are best (so that the NDF is not only anisotropic, but the anisotropy can also be rotated). So there are $\alpha_{\text{x}}$, $\alpha_{\text{y}}$ and $\alpha_{\text{xy}}$. But attention must be paid to the actual representation, such that they behave nicely under Trilinear Interpolation.
Because of all of the above, the BxDF (or at least the NDF) intended for use with a Derivative Map needs to be known.
It remains a challenge to formulate our "target" function to minimize, but one should hope it's some sort of an error metric between the average BxDF value over the pixel footprint's texels and the BxDF of the filtered perturbed normal and roughness.
There is a choice of target functions, but:
We could of course define our spatially varying distribution beautifully in terms of a convolution, for example:
\int_P W(\mathbf x)D(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}}) \operatorname d \mathbf x
but texels in an image are sampled, and therefore Dirac Impulse Trains, so by the sampling property the above turns into a sum
\Sigma_{\mathbf x \in P} W(\mathbf x)D(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}})
Optimizing most of the following target functions is quite easy, there will always be an integral of Square Error in Solid Angle measure where the parameters we're optimizing are arguments to some distribution.
This can be optimized easily provided we can differentiate the Square Error Integral w.r.t. $\alpha$ and $\mathbf n_p$.
The fact that it's an integral doesn't matter; we can randomly initialize the parameters and apply Stochastic Gradient Descent (basically do what Mitsuba 2 and 3 do).
\int_\Omega (D(\tilde{\alpha},M(\mathbf{\tilde{n}}) \omega_{\text{m}})-\Sigma_{\mathbf x \in P} W(\mathbf x)D(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}}))^2 (\omega_{\text{m}}\cdot\mathbf e_2)^2 \operatorname d \omega_{\text{m}}
The idea behind this one is simple, our "Ground Truth" NDF is the weighted sum of all per-texel NDFs of varying roughness rotated by $M$ to have a new tangent space that aligns with $\mathbf n_p$.
Then with the given constraints we try to make our model fit the ground truth to the best of our ability.
NOTE: If we start to use Schussler et al. 2017, then $\mathbf n_p$ becomes a parameter of $D$. There will also be a problem in the above formulation, as the NDF always has a tangent specular microfacet which is a delta distribution, and that will mess up our differentiation.
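To make the fitting idea concrete, here's a heavily simplified sketch: fit a single filtered $\tilde{\alpha}$ so that an isotropic GGX NDF matches the weighted sum of per-texel GGX NDFs, with every texel sharing the same normal (so no $M$ rotation) and a finite-difference gradient descent standing in for proper Stochastic Gradient Descent. The footprint roughnesses, learning rate and hemisphere grid are all assumptions:

```python
import math

def ggx(alpha, cos_t):
    """Isotropic GGX NDF as a function of the microfacet cosine."""
    a2 = alpha * alpha
    d = cos_t * cos_t * (a2 - 1.0) + 1.0
    return a2 / (math.pi * d * d)

def footprint_ndf(alphas, weights, cos_t):
    """Ground truth: weighted sum of the per-texel NDFs."""
    return sum(w * ggx(a, cos_t) for a, w in zip(alphas, weights))

def fit_alpha(alphas, weights, steps=200, lr=0.01, grid=64):
    """Minimize cosine-weighted squared error over a hemisphere grid by
    finite-difference gradient descent on a single alpha."""
    cos_ts = [(i + 0.5) / grid for i in range(grid)]

    def err(alpha):
        return sum(
            (ggx(alpha, c) - footprint_ndf(alphas, weights, c)) ** 2 * c * c
            for c in cos_ts
        ) / grid

    alpha = 0.5  # arbitrary initialization
    for _ in range(steps):
        h = 1e-4
        g = (err(alpha + h) - err(alpha - h)) / (2.0 * h)
        alpha = min(1.0, max(1e-3, alpha - lr * g))
    return alpha
```

The real problem of course also fits $\mathbf{\tilde{n}}_p$ and covariant anisotropic roughness, but the optimization loop looks the same.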
Another insight is that maybe we shouldn't care about the normals we're not going to see anyway.
One would think to attempt to generalize by making $\tilde{\alpha}$ and $\mathbf{\tilde{n}}_p$ depend on $\omega_{\text{i}}$; however, that would probably break the reciprocity and, following from that, the energy conservation properties of the resulting BxDF.
So we are left with this:
\int_\Omega \int_\Omega (D_{\omega_{\text{i}}}(\tilde{\alpha},M(\mathbf{\tilde{n}}) \omega_{\text{m}})-\Sigma_{\mathbf x \in P} W(\mathbf x)D_{\omega_{\text{i}}}(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}}))^2 \operatorname d \omega_{\text{m}} \operatorname d \omega_{\text{i}}
Finally one might approach the problem from the angle of "who cares about the microsurface profile, I care about appearance".
\int_\Omega \int_\Omega \int_\Lambda (f(\tilde{\alpha},\tilde{\mathbf n}_p, \omega_{\text{i}}, \omega_{\text{o}}, \lambda)-\Sigma_{\mathbf x \in P} W(\mathbf x)f(\alpha(\mathbf x),\mathbf n_p(\mathbf x), \omega_{\text{i}}, \omega_{\text{o}}, \lambda))^2 (\mathbf n \cdot \omega_{\text{o}})^2 \operatorname d \lambda \operatorname d \omega_{\text{o}} \operatorname d \omega_{\text{i}}
Obviously to try to apply this to a mixture or blend of BxDFs would be insanity requiring proper differentiable rendering (due to the absolute explosion of parameters to optimize).
But we could make the "Rough Plastic" a special case and allow treating that blend as a singular BxDF, or subexpressions in the BxDF tree which use the same roughness and derivative maps.
We re-use our GPU accelerated polyphase filter, which preloads a neighbourhood of $P$ into shared memory.
There are some obvious constraints:
We then repeat the following process for $K$ iterations or until the error is below a threshold.

When integrating the light transport equation at a path vertex:
\int_\Omega f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) L_{\text{i}+1}(\mathbf x, \omega_{\text{o}}, \lambda) |\omega_{\text{o}}\cdot\mathbf n| \operatorname d \omega_{\text{o}}
we can use multiple importance sampling techniques $g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)$ to generate our $\omega_{\text{o}}$.
Each technique's PDF $p_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})$ will approximate a different factor or term in the light transport equation, for example:
Note that we omit $\lambda$ from the notation because for a BxDF we don't model photon absorption and re-emission, so all properties need to be satisfied separately for each value of $\lambda$.
As per the famous image in the Veach thesis:
Other techniques such as path guiding may do a great job of approximating the Incoming Radiance, but it's only an approximation and will therefore be inefficient at "hard to sample" light distributions (usually sparse things, or accounting for the windowing by the BxDF).
Ideally we'd like to combine them.
This means that accounting for the overlap to be able to use both techniques is necessary.
Simply averaging the results will not decrease Variance, you want to weight them so that a technique contributes more to the final result when it is "better" than the others.
And ideally weigh them smoothly, to have transitions without visual artefacts.
Basically we weigh each contribution $q_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)$ with a weight $w_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})$ such that all weights sum up to $1$, so we are still integrating the same integral. This is because
q_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi) = \frac{f_{\text{j}}(\mathbf x, \omega_{\text{i}},g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)) L_{\text{i}+1}(\mathbf x, g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)) |g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi) \cdot\mathbf n|}{p_{\text{j}}(\mathbf x, \omega_{\text{i}},g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi))}
There are various ways of obtaining and defining these weights.
The intuition is that the higher the PDF of generating an outgoing direction with a given Technique, the "better" it should be compared to the others.
The definition is simple:
w_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{(n_{\text{j}} p_{\text{j}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}}))^{\beta}}{\Sigma_{\text{k}} (n_{\text{k}} p_{\text{k}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}}))^{\beta}}
where usually $\beta=2$ and $n_{\text{j}}$ is the overall proportion of the samples you're allocating to the generator (it's an up-front choice).
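A direct transcription of that definition; the PDFs and sample allocations fed to it are made-up numbers for illustration:

```python
def mis_weight(j, ns, pdfs, beta=2.0):
    """Power heuristic: w_j = (n_j p_j)^beta / sum_k (n_k p_k)^beta."""
    num = (ns[j] * pdfs[j]) ** beta
    den = sum((n * p) ** beta for n, p in zip(ns, pdfs))
    return num / den
```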
For now we're ignoring whether we could compute $q_j$ in a numerically stable way.
Because we operate in the real world, and all our computations happen using IEEE754, we cannot:
When importance sampling tight distributions, the PDFs are very large (infinite for delta distributions), which makes the above definition of $w_{\text{j}}$ not numerically robust.
Therefore we can rewrite our weight like so:
w_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{1}{1+\Sigma_{\text{k} \neq \text{j}} (\frac{n_{\text{k}} p_{\text{k}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}})}{n_{\text{j}} p_{\text{j}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}})})^{\beta}}
Let's assume our techniques' PDFs don't contain overlapping delta distributions.
\not \exists \omega_{\text{o}}, \text{j}, \text{k} : p_{\text{j}}(\omega_{\text{o}})=\infty=p_{\text{k}}(\omega_{\text{o}})
Now you can see the weight is extremely stable, as for all values of $\omega_{\text{o}}$ we have the following:
Furthermore, if we require that our technique's PDF satisfies
(\exists K \in \mathbb{R}^{+} ) (\forall \omega_{\text{o}} : p_{\text{j}}(\omega_{\text{o}}) > K f_{\text{j}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}}) L_{\text{i}+1}(\mathbf x, \omega_{\text{o}}) |\omega_{\text{o}}\cdot\mathbf n|)
we are then sure that $\forall \xi : q_{\text{j}}(g_{\text{j}}(\xi)) \neq \infty$.
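The rewritten weight is a one-liner; dividing through by $(n_{\text{j}} p_{\text{j}})^{\beta}$ means a huge $p_{\text{j}}$ can never overflow. The numbers in the usage below are illustrative, including a PDF large enough to overflow the naive formulation:

```python
def mis_weight_stable(j, ns, pdfs, beta=2.0):
    """Numerically robust power heuristic:
    w_j = 1 / (1 + sum_{k != j} (n_k p_k / (n_j p_j))^beta).
    Large p_j only ever appears in the denominator of a ratio, so huge
    (even near-delta) PDFs underflow gracefully to a weight of 1."""
    s = sum(
        ((ns[k] * pdfs[k]) / (ns[j] * pdfs[j])) ** beta
        for k in range(len(pdfs))
        if k != j
    )
    return 1.0 / (1.0 + s)
```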
This is when our integrand is a product of one or more functions, and the sampling Techniques have PDFs proportional to one of them.
It happens in the case of BxDF sampling and NEE.
Let's imagine our integrand:
\Pi_j \hat{f}_j(x)
where each factor $\hat{f}_j(x)$ has an importance sampling technique
\hat{g}_j(\xi)
with an associated PDF $\hat{p}_{j}(x)$.
And let's have a quotient again:
\hat{q}_j(x) = \frac{\hat{f}_j(x)}{\hat{p}_j(x)}
Then we arrive at reformulating our integrand with MIS:
\Sigma_j w_{\text{j}}(x) \hat{q}_j(x) \Pi_{k \neq j} \hat{f}_k(x)
and everything is extremely stable.
There are other ways of deriving the weights.
Basically you define a distribution which is your "true" target which can be evaluated but can't be sampled, and use that to define the weights.
The assumption of Veach in 1998 was that you know your sample budgets for each generator up-front and you'll always use them (trace rays). However it is also possible to:
Deriving the weights isn't straightforward; they actually need to be optimized based on feedback, and they make a stochastic choice of the technique difficult.
Suppose we are given the following "mixture" of BxDFs
f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) =
\Sigma c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) f_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})
where $f_{\text{i}}$ is a function satisfying the properties for an energy preserving BxDF and $c_{\text{i}}$ is a weighting function.
Let's assume that each $f_{\text{i}}$ we can mix has a decent importance sampling technique $g_{f_{\text{i}}}$.
Now let us think about how we derive a sampling function $g$ such that:
The idea is simple, we stochastically choose to produce the sample with
g_{f_{\text{i}}}(\xi_0,\xi_1)
based on $\xi_2$, with probability $p_{\text{i}}(\mathbf x, \omega_{\text{i}})$.
We choose the generator BEFORE sampling, so note that $\omega_{\text{o}}$ is not an argument to the generator choice probability function.
Then our combined sampler PDF $p$ becomes
p(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) =
\Sigma p_{g_{\text{i}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{i}}(\mathbf x, \omega_{\text{i}})
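A toy sketch of the whole procedure with two assumed lobes (a cosine-weighted and a uniform hemisphere): pick the generator with $\xi_2$ BEFORE sampling, then evaluate the combined PDF as the probability-weighted sum of the per-generator PDFs:

```python
import math

def sample_cosine(xi0, xi1):
    """Cosine-weighted hemisphere sample (tangent space, +Z up)."""
    r = math.sqrt(xi0)
    phi = 2.0 * math.pi * xi1
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - xi0)))

def pdf_cosine(w):
    return max(0.0, w[2]) / math.pi

def sample_uniform(xi0, xi1):
    """Uniform hemisphere sample."""
    z = xi0
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * xi1
    return (r * math.cos(phi), r * math.sin(phi), z)

def pdf_uniform(w):
    return 1.0 / (2.0 * math.pi)

def sample_mixture(p0, xi0, xi1, xi2):
    """Choose the generator with xi2 (before sampling, so the choice
    probability cannot depend on omega_o), then return the direction
    together with the combined PDF p = sum_i p_i * pdf_g_i."""
    if xi2 < p0:
        w = sample_cosine(xi0, xi1)
    else:
        w = sample_uniform(xi0, xi1)
    return w, p0 * pdf_cosine(w) + (1.0 - p0) * pdf_uniform(w)
```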
Our choice then becomes quite limited, as no formulation of $p_{\text{i}}$ is ideal because the sampling process is flawed. Even though we might choose the generator for the BxDF that will contribute the most energy, our single chosen generator's sample can be suboptimal.
But this becomes more and more acceptable as
\frac{\operatorname D c_{\text{i}}}{\operatorname D \omega_{\text{o}}} \to 0
We essentially attempt this
p_{\text{i}}(\mathbf x, \omega_{\text{i}}) = \frac{\int_{\Omega} c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \operatorname d \omega_{\text{o}}}{\int_{\Omega} \Sigma_j c_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \operatorname d \omega_{\text{o}}}
We could, for example, pretend we have different $\hat{c}_{\text{i}} \geq c_{\text{i}}$ and use these for defining the probabilities similarly to the above.
Here we actually produce all samples with every generator
\omega_{\text{o}_{\text{i}}} = g_{f_{\text{i}}}(\xi_{1+2i},\xi_{2+2i})
Then we define our target distribution
p(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \propto f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})
and compute MIS weights $w_{\text{j}}$ pretending that
p_{\text{j}} = p(\mathbf x, \omega_{\text{i}}, g_{f_{\text{j}}}(\xi_{1+2j},\xi_{2+2j}))
Since (the sane, non Krivanek type) MIS weights sum up to 1, we can use them as a discrete Probability of selecting the sample from generator $\text{i}$ using $\xi_0$ as our random variable
p_{\text{i}}(\mathbf x, \omega_{\text{i}}) = w_{\text{i}}
This means that for a mix of $n$ BxDFs, you generate $n$ samples and then evaluate $n^2$ functions!
This is because $f$ is a mixture of $n$ BxDFs, so evaluating $f$ at $n$ different input values makes $n^2$ function evaluations.
Apparently ReSTIR has some non-$O(n^2)$ formulation of an approximation of this process.
Recall that our combined $q$ is
q(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{p(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}
and if we do nothing, we can end up with $\infty$ on both sides of the fraction.
So we do a similar trick as with Numerically Stable MIS: we single out the generator component and divide everything by its probability when the generator is $i$
q = \frac{\frac{c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{p_{\text{i}}(\mathbf x, \omega_{\text{i}})} q_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) +
\frac{\Sigma_{j \neq i} c_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) f_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{p_{g_{\text{i}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{i}}(\mathbf x, \omega_{\text{i}})}}{1+\frac{\Sigma_{j \neq i}p_{g_{\text{j}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{j}}(\mathbf x, \omega_{\text{i}})}{p_{g_{\text{i}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{i}}(\mathbf x, \omega_{\text{i}})}}
And everything works as long as you don't try to make a weighted sum of two or more delta distributions with identical centers.
We start with the following BxDFs
f_1(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{D(\omega_{\text{i}}, \omega_{\text{o}},\alpha(\mathbf x)) G(\omega_{\text{i}}, \omega_{\text{o}},\alpha(\mathbf x))}{4 |\omega_{\text{i}}\cdot\mathbf n| |\omega_{\text{o}}\cdot\mathbf n|} \\
f_2(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{1}{\pi}
with weights
c_1(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = F(\omega_{\text{i}}, \omega_{\text{o}},\eta(\mathbf x)) \\
c_2(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = (1 - \int_{\Omega} h(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \operatorname d \omega_{\text{o}}) T(\rho, \omega_{\text{i}},\eta(\mathbf x))
where $T$ is a boost factor to correct for the fact that there is TIR going on in the diffuse layer (as per the Earl Hammon presentation).
Basically we assume that light which was not reflected specularly enters the surface to reach diffuse layer.
We can take a shortcut and approximate the integral in $c_2$ because there are many choices of what to stick there:
The Integral of single scattering Transmittance $D (1-F) G_1$ can be tabulated trivially using SageMath/Mathematica.
If you are in the incredibly lucky position of being able to express your $f_1$ with a Linearly Transformed Cosine fit which also corrected for multiple scattering, then you can get a perfect $c_2$ formulation because anything not reflected by an infinitely scattering dielectric BRDF should enter the surface and scatter in the diffuse layer.
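The tabulation can be brute-forced offline; the sketch below computes the single-scattering directional albedo $E$ of a GGX Cook-Torrance lobe by hemisphere quadrature (assuming height-correlated Smith and Schlick Fresnel, which may differ from the exact terms used in Nabla), and $1-E$ is then what feeds the diffuse weight:

```python
import math

def ggx_D(mu_m, alpha):
    """Isotropic GGX NDF; mu_m = cos(theta_m)."""
    if mu_m <= 0.0:
        return 0.0
    t = (alpha * alpha - 1.0) * mu_m * mu_m + 1.0
    return alpha * alpha / (math.pi * t * t)

def smith_lambda(mu, alpha):
    tan2 = (1.0 - mu * mu) / (mu * mu)
    return 0.5 * (-1.0 + math.sqrt(1.0 + alpha * alpha * tan2))

def single_scatter_albedo(mu_i, alpha, f0=1.0, n_th=128, n_ph=256):
    """E(mu_i, alpha) = integral of f * |wo.n| over the hemisphere by
    midpoint quadrature; bake this over a (mu_i, alpha) grid offline."""
    wi = (math.sqrt(1.0 - mu_i * mu_i), 0.0, mu_i)
    lam_i = smith_lambda(mu_i, alpha)
    E = 0.0
    for j in range(n_th):
        th = (j + 0.5) * 0.5 * math.pi / n_th
        mu_o, sin_o = math.cos(th), math.sin(th)
        lam_o = smith_lambda(mu_o, alpha)
        for k in range(n_ph):
            ph = (k + 0.5) * 2.0 * math.pi / n_ph
            wo = (sin_o * math.cos(ph), sin_o * math.sin(ph), mu_o)
            h = (wi[0] + wo[0], wi[1] + wo[1], wi[2] + wo[2])
            hl = math.sqrt(h[0] * h[0] + h[1] * h[1] + h[2] * h[2])
            mu_m = h[2] / hl
            cos_ih = (wi[0] * h[0] + wi[1] * h[1] + wi[2] * h[2]) / hl
            F = f0 + (1.0 - f0) * (1.0 - cos_ih) ** 5   # Schlick
            G2 = 1.0 / (1.0 + lam_i + lam_o)            # height-correlated Smith
            f = ggx_D(mu_m, alpha) * G2 * F / (4.0 * mu_i * mu_o)
            E += f * mu_o * sin_o
    return E * (0.5 * math.pi / n_th) * (2.0 * math.pi / n_ph)
```

With `f0=1` (a perfect reflector) $E$ is exactly the single-scattering White Furnace Test result, so $1-E$ is the energy left over for the diffuse layer.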
Generally speaking, for diffuse materials you simply output the shading normal and albedo.
But for specular reflectors and transmitters you really want to be able to "see through" and see the albedo and shading normals of the reflected or refracted geometry.
We basically pretend that AOVs are "special" light $A_{\text{e}}$, emitted at every path vertex by every diffuse or rough surface
A_{\text{i}}(\mathbf x, \omega_{\text{i}}, \lambda) = \int_\Omega (t(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) (A_{\text{i}+1}(\mathbf x, \omega_{\text{o}}, \lambda)-A_{\text{e}}(\mathbf x)) + A_{\text{e}}(\mathbf x)) |\omega_{\text{o}}\cdot\mathbf n| \operatorname d \omega_{\text{o}}
where $t$ is a Bidirectional AOV Transport Function which we derive from a BxDF.
For diffuse BxDFs, regardless of the roughness, $t = 0$.
There is no difference between a transmitter and a reflector.
We want to make $t \to f$ as $\max_i \alpha_i \to 0$; the best construction is
t = s(\alpha) f
And we want to compute $s$ as fast as possible, using leftovers from other computations so we can use:
Either becomes a simple function of $\alpha$, as you do not want the transmittance of the AOV to depend on light directions for any reason other than Fresnel, which is already accounted for.
Generally speaking, $t \to 0$ as attenuation due to extinction eradicates translucency and as the in-scattering phase function blurs the background.
We define our $t$ similarly to how we've defined our weighted sum BxDF $f$, using the exact same weights $c_i$.
t(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{\Sigma_i c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) t_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{\Sigma_i c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}
However, we ensure the weights sum up to $1$, because you don't want less intense albedos or foreshortened normals just because of absorption.
There is an open question of whether we want to include things like the diffuse $\rho$ in the $c_i$ used for defining $t$.
On the one hand there is the option of making the contribution of each BxDF proportional to its weight, on the other to make it proportional to the contribution measured in terms of luminosity.
This depends on the AOV:
When you weight by a scalar, it should be done after all the transport is accounted for. You don't want a split sum approximation of R G B screening at every vertex, just the emissive one.
The AOV signal is really low frequency; it's similar to rendering scenes with a lot of very diffuse ambient light. Notice there's no directional component in the "emitted" light.
This means good BxDF importance sampling is the perfect approach.
However we are not at liberty to run a separate path tracing simulation for this.
Since the spp required to get a converged AOV image are much lower than the real render, and even slightly noisy AOVs are usable, we get our integral from left-overs in the original Path Tracing computation.
We simply reuse the same samples as for the Path Tracing!
So we simply divide our overall $t$ by the sampling technique's $p_j$ and weigh it by the MIS/RIS weight $w_j$ if there is one.
Whenever the sample was generated by a particular Cook Torrance BxDF generator $g_i$ we make sure to use the following formulation:
\frac{t}{p} = \frac{1}{\Sigma_j c_j} ( s_i(\alpha) c_i q_i + \frac{\Sigma_{j \neq i} c_j t_j}{p} )
This is because $f$ pops up in the $t$ to make sure $t$ integrates to $s_i(\alpha)$, dictated by our assumption that the implementation of Cook Torrance passes the White Furnace Test (which is a stretch).
Velocity probably shouldn't follow the regular AOV transport, because a non-zero time derivative of any Light Transport Equation term at any path vertex induces screenspace motion.
We should probably find a way to hack Covariance Tracing or Differentiable Rendering into providing us the real screenspace motion derivatives.
A frontend parses a Material Representation specific to an Interchange Format to produce a format-independent Intermediate Representation/Language of the Material.
For now only the following frontends will be developed/maintained:
Other planned front-ends which have no ETA:
A material is represented by an expression tree, it can look like this:
graph TD
F[Front Face Root Node] --> |Root isn't really an extra node we point straight to Sum|A{Sum}
A --> |attenuate|w_0{Metallic Fresnel}
w_0 --> Conductor(Conductor)
A --> |attenuate|w_1{Albedo}
w_1 --> DiffTrans(Diffuse Transmitter)
A --> |attenuate|w_2{Non-PBR Transmittance/Reflectance}
w_2 --> GlassF(Glass Eta=k)
B[Back Face Root Node] --> |Root isn't really an extra node we point straight to Sum|C{Sum}
C --> w_1
C --> |no extra weights| GlassB(Glass Eta=1/k)
This in reality is an Abstract Syntax Forest, not an Abstract Syntax Tree (and in V2 it will be an Abstract Syntax Directed Acyclic Graph).
The nodes in the diagram are as follows:
Logically (not codegen-wise, for reasons I'll explain farther down), you do the following to evaluate the BxDF:
Based on the sign of `GdotV` or `gl_FrontFacing` you pick the appropriate tree for the root node. This BxDF is always in the form that passes the White Furnace Test, absorption not included as far as possible!
So:
Every BxDF should be able to tell us how much energy $E$ it will lose based on $\omega_{\text{i}}$, as we'll need this for our importance sampling weight!
Makes the whole subexpression directly below it use a different shading normal, either:
If another Normal Modifier appears in the subexpression, it replaces this normal for its own subexpression.
This means that for subexpression elimination, one needs to include the normal used for shading in the hash and comparison. Basically you cannot eliminate a subexpression if it will be using a different shading normal.
This node multiplies the value of its child subexpression by its own value.
It is important to re-order attenuators such that:
This ordering favours the speed of evaluation as opposed to importance sampling.
We divide into several classes (if I'm missing something, let me know):
This will be used to implement the Diffuse BxDF's $\rho$ and the non-PBR Reflectivity/Transmittance.
We'll make it have a 2-bit flag for whether it should be applied to reflective paths, transmissive paths, or both.
Only applied on Transmissive paths; basically it's similar to the Constant Attenuator, but we raise the parameter to the power of $\frac{1}{|\omega \cdot \mathbf n|}$
Also we store a 1-bit flag to tell us which side of the interface (using which $\omega$'s angle) we should apply it to.
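The slab intuition behind that exponent, as a one-liner sketch (assumed Beer-Lambert behaviour; `tau_perp` is the transmittance at normal incidence):

```python
def slant_transmittance(tau_perp, cos_theta):
    """A ray crossing a slab at angle theta travels thickness/|cos(theta)|
    inside it, so Beer-Lambert turns the normal-incidence transmittance
    tau_perp into tau_perp ** (1/|cos(theta)|)."""
    return tau_perp ** (1.0 / abs(cos_theta))
```

For example, a 60-degree crossing (cosine of 0.5) doubles the optical path, squaring the transmittance.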
Left Child: Must be a Cook Torrance Reflector (so we can snoop $\eta$ now, and maybe the total energy loss due to Fresnel later)
Right Child: Must be a Diffuse BxDF
For now, compute the two transmittance factors of 1-Fresnel of the Shading Normal $\mathbf n$ for the incoming and outgoing directions respectively, and their product. Then apply the multi-scattering correction.
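A minimal sketch of those two factors (using Schlick's approximation as a stand-in for the real dielectric Fresnel, and leaving out the multi-scattering correction):

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation; f0 = reflectance at normal incidence."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def coupled_diffuse_weight(cos_i, cos_o, f0):
    """Product of the (1 - Fresnel) transmittance factors evaluated at the
    shading normal for the incoming and outgoing directions, i.e. the
    attenuation this node applies to its Diffuse right child."""
    return (1.0 - schlick_fresnel(cos_i, f0)) * (1.0 - schlick_fresnel(cos_o, f0))
```

At grazing incidence the Fresnel term grows, so the diffuse layer correctly receives less energy than at normal incidence.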
Future Improvements:
We want to optimize the Material VM or Callable SBT by erasing cases for unused opcodes, and eliminating duplicate materials.
We also want to avoid instruction cache misses. There's also an important situation where the divergence has already been optimized (BxDF types and parameters are identical) but the data fetches have not (there are multiple copies of the BxDF parameters in memory), so two objects/samples/pixels using the same material would be fetching from divergent addresses, causing cache misses.
This allows us to perform optimizations by removing entire branches which do not need to be evaluated knowing the sign of `GdotV`, such as any BRDFs for geometry back faces if the material is not Mitsuba-style "twosided".
Furthermore we can precompute a lot of parameters for BSDFs which change depending on the sign of `GdotV`, such as the Refractive Index for dielectric BSDFs.
They don't really do anything to the accumulated expression, they're more of a "marker" that all BxDF nodes in the branch below (until the next such opcode) will use a normal different to the vanilla shading normal.
If we were to go back to Schussler at some point, this node could change from a 1:1 node to a 1:3 node in case certain subexpressions could be eliminated.
Our current de-duplication only allows for sharing an entire Material Expression, we want to be able to share subexpressions.
We also haven't coded the thing with sharing subexpressions in mind.
Each subexpression needs to have a canonicalized (and therefore optimized, at least in the terms of constant folding, etc.) form.
Right now most of our BRDFs have the factors that forsake 100% Energy Conservation folded into their opcodes, parameters and execution. This is a bad idea, especially when importance sampling mixtures of BxDFs, as we'll cover later.
Therefore all factors that make the BxDF lose energy, such as:
The backend traverses the IR/IL (possibly multiple times) to produce a means of performing the following on the target architecture and environment:
For all of the above, the functions take as parameters:
Whether this is achieved by outputting a constant instruction stream (Assembly/IR or High Level) with a separate entry point per root node, or as a Virtual Instruction stream for a handwritten VM, or any hybrid of approaches, is the choice of the backend.
Instead of having a Material Abstract Syntax Forest, have a Material Abstract Syntax Directed Acyclic Graph.
Right now we only de-duplicate entire expression trees, and it also seems to be happening in the Frontend. The Mitsuba Frontend actually constructs suboptimal IR/IL and then attempts to optimize it later (constant folding etc.).
If we use MerkleTrees and HashConsing in the IR/IL allocator/builder then we can make all Frontends benefit from subexpression de-duplication.
If we can't compute a unique equivalent subgraph for a subexpression, we cannot hope for a constant hash and subexpression graph topology comparison, both of which are needed for hash consing and deduplication of subexpressions.
This means for example:
turning `N>3` subexpression accumulations into chains of binary ops.

A cool side-effect of being able to do subexpression elimination is being able to link/join multiple ASDAGs together by treating them as plain old common subexpressions.
Bringing_Linearly_Transformed_Cosines_to_Anisotrop.pdf
Implement #156
Glossary
BxDF = Either a BRDF or BSDF
LTC = Linearly Transformed Cosines
The implementation of BxDFs in Nabla
For the GLSL source you can browse here: https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/builtin/glsl/bxdf
and for the in-progress HLSL here: https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/builtin/hlsl/bxdf
Unfortunately the DXC integration (#433, #437, #442) is not operational yet.
General Case
Because the rendering equation can be written as
You have a recursive integral which features two distinct factors.
The Projected BxDF:
and the Incoming Radiance (not to be confused with irradiance):
Generally speaking it's impossible to find a constant-time closed-form solution to importance sampling the product of Reflectance and Incoming Radiance, except for the simplest of cases (point light, line light and Lambertian BRDF). This is because $\omega_{\text{o}}$ appears in both factors, hence they are not independent of each other.
Furthermore the entire incoming radiance is not known (NEE is a special case and does not take into account whether the emitter is actually visible, and Path Guiding samples a much smoother approximation of the incoming radiance).
The logical choice is to use MIS or RIS to sample this product of distributions efficiently and split the techniques into Projected BxDF and Incoming Radiance sample generators; this also helps us keep our code modular and free of a combinatorial explosion of specialized code for each light source type and BxDF combination.
This is why all BxDFs have the following functions:
You will often find that you can importance sample the BxDFs in a way such that the $|\omega_{\text{i}}\cdot\mathbf n|$ factor appears in the PDF of the sample generator and therefore disappears from the throughput/quotient hence bringing it closer to a constant.
For a dumb reason the quotient-computing functions contain `rem_` in the name. I cannot explain the brain aneurysm which caused me to call the value divided by the pdf a remainder, and not a quotient. Sorry for that.
Super Interesting Tangent: Relationship of the Jacobian of the sample generator to its PDF
While you might take it as the word of God that in Monte Carlo Integration of $f$ your sample contribution is $\frac{f(g(\xi))}{p_{g}(g(\xi))}$ and not investigate further, it is important to consider the relationship between a trivial change of variables and importance sampling (hint: they're the same thing).
Let us take the original thing we wanted to integrate:
lets now perform a change of variables $\omega = g(\xi)$:
since applying Monte Carlo importance sampling as defined above must yield the same answer as a plain change of variables without importance sampling, we have:
or as given in the Practical Product Importance Sampling via Composing Warps paper:
There is an important caveat: for the above trick to work, the sample generation function must be a bijection. If more than one subdomain maps to the same subimage you start needing to add the probability densities together to get the real one (so the Jacobian determinants combine like the resistances of resistors connected in parallel).
Therefore it's easy to validate whether your importance sampling generator implementation/scheme matches the PDF you think it has; here's an observation and derivation with an intuitive density-based discussion I made, waaay before I read the Practical Product Importance Sampling paper.
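Here's what that validation can look like for a cosine-weighted hemisphere generator: recover the solid-angle pdf purely from finite-difference Jacobians of the warp and compare it to the claimed $\frac{\cos\theta}{\pi}$ (a Python sketch; the diagonal-determinant shortcut only works because $\theta$ depends on $\xi_1$ alone and $\phi$ on $\xi_2$ alone here):

```python
import math

def cos_sample(u1, u2):
    """Uniform disk sample unprojected onto the upper hemisphere."""
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))

def claimed_pdf(w):
    return w[2] / math.pi  # cos(theta) / pi

def pdf_from_jacobian(u1, u2, eps=1e-5):
    """p(g(u)) = 1 / (|det d(theta,phi)/d(u1,u2)| * sin(theta)); the sin
    converts the (theta, phi) density into a solid-angle density."""
    def angles(a, b):
        w = cos_sample(a, b)
        return math.acos(w[2]), math.atan2(w[1], w[0])
    th0, ph0 = angles(u1, u2)
    th1, _ = angles(u1 + eps, u2)
    _, ph1 = angles(u1, u2 + eps)
    # diagonal determinant: theta only depends on u1, phi only on u2
    det = abs((th1 - th0) / eps) * abs((ph1 - ph0) / eps)
    return 1.0 / (det * math.sin(th0))
```

If your generator is not a bijection, the same check has to sum the reciprocal Jacobians over all preimages, as noted above.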
"Smooth" Diffuse BxDF
For the standard Lambertian BRDF we have:
and for the BSDF we have:
To remain sane we pretend that $\mathbf x$ does not depend on $\omega_{\text{o}}$ and therefore $\rho$ is constant.
If we importance sample by generating uniformly distributed points $(r,\theta) \in [0,1]\times[0,2\pi]$ on a disk perpendicular to $\mathbf n$ and then unproject them onto the upper hemisphere, we get the following PDF
but if we then randomly make half of the points go onto the lower hemisphere for a BSDF, we get
This is really nice because the throughput/quotient works out to just $q_{f_{\text{r}}} = \rho$.
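The cancellation is worth seeing explicitly; a toy version of the quotient (hypothetical names, scalar $\rho$):

```python
import math

def lambert_quotient(rho, u1):
    """Throughput f_r * |wo.n| / pdf for cosine-weighted sampling of the
    Lambertian BRDF f_r = rho / pi: the cos/pi factors cancel per-sample,
    so the estimator has zero variance and equals rho exactly."""
    cos_o = math.sqrt(max(0.0, 1.0 - u1))  # z of the unprojected disk sample
    f_times_cos = (rho / math.pi) * cos_o
    pdf = cos_o / math.pi
    return f_times_cos / pdf
```

For the BSDF variant the halved pdf is matched by a halved per-side BSDF value, so the quotient is still $\rho$.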
"Rough" Oren-Nayar BxDF
We use Oren-Nayar for rough-diffuse, as it's important that our rough BxDFs degenerate nicely to smooth versions as $\alpha \to 0$.
For importance sampling we figured it doesn't matter much and just used the same generator as the Lambertian Diffuse; we could, of course, improve that.
Cook-Torrance BxDFs
Equations
Assuming a spatially varying roughness $\alpha(\mathbf x)$ and index of refraction $\eta(\mathbf x)$, the BRDF has the following form:
note the absolute value of the dot products to handle Internal Reflections for the dielectric case.
Also the $\eta$ needs to be properly oriented w.r.t. $\omega_{\text{i}}$: whenever the latter is in the lower hemisphere we need to use $\frac{1}{\eta}$.
While the BTDF has the form:
where $\omega_{\text{m}}$ is the microsurface normal, which is fixed by (and can be worked out from) $\omega_{\text{i}}$, $\omega_{\text{o}}$, and $\eta$.
Upon first glance you might think that this BTDF violates the law of reciprocity $f_{\text{t}}(\omega_{\text{i}}, \omega_{\text{o}}) = f_{\text{t}}(\omega_{\text{o}}, \omega_{\text{i}})$ needed for PBR. However note that if you swap the directions in the transmissive case, the $\eta$ changes too, becoming its own reciprocal.
For certain transmissive configurations of $\omega_{\text{i}}$, $\omega_{\text{o}}$, and $\eta$ the refractive path will have a zero throughput (and therefore we must report a zero pdf) because either:
Conclusions
Most of these I've covered and derived in our recorded Discord call, which you have access to on Google Drive.
$|\omega_{\text{o}}\cdot\mathbf n|$ can be factored out
It should be immediately apparent that when $f$ is a Cook Torrance BRDF or BTDF, the following expression
contains $|\omega_{\text{o}}\cdot\mathbf n|$ in both the numerator and denominator, which can be cancelled out to avoid a lot of headaches.
Sampling the Distribution of Visible Normals (VNDF) will remove almost every factor from the quotient
There are two variants of the Masking and Shadowing function $G$:
As derived by Heitz in his 2014 and 2016 papers, either variant of the masking and shadowing function $G$ for a microfacet cook torrance model is entirely fixed by the distribution of normals $D(\alpha)$, as it defines $\Lambda_{D(\alpha)}(\omega,\alpha)$ which in turn defines:
This allows us to treat $D G$ as a single function. Nabla's $D G_{\text{GGX}}$ is extremely optimized; we pretty much avoid computing $\Lambda_{D_{\text{GGX}}}$ because it contains many of the same terms as $D_{\text{GGX}}$.
Also as shown by Heitz and others, $D$ by itself is not a probability distribution, $D |\omega_{\text{m}}\cdot\mathbf n|$ is. This is because the projected area of the microfacets onto the macrosurface needs to add up to a constant (which is actually the same as the area of the macrosurface), not the area of the microfacets themselves (a more corrugated surface has a greater surface area).
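This normalization is easy to check numerically; the quadrature below (isotropic GGX assumed) confirms that $\int_\Omega D(\omega_{\text{m}})\,|\omega_{\text{m}}\cdot\mathbf n| \operatorname d \omega_{\text{m}} = 1$, whereas $\int D$ alone would not be 1:

```python
import math

def ggx_ndf(mu_m, alpha):
    """Isotropic GGX distribution of normals; mu_m = cos(theta_m)."""
    t = (alpha * alpha - 1.0) * mu_m * mu_m + 1.0
    return alpha * alpha / (math.pi * t * t)

def projected_ndf_integral(alpha, n_th=2048):
    """Midpoint quadrature of D(w_m) |w_m . n| over the hemisphere;
    the azimuth integrates to 2*pi since the NDF is isotropic."""
    total = 0.0
    for j in range(n_th):
        th = (j + 0.5) * 0.5 * math.pi / n_th
        mu = math.cos(th)
        total += ggx_ndf(mu, alpha) * mu * math.sin(th)
    return total * 2.0 * math.pi * (0.5 * math.pi / n_th)
```

The result holds for any roughness, which is exactly the "projected microfacet area sums to the macrosurface area" statement above.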
Note that this formulation makes $\omega_{\text{m}}$ independent of $\omega_{\text{i}}$.
We can also define the Visible Normal Distribution Function for a given fixed $\omega_{\text{i}}$
which tells you what proportion of the macrosurface area projected onto the viewing direction (which is why there is a division by $|\omega_{\text{i}}\cdot\mathbf n|$) is made up of microfacets with normal equal to $\omega_{\text{m}}$ (they also need to be projected onto the viewing direction, hence the dot product in the numerator).
When importance sampling, the $\omega_{\text{i}}$ is a fixed constant, so the best sampling of $\omega_{\text{o}}$ you can hope to achieve is done according to $f |\omega_{\text{o}} \cdot \mathbf n|$, but realistically you can only sample $\omega_{\text{m}}$ according to $D_{\omega_{\text{i}}}$ and then reflect $\omega_{\text{i}}$ about it.
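Sampling $D_{\omega_{\text{i}}}$ is exactly what Heitz's 2018 "Sampling the GGX Distribution of Visible Normals" does; below is a Python transcription of that published algorithm (Nabla's GLSL may differ in details), plus the reflection step:

```python
import math

def normalize(v):
    l = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / l, v[1] / l, v[2] / l)

def sample_ggx_vndf(wi, ax, ay, u1, u2):
    """Heitz 2018 GGX VNDF sampling: returns a microfacet normal wm
    distributed according to D_wi; wi must satisfy wi.z > 0."""
    # warp to the hemisphere configuration (roughness -> unit)
    vh = normalize((ax * wi[0], ay * wi[1], wi[2]))
    # orthonormal basis around vh
    lensq = vh[0] * vh[0] + vh[1] * vh[1]
    if lensq > 0.0:
        inv = 1.0 / math.sqrt(lensq)
        T1 = (-vh[1] * inv, vh[0] * inv, 0.0)
    else:
        T1 = (1.0, 0.0, 0.0)
    T2 = (vh[1] * T1[2] - vh[2] * T1[1],
          vh[2] * T1[0] - vh[0] * T1[2],
          vh[0] * T1[1] - vh[1] * T1[0])
    # sample a disk, reproject onto the tilted half-disk
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    t1, t2 = r * math.cos(phi), r * math.sin(phi)
    s = 0.5 * (1.0 + vh[2])
    t2 = (1.0 - s) * math.sqrt(max(0.0, 1.0 - t1 * t1)) + s * t2
    t3 = math.sqrt(max(0.0, 1.0 - t1 * t1 - t2 * t2))
    nh = tuple(t1 * T1[k] + t2 * T2[k] + t3 * vh[k] for k in range(3))
    # unwarp back to the ellipsoid configuration
    return normalize((ax * nh[0], ay * nh[1], max(1e-6, nh[2])))

def reflect(wi, wm):
    """wo = 2 (wi . wm) wm - wi"""
    d = wi[0] * wm[0] + wi[1] * wm[1] + wi[2] * wm[2]
    return (2.0 * d * wm[0] - wi[0], 2.0 * d * wm[1] - wi[1], 2.0 * d * wm[2] - wi[2])
```

By construction every returned normal is visible from $\omega_{\text{i}}$, i.e. $\omega_{\text{i}}\cdot\omega_{\text{m}} \geq 0$.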
The rest of the terms in the quotient composed of dot products involving $\omega$ can be factored out by construction
When you reflect $\omega_{\text{i}}$ about $\omega_{\text{m}}$ a change of variables happens and your PDF is
note how when you sample according to VNDF, this becomes:
A similar thing happens with refraction as its PDF is:
which actually makes it so that $p_{\omega_{\text{o}}}$ is exactly the same for the reflective and refractive case.
Therefore when you sample an $\omega_{\text{m}}$ from a VNDF first, and then generate your $\omega_{\text{o}}$ via this reflection or refraction, the same factors arise in your generator's pdf as in the value, so that the throughput simply becomes:
for a BRDF, and
for a BTDF.
This should all intuitively make sense when you consider that as $\alpha \to 0$ both $G_1 \to 1$ and $G_2 \to 1$, so you get the exact same thing as importance sampling an explicitly implemented smooth mirror or glass BSDF.
When you have a White Furnace Test passing BSDF in a Spectral Renderer even the Fresnel disappears
Assuming you're constructing your BSDF by applying a BRDF on reflective paths and a BTDF on refractive ones
If, when importance sampling, you choose whether to reflect $\omega_{\text{i}}$ or refract it about $\omega_{\text{m}}$ based on a probability proportional to $F$, you get a factor of $F$ in the pdf $p_{f}$ when reflecting and $1-F$ when refracting.
This makes it completely drop out of your throughput so that:
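A sketch of that Fresnel-proportional event selection (exact unpolarized dielectric Fresnel; `xi` is a uniform random number in $[0,1)$):

```python
import math

def fresnel_dielectric(cos_i, eta):
    """Exact unpolarized Fresnel reflectance; eta = n_transmitted / n_incident."""
    sin2_t = (1.0 - cos_i * cos_i) / (eta * eta)   # Snell's law
    if sin2_t >= 1.0:
        return 1.0                                  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    r_s = (cos_i - eta * cos_t) / (cos_i + eta * cos_t)
    r_p = (eta * cos_i - cos_t) / (eta * cos_i + cos_t)
    return 0.5 * (r_s * r_s + r_p * r_p)

def choose_event(cos_i, eta, xi):
    """Reflect with probability F, refract with 1 - F: the event's value and
    its pick probability are identical, so the throughput factor is always 1
    and Fresnel drops out of the quotient."""
    F = fresnel_dielectric(cos_i, eta)
    if xi < F or F == 1.0:
        return "reflect", F / F
    return "refract", (1.0 - F) / (1.0 - F)
```

Note how TIR degenerates gracefully: $F=1$ forces the reflect branch with probability 1 and the throughput factor is still 1.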
There are however important caveats:
Bonus Round: Thindielectric BSDF
This is a great "specialization for known geometry", you basically assume that each surface with this material is in reality two surfaces which are:
Smooth Thindielectric
The BSDF for smooth glass is:
What should be immediately apparent is that any ray that enters the Thindielectric medium will hit its backside with the exact same angle as it left the front face, this means:
When you put these facts together you see that the equations for infinite scattering Thindielectric BSDF (infinite TIR) become
The $(1-F)^{2}$ factor is present on all but the simple single scattering reflection, because you need a transmission event to enter and exit the medium.
It is further possible to incorporate an extinction function (but not a phase function) e.g. simple Beer's Law.
Let us assume that a ray of light orthogonal to the surface will be attenuated by $\tau_{\perp}$ after passing through the medium, then for a ray penetrating the medium at a different angle we have:
We can compute the following factor to account for all chains of Total Internal Reflection which exit at the same side:
then our BSDF becomes:
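Summing those bounce chains is just a geometric series with ratio $F^2\tau^2$; a sketch of the closed forms and a brute-force validation (assuming a scalar Fresnel $F$ per interface hit and one-crossing transmittance $\tau$):

```python
def thindielectric_totals(F, tau):
    """Closed-form total reflectance and transmittance of a smooth thin slab:
       R = F + (1-F)^2 F tau^2 / (1 - F^2 tau^2)
       T = (1-F)^2 tau       / (1 - F^2 tau^2)"""
    g = 1.0 / (1.0 - F * F * tau * tau)
    return F + (1.0 - F) ** 2 * F * tau * tau * g, (1.0 - F) ** 2 * tau * g

def thindielectric_series(F, tau, n=256):
    """Explicitly sums the first n TIR chains for validation: every extra
    internal round trip multiplies a chain by F^2 tau^2."""
    R, T = F, 0.0
    chain = (1.0 - F) ** 2 * tau      # enter, cross once, exit out the back
    for _ in range(n):
        T += chain
        R += chain * F * tau          # reflect at the back, cross, exit front
        chain *= F * F * tau * tau
    return R, T
```

With $\tau = 1$ (no extinction) the totals satisfy $R + T = 1$, which is the energy-conservation sanity check for the infinite-TIR formulation.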
[Perpetual TODO] Rough Thindielectric
This will need the energy conservation fix for regular Rough Dielectrics (Multiple Scattering Cook Torrance correction).
What's important is that our implementation of this will reduce to Smooth Thindielectric as $\alpha \to 0$.
All the BxDFs we didn't cover and which might be needed for Ditt (in order of importance)
Velvet
This is the only one I've had requests for so far.
We should do the same one Mitsuba had, or something more up to date if we can find it.
Hair/Fur
For carpets.
The BSDF is what gives you rim-lighting and anisotropic gloss. https://andrew-pham.blog/2019/09/13/hair-rendering/
But it's the heightfield / volume tracing that gives the fur effect.
Human Skin
For RenderPeople (not requested yet).
True Diffuse
Lambertian (constant irrespective of angle of observation) is completely made up; if you replace it with an actual volume which does uniform scattering and let the mean free path tend to 0, you'll get a different-looking material.
Sony Imageworks has a 2017-ish presentation on that.