Opened by devshgraphicsprogramming, 1 year ago.
Specular aliasing arises from the shape and direction of the peak of $f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) |\omega_{\text{o}} \cdot \mathbf n|$ actually varying over the footprint of a pixel, and this can happen in multiple ways:
This happens when, for example, $\alpha$ or $\eta$ is fetched from a texture.
Imagine any of the following:
There's a paper called Geometrical Specular AA which aims to tell you how much you should "crank up" the $\alpha$ based on the Principal Curvature (which you can compute from screenspace derivatives of normal vectors converted to derivatives w.r.t. the surface parametrization), but it assumes a Beckmann NDF is being used.
There's a newer one from 2022 which develops a similar framework for GGX.
If you don't use Schussler et al. 2017, then this makes both of the $\omega$ vary, but differently (not analytically, nor in a way which can be approximated locally with a few derivatives) than in the above case.
But if you do use it, then there's a perturbed normal parameter which performs a very similar role as perturbing the actual normal used for determining the tangent space for the $\omega$ parameters.
An infinite ray (line) in 3D is uniquely parametrized by 4, not 6, variables.
This is because it doesn't matter how the ray origin is moved along its direction, or if the direction is inverted, you'll get the same ray.
Hence you can parametrize the ray by:
The whole problem comes from the fact that we can't accurately approximate the integral of the lighting equation over the footprint of a pixel filter with just a few samples
I(\mathbf u, \lambda) = \int_{\{\mathbf s \in \mathbb{R}^2 | W(\mathbf s)>0\}} W(\mathbf s) L_{0}(raygen_{\mathbf x}(\mathbf u + \mathbf s),raygen_{\omega_{\text{i}}}(\mathbf u + \mathbf s), \lambda) \operatorname d \mathbf s
this is basically the measurement equation, where $W(\mathbf s)$ is the given ray's contribution to the sampled image pixel value $I(\mathbf u,\lambda)$.
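As a sanity check, the measurement equation above can be estimated with brute-force Monte Carlo over the filter footprint. This is a minimal sketch; the Gaussian filter `W`, the stand-in radiance function (playing the role of $L_0$ composed with raygen), and the sample count are all made-up assumptions:

```python
import math
import random

def W(sx, sy, sigma=0.5):
    """Truncated Gaussian pixel reconstruction filter W(s)."""
    if abs(sx) > 1.5 or abs(sy) > 1.5:
        return 0.0
    return math.exp(-(sx * sx + sy * sy) / (2.0 * sigma * sigma))

def radiance(px, py):
    """Stand-in for L0(raygen(u + s)): a smooth made-up scene function."""
    return 0.5 + 0.5 * math.sin(px) * math.cos(py)

def estimate_pixel(u, v, n_samples=64, rng=random.Random(42)):
    """Monte Carlo estimate of I(u) over the filter support [-1.5, 1.5]^2."""
    area = 3.0 * 3.0  # measure of the uniform sampling domain
    acc = 0.0
    for _ in range(n_samples):
        sx = rng.uniform(-1.5, 1.5)
        sy = rng.uniform(-1.5, 1.5)
        acc += W(sx, sy) * radiance(u + sx, v + sy)
    return acc * area / n_samples
```

The point of the whole issue is that this estimator needs many samples before the filtered peak of a specular term stops flickering, which is exactly what the prefiltering below tries to avoid.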
So let us define a new filtered BxDF over some distribution of incoming ray intersection points and angles $V(\mathbf v)$. This is not the exact same thing as the above, it's a split sum approximation (like all the single-tap prefiltered Image Based Lighting in recent PBR rasterization-based engines):
\tilde{f}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\{\mathbf v \in \mathbb{R}^2 \times \Omega | V(\mathbf v)>0\}} V(\mathbf v) f(\mathbf x + \mathbf r(\mathbf v), \omega_{\text{i}}+\omega(\mathbf v), \omega_{\text{o}}, \lambda) \frac{|\omega_{\text{o}}\cdot\mathbf n(\mathbf r(\mathbf v))|}{|\omega_{\text{o}}\cdot\mathbf n_{0}|} \operatorname d \mathbf v
this basically gives us a weighted average (more accurately, a convolution) of all the BxDF values as they would be evaluated for our "incoming ray mix". It is this weighted average of the BxDF that we substitute into the lighting equation at each bounce, hence the split sum approximation; now inside the integral there are two separate sums being multiplied together instead of a single one.
For derivative maps or curvature, the normal used for the lighting equation depends on the ray intersection position, which itself is a distribution for the split sum.
Assuming you can actually importance sample $V$ efficiently (even if not ideally), then importance sampling this $\tilde{f}$ monster is actually quite straightforward, you can do it via composing warps:
g_{\tilde{f}}(\mathbf x, \omega_{\text{i}}, \lambda, \xi_0, \xi_1) = g_{f}(\mathbf x + \mathbf r(g_{V}(\xi_0)), \omega_{\text{i}} + \omega(g_{V}(\xi_0)), \lambda, \xi_1)
note that you need 2 extra dimensions from the sample.
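A toy sketch of the composed-warp sampler: two uniforms go through a hypothetical `g_V` to produce positional/angular offsets, which then perturb the inputs of a stand-in base sampler `g_f` (a cosine-weighted hemisphere here). All concrete warps are illustrative assumptions, not the real thing:

```python
import math

def g_V(xi):
    """Warp two uniforms into a point on a uniform disk of radius 0.1,
    used for both the positional offset r(v) and the angular offset omega(v)."""
    u, v = xi
    r = 0.1 * math.sqrt(u)
    phi = 2.0 * math.pi * v
    return (r * math.cos(phi), r * math.sin(phi))

def g_f(x, w_i, xi):
    """Stand-in base BxDF sampler g_f: cosine-weighted hemisphere direction
    (ignores x and w_i, as a Lambertian sampler would)."""
    u, v = xi
    r = math.sqrt(u)
    phi = 2.0 * math.pi * v
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u)))

def g_f_tilde(x, w_i, xi0, xi1):
    """Composed warp g_f(x + r(g_V(xi0)), w_i + omega(g_V(xi0)), xi1).
    Note the two extra sample dimensions consumed by xi0."""
    d = g_V(xi0)
    x_p = (x[0] + d[0], x[1] + d[1])                   # r(v): positional offset
    w_p = (w_i[0] + 0.5 * d[0], w_i[1] + 0.5 * d[1])   # omega(v): toy angular offset
    return g_f(x_p, w_p, xi1)
```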
The problem is still that you've not reduced the dimensionality of the Light Transport integral, it's actually increased! And your convergence rate is the same order of magnitude or worse than the original equation involving $W(\mathbf s)$.
Note how the above sampling strategy can produce the same $\omega_{\text{o}}$ in multiple ways, making it so that the actual $q_{\tilde{f}}$ is impossible to compute, as you neither know the value of $\tilde{f}$ nor $p_{\tilde{f}}$, and nothing is guaranteed to factor out.
Note that $\tilde{f}$ is but a convolution (or rather a correlation) of our original $f$ with $V$ along the position and observation angle dimensions; this is super relevant later for discussing Covariance Tracing.
What model, rather than the original BxDF itself (except for a Linearly Transformed Cosine), is a better fit to approximate the filtered BxDF?
Here we explicitly state model, because obviously we need to fit new parameters.
As a simple example of this approach, let's consider a Diffuse $f$ where by construction we have:
\frac{\operatorname d f}{\operatorname d \omega_{\text{i}} } = 0
so we only really depend on the intersection points of $V$ with our surface.
A common approach to approximating the filtered Diffuse BRDF is to filter its input parameter $\rho$, in theory like this:
\tilde{\rho}(\mathbf x) = \int_{\{\mathbf v \in \mathbb{R}^2 | V(\mathbf v)>0\}} V(\mathbf v) \rho(\mathbf x + \mathbf r(\mathbf v)) \operatorname d \mathbf v
but obviously the set of intersections for rays $\mathbf v$ with $V(\mathbf v)>0$ on the surface we're shading will form a complex, not easily integrable domain. So we make some additional assumptions:
... and I've just explained Anisotropic Texture Filtering (rhomboid) and EWA Filtering (Anisotropic 2D Normal Distribution).
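The steps above can be sketched as an EWA-style filter: a weighted average of texels under an anisotropic Gaussian whose 2x2 UV-space covariance defines the elliptical footprint. The procedural checker texture, grid resolution and cutoff radius below are all assumptions for illustration:

```python
import math

def texture(u, v):
    """Procedural checker standing in for a texel fetch."""
    return 1.0 if (int(math.floor(u)) + int(math.floor(v))) % 2 == 0 else 0.0

def ewa_filter(cu, cv, cov, radius=3.0, steps=17):
    """Weighted average of texels under a Gaussian with covariance
    ((a, b), (b, c)) in UV space, evaluated on a small grid around (cu, cv)."""
    a, b = cov[0]
    b2, c = cov[1]
    det = a * c - b * b2
    # inverse covariance for the Gaussian (Mahalanobis) exponent
    ia, ib, ic = c / det, -b / det, a / det
    acc = wsum = 0.0
    ext = radius * math.sqrt(max(a, c))  # conservative search extent
    for i in range(steps):
        for j in range(steps):
            du = -ext + 2.0 * ext * i / (steps - 1)
            dv = -ext + 2.0 * ext * j / (steps - 1)
            q = ia * du * du + 2.0 * ib * du * dv + ic * dv * dv
            if q < radius * radius:  # inside the ellipse cutoff
                w = math.exp(-0.5 * q)
                acc += w * texture(cu + du, cv + dv)
                wsum += w
    return acc / wsum if wsum > 0.0 else texture(cu, cv)
```

With a covariance much smaller than a texel the filter degenerates to a point sample, and with a wide one it converges toward the checker's mean, which is the minification behaviour we want.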
As we've established we have multiple causes of specular aliasing, here's a list with general approaches to solve them.
The first thing we need is an approximation to $V$ but one that does not need to necessarily be importance sampled, however it needs to be:
There's not much we can do here in terms of filtering when tracing classical Binary Surface Reps in a BVH with a single hero ray.
Covariance Tracing has some ideas, but according to my reading of the paper you'd need voxels for that.
Would be extremely useful for prefiltering shadows (visibility) for the emitters we hit.
The solution here is to take the distribution of outgoing rays from previous path vertex and transform that into a distribution of incoming rays for the next vertex (the surface hit by the "hero" ray).
Assumptions and approximations:
What we need to know:
After this step we get a distribution of incoming rays $V_i$ in the plane of the surface, it can be used to filter the parameters of $f$ that come from a texture.
The surface is actually curved and we can correct for that given the principal curvatures if we can differentiate the surface's shading normals w.r.t. some surface parametrization (barycentrics are always present).
Assumptions and approximations:
What we need to know:
This gives us $V_g$ the incoming distribution corrected for curvature.
The really cool thing up until now is that no matter how you represent $V$, all the steps above don't, and shouldn't, care about what the BxDF is.
Assumptions:
What we need to know:
We now get a new set of BxDF parameters and therefore a new BxDF $f_t$, not a new viewing ray distribution.
Sidenote: It would be cool to factor the distribution of viewing directions (transformed into UV space) into the derivation of the distribution of visible perturbed normals as Schussler 2017 makes certain normals more visible than others due to shadowing and masking. In general, without Schussler, you would not be taking the viewing direction into account, because each pixel (bunch of perturbed microfacets) of the normal map has the same area both projected onto the macrosurface and onto any observing ray direction.
This is why the perturbed normal distribution transform should come after curvature, but also because it alters our BxDF parameters which in turn alter how we compute our convolution of outgoing ray distributions by the BxDF.
We now need to derive yet another modified BxDF $\tilde{f}_v$ from:
Now, why would we want to do that?
It's to make sure that both our:
You should be able to see that we don't suffer from double/redundant correction anywhere, as it's the ray distribution transforms that deal with the aliasing arising due to geometry, and the BxDF re-fitting that deals with meso-surfaces and convolution by the original BxDF.
Assumptions:
What we need to know:
Since we assume that the parameters of $\tilde{f}_t$ are locally constant, we don't forward $\mathbf x$ to it, and define:
\tilde{f}_v(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\mathbf r(\mathbf v_{\text{i}})=\mathbf x \land V_g(\mathbf v_{\text{i}})>0} V_g(\mathbf v_{\text{i}}) \tilde{f}_t(\omega_{\text{i}}+\omega(\mathbf v_{\text{i}}), \omega_{\text{o}}, \lambda) \operatorname d \mathbf v_i
We then need to represent our outgoing ray distribution $V_o$ properly, which based on previous estimates is:
V_o(\mathbf v_o) = \tilde{f}_v(\mathbf r(\mathbf v_o), \omega_{\text{i}}, \omega(\mathbf v_o), \lambda)
Finally, one might want to account for the fact that importance sampling might be drawing multiple samples on the same surface, and "narrow down" the distribution proportionally to the sample count as an estimate of the sampling rate of the outgoing directions. However, one should not take into account the path depth; the loss of sample rate due to ray spread on deeper paths is already taken into account by all the above!
We need a way to represent $V$ in a way that lets us perform all or some of the mitigation techniques outlined in the previous comment.
So far I know of only two.
Ray Differentials start with the derivative of the ray directions between neighbouring samples. If using a sample sequence of, let's say, $n^2$ samples which is also stratified (each cell in an $n \times n$ grid contains one sample), like for example a (t,m,s)-net, then the derivative of the ray at the start of the path is:
\frac{1}{n} \frac{\operatorname D raygen(u,v) }{\operatorname D (u,v)}
the $\frac{1}{n}$ accounts for the sampling density (average distance between samples).
The ray-origin derivative is obviously null when using a pin-hole camera as there is a singular focal point for all rays.
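Here's a minimal sketch of that initial differential for an assumed pinhole camera: central finite differences of `raygen` w.r.t. screen $(u,v)$, scaled by $\frac{1}{n}$; the origin derivative is omitted since, as noted, it is zero for a pinhole. Camera parameters are made up:

```python
import math

def raygen_dir(u, v, fov=math.pi / 2.0, aspect=1.0):
    """Pinhole camera: screen (u, v) in [0,1]^2 to a unit ray direction."""
    t = math.tan(fov / 2.0)
    x = (2.0 * u - 1.0) * t * aspect
    y = (1.0 - 2.0 * v) * t
    n = math.sqrt(x * x + y * y + 1.0)
    return (x / n, y / n, 1.0 / n)

def ray_differential(u, v, n, eps=1e-4):
    """(1/n) * D raygen / D(u, v) by central finite differences,
    where n is the per-axis stratification count of the sampler."""
    def diff(f0, f1):
        return tuple((a - b) / (2.0 * eps) / n for a, b in zip(f0, f1))
    ddu = diff(raygen_dir(u + eps, v), raygen_dir(u - eps, v))
    ddv = diff(raygen_dir(u, v + eps), raygen_dir(u, v - eps))
    return ddu, ddv
```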
This is how you perform the following operations
You can simply construct a frustum based on the positional and directional derivatives and intersect this with the surface.
Simply negate the directional derivatives, rays will still travel through the same parametrized origins.
We need a function to express a ray in the shading normal's tangent space, we then use the Chain Rule to derive first order derivatives in that tangent space.
Find out the footprint in tangent and UV space, filter textures appropriately.
So far I don't know of any method to make the viewing direction play a role, but this would only be needed for Fresnel which rarely ever varies spatially.
The only hope of achieving anything here, is to figure out transformations of parameters (usually just $\alpha$) based only on the directional derivatives of the incoming rays in tangent space.
Then when importance sampling, compute the derivative to use as the new directional derivative with an approximation like this:
\frac{1}{\sqrt{n}} \frac{\operatorname D g}{\operatorname D \xi}+ \frac{\operatorname D g}{\operatorname D \omega_{\text{i}}}
but in a way that does not cancel out opposing directional derivatives.
This adds the spread due to the importance sampling's roughness and the original incoming distribution; it's nice because:
This method also sucks because it's completely dependent on your sample generator:
The problem with Ray Differentials is that they prescribe a discrete, solid (binary) frustum shape that the rays occupy.
The problem with that is, as you take a bundle of rays within a frustum (which can be very thin) and reflect or refract it with a rough BxDF, they will spread over a large range of angles, away from a perfectly reflected frustum, and not every angle will have the same intensity!
A Ray Differential assumes all rays within the frustum have the same importance/intensity.
In the end, what really happens in renderers is that rather than actually working out the differentials analytically by differentiating the complex materials, the ray gets some metadata to keep track of finite differences in screenspace X and Y:
In Mitsuba and PBRT you essentially get 2 "ghost" rays which are not taken into account when traversing the BVH but are:
This is absolutely horrible because:
One way to combat both issues above is to also consider a $\Delta \xi_i = \frac{1}{\sqrt{n}}$ in your importance sampling functions; note that this makes your path generation 3x more expensive than it was.
If we take the alternative and account for all aliasing cases separately, we make our importance sampling functions compute their derivatives in $\xi$ while they are generating the "hero" sample. But this is a very fragile approximation which is likely to break for the reasons outlined above.
The ideal would be to fit a frustum to the outgoing direction Projected BxDF value distribution, but the only way I can see how to do that is to use Linearly Transformed Cosines and fit the frustum to that.
Covariance Tracing deals with 4D (if you ignore time) distributions (spreads) of ray origins and directions; they're modelled as a 4x4 covariance matrix, which means fuzzy Gaussian distributions.
Every time you do something with the ray distribution, like:
there is a 4x4 matrix to multiply the original distribution with. It's actually less data than a full 4x4, because a covariance matrix is symmetric, so we only need to concern ourselves with its upper triangle.
Then you just project when you need to get the 2x2 covariance matrix of the ray distribution as it intersects a surface's tangent plane which will be w.r.t surface's UV coordinates.
This gives you an oriented Gaussian filter's elliptical footprint, which is exactly the thing you need for EWA texture minification, which is considered superior to Anisotropic Filtering (which we'll still use).
For Anisotropic Filtering you can just extract the ellipse's minor and major axes multiplied by some fixed constant (however many $\sigma$ we want to fit in a box) and use that for sampling.
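A sketch of that axis extraction, using the closed-form eigendecomposition of a symmetric 2x2 matrix; the $\sigma$ multiplier `n_sigma` is an assumed tuning constant:

```python
import math

def ellipse_axes(cov, n_sigma=2.0):
    """Major/minor axes of the Gaussian footprint ellipse for a symmetric
    2x2 covariance ((a, b), (b, c)), scaled by n_sigma * sqrt(eigenvalue)."""
    a, b = cov[0][0], cov[0][1]
    c = cov[1][1]
    tr = a + c
    disc = math.sqrt(max(0.0, (a - c) * (a - c) / 4.0 + b * b))
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc  # eigenvalues, l1 >= l2
    if abs(b) > 1e-12:
        v1 = (l1 - c, b)  # eigenvector for l1
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    n1 = math.hypot(*v1)
    v1 = (v1[0] / n1, v1[1] / n1)
    v2 = (-v1[1], v1[0])  # perpendicular minor-axis direction
    s1 = n_sigma * math.sqrt(max(0.0, l1))
    s2 = n_sigma * math.sqrt(max(0.0, l2))
    return (v1[0] * s1, v1[1] * s1), (v2[0] * s2, v2[1] * s2)
```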
Recall our original convolution over a projected disk:
\tilde{f}_v(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\mathbf r(\mathbf v_{\text{i}})=\mathbf x \land V_g(\mathbf v_{\text{i}})>0} V_g(\mathbf v_{\text{i}}) \tilde{f}_t(\omega_{\text{i}}+\omega(\mathbf v_{\text{i}}), \omega_{\text{o}}, \lambda) \operatorname d \mathbf v_i
We can express it as a convolution over a plane perpendicular to the importance sampled outgoing "hero" ray $\omega_{\text{o}}$, assuming a tiny spread of the ray distribution:
\tilde{f}_v(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) = \int_{\perp \omega_{\text{o}}} V_g(\mathbf v_{\text{i}}(\mathbf u)) \tilde{f}_t(\omega_{\text{i}}+\omega(\mathbf u), \omega_{\text{o}}, \lambda) |\frac{\operatorname D \mathbf v_{\text{i}}}{\operatorname D \mathbf u}| \operatorname d \mathbf u
Now the absolute cool thing that Covariance Tracing does, is that it expresses the above in terms of a Fourier Transform.
\mathcal F(\tilde{f}_v) = \mathcal F(V_g)\cdot\mathcal F(\tilde{f}_t |\frac{\operatorname D \mathbf v_{\text{i}}}{\operatorname D \mathbf u}|)
at this point, we should note that all the equations convolve over the incoming-direction variable.
Now the funny thing is that this formulation of $\tilde{f}_v$ actually tells us how the light reflected from a constant $\omega_{\text{o}}$ varies with $\omega_{\text{i}}$, which is not what we're looking for. But since BRDFs are supposed to be symmetric, we can swap the $\omega$ parameters to get our answer.
This is where the strategic choice of using covariance as the approximation of $V$ pays dividends: you can go back and forth between primal and frequency space with Gaussian Distributions very easily. Also, we can use convolution and correlation interchangeably because the representation of $V$ is zero-centered and symmetric.
This means that in order to find a $V_o$ that fits $\tilde{f}_v$ in primal space, one does not need to invert $\mathcal F(\tilde{f}_v)$. The fit can be performed in frequency space, and it simply involves adding covariance matrices together!
\Sigma_{V_o} = \Sigma_{V_g}+\Sigma_{\tilde{f}_t}
but because the covariance matrix of the BRDF is not invertible in 4D, a slightly different formulation is used:
\Sigma_{V_o} = (\Sigma_{V_g}^{-1}+\Sigma_{\tilde{f}_t}^{+})^{-1}
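A sketch of that combination on projected 2x2 covariances (pure Python, to keep it self-contained); the pseudo-inverse inverts only the non-zero eigenvalues, which is what handles a rank-deficient BxDF covariance. The concrete matrices fed to it are illustrative:

```python
import math

def inv2(m, eps=1e-12):
    """Inverse of a 2x2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    assert abs(det) > eps, "matrix not invertible"
    return ((d / det, -b / det), (-c / det, a / det))

def pinv2_sym(m, eps=1e-9):
    """Moore-Penrose pseudo-inverse of a symmetric 2x2 matrix: eigendecompose,
    invert the non-zero eigenvalues, zero out the degenerate ones."""
    (a, b), (_, c) = m
    half = (a + c) / 2.0
    disc = math.sqrt(max(0.0, (a - c) ** 2 / 4.0 + b * b))
    l1, l2 = half + disc, half - disc
    v = (l1 - c, b) if abs(b) > eps else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    n = math.hypot(*v)
    v = (v[0] / n, v[1] / n)
    w = (-v[1], v[0])
    il1 = 1.0 / l1 if abs(l1) > eps else 0.0
    il2 = 1.0 / l2 if abs(l2) > eps else 0.0
    return tuple(
        tuple(il1 * v[i] * v[j] + il2 * w[i] * w[j] for j in range(2))
        for i in range(2)
    )

def combine(sigma_v, sigma_f):
    """Sigma_Vo = (Sigma_Vg^-1 + Sigma_f^+)^-1."""
    i = inv2(sigma_v)
    p = pinv2_sym(sigma_f)
    s = tuple(tuple(i[r][c] + p[r][c] for c in range(2)) for r in range(2))
    return inv2(s)
```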
The spreading of our Covariance Matrix won't:
Therefore we'd need to deduce how to bump up the anisotropic $\alpha$ given a covariance matrix in the local surface coordinate frame, essentially computing $\tilde{f}_t$.
Obviously YOU COMPUTE THIS SEPARATELY FOR THE REFLECTION AND REFRACTION cases; you cannot put two vastly separated unimodal distributions (which together form a bimodal one) into a single unimodal distribution (which is what a Gaussian given by a covariance matrix is) and hope to get anything remotely useful. This is obviously a problem for your standard dielectric, which has two peaks in the plot of outgoing directions, one for reflection and one for refraction.
So since we cannot approximate multi-modal distributions with a Gaussian derived from a single Hessian, we have a problem defining a single Hessian to use for our ray distribution transformations. We could either:
Whatever the case, it seems like Durand 2005 is a much needed read first.
Since both Covariance Tracing and an alternative version of Ray Differentials would both use the Jacobian
\frac{\operatorname D \tilde{f}_t}{\operatorname D \omega_o}
to estimate $V_g$, we could flip the parameter fitting on its head and ask what parameters $\tilde{f}_v$ needs for its Jacobian to stay close to the convolution of the Jacobian of $\tilde{f}_t$ with $V_g$.
As one knows, mip-maps of bump-maps, derivative-maps and normal-maps tend to average the values in the footprint to a single value. This produces a smooth surface under minification which is not correct.
One needs to filter the original NDF's $\alpha$ parameter alongside the perturbed normal $\mathbf n_p$ for a given pixel footprint $P$ with a density $W$ to closely match the underlying distribution of normals in the actual normalmap.
Because the input Derivative Map's X and Y channel histograms do not need to be equal (think metal grating), the filtered NDF can be anisotropic even if every perturbed normal had an isotropic NDF applied to it. This is yet another argument why all NDFs should be implemented as anisotropic and not optimized for the isotropic case.
Therefore filtering techniques that spit out covariant roughnesses are best (so that the NDF is not only anisotropic, but the anisotropy can also be rotated). So there are $\alpha_{\text{x}}$, $\alpha_{\text{y}}$ and $\alpha_{\text{xy}}$. But attention must be paid to the actual representation, such that they behave nicely under Trilinear Interpolation.
Because of all of the above, the BxDF (or at least the NDF) intended for use with a Derivative Map needs to be known.
It remains a challenge to formulate our "target" function to minimize, but one should hope it's some sort of an error metric between the average BxDF value over the pixel footprint's texels and the BxDF of the filtered perturbed normal and roughness.
There is a choice of target functions, but:
We could of course define our spatially varying distribution beautifully in terms of a convolution, for example:
\int_P W(\mathbf x)D(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}}) \operatorname d \mathbf x
but texels in an image are sampled, and therefore Dirac Impulse Trains, so by the sampling property the above turns into a sum
\Sigma_{\mathbf x \in P} W(\mathbf x)D(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}})
Optimizing most of the following target functions is quite easy, there will always be an integral of Square Error in Solid Angle measure where the parameters we're optimizing are arguments to some distribution.
This can be optimized easily provided we can differentiate the Square Error Integral w.r.t. $\alpha$ and $\mathbf n_p$.
The fact that it's an integral doesn't matter; we can randomly initialize the parameters and apply Stochastic Gradient Descent (basically do what Mitsuba 2 and 3 do).
\int_\Omega (D(\tilde{\alpha},M(\mathbf{\tilde{n}}) \omega_{\text{m}})-\Sigma_{\mathbf x \in P} W(\mathbf x)D(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}}))^2 (\omega_{\text{m}}\cdot\mathbf e_2)^2 \operatorname d \omega_{\text{m}}
The idea behind this one is simple, our "Ground Truth" NDF is the weighted sum of all per-texel NDFs of varying roughness rotated by $M$ to have a new tangent space that aligns with $\mathbf n_p$.
Then with the given constraints we try to make our model fit the ground truth to the best of our ability.
NOTE: If we start to use Schussler et al. 2017, then $\mathbf n_p$ becomes a parameter of $D$. There will also be a problem in the above formulation, as the NDF always has a tangent specular microfacet which is a delta distribution, and that will mess up our differentiation.
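To make the fitting idea concrete, here's a heavily simplified sketch: fit a single filtered $\tilde{\alpha}$ so that an isotropic GGX NDF matches the weighted sum of per-texel GGX NDFs, with every texel sharing the same normal (so no $M$ rotation) and a finite-difference gradient descent standing in for proper Stochastic Gradient Descent. The footprint roughnesses, learning rate and hemisphere grid are all assumptions:

```python
import math

def ggx(alpha, cos_t):
    """Isotropic GGX NDF as a function of the microfacet cosine."""
    a2 = alpha * alpha
    d = cos_t * cos_t * (a2 - 1.0) + 1.0
    return a2 / (math.pi * d * d)

def footprint_ndf(alphas, weights, cos_t):
    """Ground truth: weighted sum of the per-texel NDFs."""
    return sum(w * ggx(a, cos_t) for a, w in zip(alphas, weights))

def fit_alpha(alphas, weights, steps=200, lr=0.01, grid=64):
    """Minimize cosine-weighted squared error over a hemisphere grid by
    finite-difference gradient descent on a single alpha."""
    cos_ts = [(i + 0.5) / grid for i in range(grid)]

    def err(alpha):
        return sum(
            (ggx(alpha, c) - footprint_ndf(alphas, weights, c)) ** 2 * c * c
            for c in cos_ts
        ) / grid

    alpha = 0.5  # arbitrary initialization
    for _ in range(steps):
        h = 1e-4
        g = (err(alpha + h) - err(alpha - h)) / (2.0 * h)
        alpha = min(1.0, max(1e-3, alpha - lr * g))
    return alpha
```

The real problem of course also fits $\mathbf{\tilde{n}}_p$ and covariant anisotropic roughness, but the optimization loop looks the same.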
Another insight is that maybe we shouldn't care about the normals we're not going to see anyway.
One would think to attempt to generalize by making $\tilde{\alpha}$ and $\mathbf{\tilde{n}}_p$ depend on $\omega_{\text{i}}$; however, that would probably break the reciprocity and, following from that, the energy conservation properties of the resulting BxDF.
So we are left with this:
\int_\Omega \int_\Omega (D_{\omega_{\text{i}}}(\tilde{\alpha},M(\mathbf{\tilde{n}}) \omega_{\text{m}})-\Sigma_{\mathbf x \in P} W(\mathbf x)D_{\omega_{\text{i}}}(\alpha(\mathbf x),M(\mathbf n(\mathbf x)) \omega_{\text{m}}))^2 \operatorname d \omega_{\text{m}} \operatorname d \omega_{\text{i}}
Finally one might approach the problem from the angle of "who cares about the microsurface profile, I care about appearance".
\int_\Omega \int_\Omega \int_\Lambda (f(\tilde{\alpha},\tilde{\mathbf n}_p, \omega_{\text{i}}, \omega_{\text{o}}, \lambda)-\Sigma_{\mathbf x \in P} W(\mathbf x)f(\alpha(\mathbf x),\mathbf n_p(\mathbf x), \omega_{\text{i}}, \omega_{\text{o}}, \lambda))^2 (\mathbf n \cdot \omega_{\text{o}})^2 \operatorname d \lambda \operatorname d \omega_{\text{o}} \operatorname d \omega_{\text{i}}
Obviously to try to apply this to a mixture or blend of BxDFs would be insanity requiring proper differentiable rendering (due to the absolute explosion of parameters to optimize).
But we could make the "Rough Plastic" a special case and allow treating that blend as a singular BxDF, or subexpressions in the BxDF tree which use the same roughness and derivative maps.
We re-use our GPU accelerated polyphase filter, which preloads a neighbourhood of $P$ into shared memory.
There are some obvious constraints:
We then repeat the following process for $K$ iterations or until the error is below a threshold.

When integrating the light transport equation at a path vertex:
\int_\Omega f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) L_{\text{i}+1}(\mathbf x, \omega_{\text{o}}, \lambda) |\omega_{\text{o}}\cdot\mathbf n| \operatorname d \omega_{\text{o}}
we can use multiple importance sampling techniques $g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)$ to generate our $\omega_{\text{o}}$.
Each technique's PDF $p_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})$ will approximate a different factor or term in the light transport equation, for example:
Note that we omit $\lambda$ from the notation because for a BxDF we don't model photon absorption and re-emission, so all properties need to be satisfied separately for each value of $\lambda$.
As per the famous image in the Veach thesis:
Other techniques such as path guiding may do a great job of approximating the Incoming Radiance, but it's only an approximation and will therefore be inefficient at "hard to sample" light distributions (usually sparse things, or accounting for the windowing by the BxDF).
Ideally we'd like to combine them.
This means that accounting for the overlap to be able to use both techniques is necessary.
Simply averaging the results will not decrease Variance, you want to weight them so that a technique contributes more to the final result when it is "better" than the others.
And ideally weigh them smoothly, to have transitions without visual artefacts.
Basically we weigh each contribution $q_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)$ with a weight $w_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})$ such that all weights sum up to $1$, so we are still integrating the same integral. This is because
q_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi) = \frac{f_{\text{j}}(\mathbf x, \omega_{\text{i}},g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)) L_{\text{i}+1}(\mathbf x, g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi)) |g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi) \cdot\mathbf n|}{p_{\text{j}}(\mathbf x, \omega_{\text{i}},g_{\text{j}}(\mathbf x, \omega_{\text{i}}, \xi))}
There are various ways of obtaining and defining these weights.
The intuition is that the higher the PDF of generating an outgoing direction with a given Technique, the "better" it should be compared to the others.
The definition is simple:
w_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{(n_{\text{j}} p_{\text{j}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}}))^{\beta}}{\Sigma_{\text{k}} (n_{\text{k}} p_{\text{k}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}}))^{\beta}}
where usually $\beta=2$ and $n_{\text{j}}$ is the overall proportion of the samples you're allocating to the generator (it's an up-front choice).
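A direct transcription of that definition; the PDFs and sample allocations fed to it are made-up numbers for illustration:

```python
def mis_weight(j, ns, pdfs, beta=2.0):
    """Power heuristic: w_j = (n_j p_j)^beta / sum_k (n_k p_k)^beta."""
    num = (ns[j] * pdfs[j]) ** beta
    den = sum((n * p) ** beta for n, p in zip(ns, pdfs))
    return num / den
```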
For now we're ignoring whether we could compute $q_j$ in a numerically stable way.
Because we operate in the real world, and all our computations happen using IEEE754, we cannot:
When importance sampling tight distributions, the PDFs are very large (infinite for delta distributions), which makes the above definition of $w_{\text{j}}$ not numerically robust.
Therefore we can rewrite our weight like so:
w_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{1}{1+\Sigma_{\text{k} \neq \text{j}} (\frac{n_{\text{k}} p_{\text{k}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}})}{n_{\text{j}} p_{\text{j}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}})})^{\beta}}
Let's assume our techniques' PDFs don't contain overlapping delta distributions.
\not \exists \omega_{\text{o}}, \text{j}, \text{k} : p_{\text{j}}(\omega_{\text{o}})=\infty=p_{\text{k}}(\omega_{\text{o}})
Now you can see the weight is extremely stable, as for all values of $\omega_{\text{o}}$ we have the following:
Furthermore, if we require that our technique's PDF satisfies
(\exists K \in \mathbb{R}^{+} ) (\forall \omega_{\text{o}} : p_{\text{j}}(\omega_{\text{o}}) > K f_{\text{j}}(\mathbf x, \omega_{\text{i}},\omega_{\text{o}}) L_{\text{i}+1}(\mathbf x, \omega_{\text{o}}) |\omega_{\text{o}}\cdot\mathbf n|)
we are then sure that $\forall \xi : q_{\text{j}}(g_{\text{j}}(\xi)) \neq \infty$.
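The rewritten weight is a one-liner; dividing through by $(n_{\text{j}} p_{\text{j}})^{\beta}$ means a huge $p_{\text{j}}$ can never overflow. The numbers in the usage below are illustrative, including a PDF large enough to overflow the naive formulation:

```python
def mis_weight_stable(j, ns, pdfs, beta=2.0):
    """Numerically robust power heuristic:
    w_j = 1 / (1 + sum_{k != j} (n_k p_k / (n_j p_j))^beta).
    Large p_j only ever appears in the denominator of a ratio, so huge
    (even near-delta) PDFs underflow gracefully to a weight of 1."""
    s = sum(
        ((ns[k] * pdfs[k]) / (ns[j] * pdfs[j])) ** beta
        for k in range(len(pdfs))
        if k != j
    )
    return 1.0 / (1.0 + s)
```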
This is when our integrand is a product of one or more functions, and the sampling Techniques have PDFs proportional to one of them.
It happens in the case of BxDF sampling and NEE.
Let's imagine our integrand:
\Pi_j \hat{f}_j(x)
where each factor $\hat{f}_j(x)$ has an importance sampling technique
\hat{g}_j(\xi)
with an associated PDF $\hat{p}_{j}(x)$.
And let's have a quotient again:
\hat{q}_j(x) = \frac{\hat{f}_j(x)}{\hat{p}_j(x)}
Then we arrive at reformulating our integrand with MIS:
\Sigma_j w_{\text{j}}(x) \hat{q}_j(x) \Pi_{k \neq j} \hat{f}_k(x)
and everything is extremely stable.
There are other ways of deriving the weights.
Basically you define a distribution which is your "true" target which can be evaluated but can't be sampled, and use that to define the weights.
The assumption of Veach in 1998 was that you know your sample budgets for each generator up-front and you'll always use them (trace rays). However it is also possible to:
Deriving the weights isn't straightforward; they actually need to be optimized based on feedback, and they make a stochastic choice of the technique difficult.
Suppose we are given the following "mixture" of BxDFs
f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) =
\Sigma c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) f_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})
where $f_{\text{i}}$ is a function satisfying the properties for an energy preserving BxDF and $c_{\text{i}}$ is a weighting function.
Let's assume that each $f_{\text{i}}$ we can mix has a decent importance sampling technique $g_{f_{\text{i}}}$.
Now let us think about how we derive a sampling function $g$ such that:
The idea is simple, we stochastically choose to produce the sample with
g_{f_{\text{i}}}(\xi_0,\xi_1)
based on $\xi_2$, with probability $p_{\text{i}}(\mathbf x, \omega_{\text{i}})$.
We choose the generator BEFORE sampling, so note that $\omega_{\text{o}}$ is not an argument to the generator choice probability function.
Then our combined sampler PDF $p$ becomes
p(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) =
\Sigma p_{g_{\text{i}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{i}}(\mathbf x, \omega_{\text{i}})
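A toy sketch of the whole procedure with two assumed lobes (a cosine-weighted and a uniform hemisphere): pick the generator with $\xi_2$ BEFORE sampling, then evaluate the combined PDF as the probability-weighted sum of the per-generator PDFs:

```python
import math

def sample_cosine(xi0, xi1):
    """Cosine-weighted hemisphere sample (tangent space, +Z up)."""
    r = math.sqrt(xi0)
    phi = 2.0 * math.pi * xi1
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - xi0)))

def pdf_cosine(w):
    return max(0.0, w[2]) / math.pi

def sample_uniform(xi0, xi1):
    """Uniform hemisphere sample."""
    z = xi0
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * xi1
    return (r * math.cos(phi), r * math.sin(phi), z)

def pdf_uniform(w):
    return 1.0 / (2.0 * math.pi)

def sample_mixture(p0, xi0, xi1, xi2):
    """Choose the generator with xi2 (before sampling, so the choice
    probability cannot depend on omega_o), then return the direction
    together with the combined PDF p = sum_i p_i * pdf_g_i."""
    if xi2 < p0:
        w = sample_cosine(xi0, xi1)
    else:
        w = sample_uniform(xi0, xi1)
    return w, p0 * pdf_cosine(w) + (1.0 - p0) * pdf_uniform(w)
```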
Our choice then becomes quite limited, as no formulation of $p_{\text{i}}$ is ideal because the sampling process is flawed. Even though we might choose the generator for the BxDF that will contribute the most energy, our single chosen generator's sample can be suboptimal.
But this becomes more and more acceptable as
\frac{\operatorname D c_{\text{i}}}{\operatorname D \omega_{\text{o}}} \to 0
We essentially attempt this
p_{\text{i}}(\mathbf x, \omega_{\text{i}}) = \frac{\int_{\Omega} c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \operatorname d \omega_{\text{o}}}{\int_{\Omega} \Sigma_j c_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \operatorname d \omega_{\text{o}}}
We could, for example, pretend we have different $\hat{c}_{\text{i}} \geq c_{\text{i}}$ and use these for defining the probabilities similarly to the above.
Here we actually produce all samples with every generator
\omega_{\text{o}_{\text{i}}} = g_{f_{\text{i}}}(\xi_{1+2i},\xi_{2+2i})
Then we define our target distribution
p(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \propto f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})
and compute MIS weights $w_{\text{j}}$ pretending that
p_{\text{j}} = p(\mathbf x, \omega_{\text{i}}, g_{f_{\text{j}}}(\xi_{1+2j},\xi_{2+2j}))
Since (the sane, non Krivanek type) MIS weights sum up to 1, we can use them as a discrete Probability of selecting the sample from generator $\text{i}$ using $\xi_0$ as our random variable
p_{\text{i}}(\mathbf x, \omega_{\text{i}}) = w_{\text{i}}
This means that for a mix of $n$ BxDFs, you generate $n$ samples and then evaluate $n^2$ functions!
This is because $f$ is a mixture of $n$ BxDFs, so evaluating $f$ at $n$ different input values makes $n^2$ function evaluations.
Apparently ReSTIR has some non-$O(n^2)$ formulation of an approximation of this process.
Recall that our combined $q$ is
q(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{f(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{p(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}
and if we do nothing, we can end up with $\infty$ on both sides of the fraction.
So we do a similar trick as with Numerically Stable MIS: we single out the generator component and divide everything by its probability when the generator is $i$
q = \frac{\frac{c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{p_{\text{i}}(\mathbf x, \omega_{\text{i}})} q_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) +
\frac{\Sigma_{j \neq i} c_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) f_{\text{j}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{p_{g_{\text{i}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{i}}(\mathbf x, \omega_{\text{i}})}}{1+\frac{\Sigma_{j \neq i}p_{g_{\text{j}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{j}}(\mathbf x, \omega_{\text{i}})}{p_{g_{\text{i}}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) p_{\text{i}}(\mathbf x, \omega_{\text{i}})}}
And everything works as long as you don't try to make a weighted sum of two or more delta distributions with identical centers.
We start with the following BxDFs
f_1(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{D(\omega_{\text{i}}, \omega_{\text{o}},\alpha(\mathbf x)) G(\omega_{\text{i}}, \omega_{\text{o}},\alpha(\mathbf x))}{4 |\omega_{\text{i}}\cdot\mathbf n| |\omega_{\text{o}}\cdot\mathbf n|} \\
f_2(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{1}{\pi}
with weights
c_1(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = F(\omega_{\text{i}}, \omega_{\text{o}},\eta(\mathbf x)) \\
c_2(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = (1 - \int_{\Omega} h(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) \operatorname d \omega_{\text{o}}) T(\rho, \omega_{\text{i}},\eta(\mathbf x))
where $T$ is a boost factor to correct for the fact that there is TIR going on in the diffuse layer (as per the Earl Hammon presentation).
Basically we assume that light which was not reflected specularly enters the surface to reach diffuse layer.
We can take a shortcut and approximate the integral in $c_2$ because there are many choices of what to stick there:
The Integral of single scattering Transmittance $D (1-F) G_1$ can be tabulated trivially using SageMath/Mathematica.
If you are in the incredibly lucky position of being able to express your $f_1$ with a Linearly Transformed Cosine fit which also corrected for multiple scattering, then you can get a perfect $c_2$ formulation because anything not reflected by an infinitely scattering dielectric BRDF should enter the surface and scatter in the diffuse layer.
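The tabulation can be brute-forced offline; the sketch below computes the single-scattering directional albedo $E$ of a GGX Cook-Torrance lobe by hemisphere quadrature (assuming height-correlated Smith and Schlick Fresnel, which may differ from the exact terms used in Nabla), and $1-E$ is then what feeds the diffuse weight:

```python
import math

def ggx_D(mu_m, alpha):
    """Isotropic GGX NDF; mu_m = cos(theta_m)."""
    if mu_m <= 0.0:
        return 0.0
    t = (alpha * alpha - 1.0) * mu_m * mu_m + 1.0
    return alpha * alpha / (math.pi * t * t)

def smith_lambda(mu, alpha):
    tan2 = (1.0 - mu * mu) / (mu * mu)
    return 0.5 * (-1.0 + math.sqrt(1.0 + alpha * alpha * tan2))

def single_scatter_albedo(mu_i, alpha, f0=1.0, n_th=128, n_ph=256):
    """E(mu_i, alpha) = integral of f * |wo.n| over the hemisphere by
    midpoint quadrature; bake this over a (mu_i, alpha) grid offline."""
    wi = (math.sqrt(1.0 - mu_i * mu_i), 0.0, mu_i)
    lam_i = smith_lambda(mu_i, alpha)
    E = 0.0
    for j in range(n_th):
        th = (j + 0.5) * 0.5 * math.pi / n_th
        mu_o, sin_o = math.cos(th), math.sin(th)
        lam_o = smith_lambda(mu_o, alpha)
        for k in range(n_ph):
            ph = (k + 0.5) * 2.0 * math.pi / n_ph
            wo = (sin_o * math.cos(ph), sin_o * math.sin(ph), mu_o)
            h = (wi[0] + wo[0], wi[1] + wo[1], wi[2] + wo[2])
            hl = math.sqrt(h[0] * h[0] + h[1] * h[1] + h[2] * h[2])
            mu_m = h[2] / hl
            cos_ih = (wi[0] * h[0] + wi[1] * h[1] + wi[2] * h[2]) / hl
            F = f0 + (1.0 - f0) * (1.0 - cos_ih) ** 5   # Schlick
            G2 = 1.0 / (1.0 + lam_i + lam_o)            # height-correlated Smith
            f = ggx_D(mu_m, alpha) * G2 * F / (4.0 * mu_i * mu_o)
            E += f * mu_o * sin_o
    return E * (0.5 * math.pi / n_th) * (2.0 * math.pi / n_ph)
```

With `f0=1` (a perfect reflector) $E$ is exactly the single-scattering White Furnace Test result, so $1-E$ is the energy left over for the diffuse layer.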
Generally speaking, for diffuse materials you simply output the shading normal and albedo.
But for specular reflectors and transmitters you really want to be able to "see through" and see the albedo and shading normals of the reflected or refracted geometry.
We basically pretend that AOVs are "special" light $A_{\text{e}}$, emitted at every path vertex by every diffuse or rough surface
A_{\text{i}}(\mathbf x, \omega_{\text{i}}, \lambda) = \int_\Omega (t(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}, \lambda) (A_{\text{i}+1}(\mathbf x, \omega_{\text{o}}, \lambda)-A_{\text{e}}(\mathbf x)) + A_{\text{e}}(\mathbf x)) |\omega_{\text{o}}\cdot\mathbf n| \operatorname d \omega_{\text{o}}
where $t$ is a Bidirectional AOV Transport Function which we derive from a BxDF.
For diffuse BxDFs, regardless of the roughness, $t = 0$.
There is no difference between a transmitter and a reflector.
We want to make $t \to f$ as $\max_i \alpha_i \to 0$; the best construction is
t = s(\alpha) f
And we want to compute $s$ as fast as possible, using leftovers from other computations so we can use:
Either becomes a simple function of $\alpha$, as you do not want the transmittance of the AOV to depend on light directions for any reason other than Fresnel, which is already accounted for.
Generally speaking, $t \to 0$ as attenuation due to extinction eradicates translucency and as the in-scattering phase function blurs the background.
We define our $t$ similarly to how we've defined our weighted sum BxDF $f$, using the exact same weights $c_i$.
t(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) = \frac{\Sigma_i c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}}) t_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}{\Sigma_i c_{\text{i}}(\mathbf x, \omega_{\text{i}}, \omega_{\text{o}})}
However, we ensure the weights sum up to $1$, because you don't want less intense albedos or foreshortened normals just because of absorption.
There is an open question of whether we want to include things like the diffuse $\rho$ in the $c_i$ used for defining $t$.
On the one hand there is the option of making the contribution of each BxDF proportional to its weight, on the other to make it proportional to the contribution measured in terms of luminosity.
This depends on the AOV:
When you weight by a scalar, it should be done after all the transport is accounted for. You don't want a split sum approximation of R G B screening at every vertex, just the emissive one.
The AOV signal is really low frequency; it's similar to rendering scenes with a lot of very diffuse ambient light. Notice there's no directional component in the "emitted" light.
This means good BxDF importance sampling is the perfect approach.
However we are not at liberty to run a separate path tracing simulation for this.
Since the spp required to get a converged AOV image are much lower than the real render, and even slightly noisy AOVs are usable, we get our integral from left-overs in the original Path Tracing computation.
We simply reuse the same samples as for the Path Tracing!
So we simply divide our overall $t$ by the sampling technique's $p_j$ and weigh it by the MIS/RIS weight $w_j$ if there is one.
Whenever the sample was generated by a particular Cook Torrance BxDF generator $g_i$ we make sure to use the following formulation:
\frac{t}{p} = \frac{1}{\Sigma_j c_j} ( s_i(\alpha) c_i q_i + \frac{\Sigma_{j \neq i} c_j t_j}{p} )
This is because $f$ pops up in the $t$ to make sure $t$ integrates to $s_i(\alpha)$, dictated by our assumption that the implementation of Cook Torrance passes the White Furnace Test (which is a stretch).
Velocity probably shouldn't follow the regular AOV transport, because a non-zero time derivative of any Light Transport Equation term at any path vertex induces screenspace motion.
We should probably find a way to hack Covariance Tracing or Differentiable Rendering into providing us the real screenspace motion derivatives.
A frontend parses a Material Representation specific to an Interchange Format to produce a format-independent Intermediate Representation/Language of the Material.
For now only the following frontends will be developed/maintained:
Other planned front-ends which have no ETA:
A material is represented by an expression tree, it can look like this:
graph TD
F[Front Face Root Node] --> |Root isn't really an extra node we point straight to Sum|A{Sum}
A --> |attenuate|w_0{Metallic Fresnel}
w_0 --> Conductor(Conductor)
A --> |attenuate|w_1{Albedo}
w_1 --> DiffTrans(Diffuse Transmitter)
A --> |attenuate|w_2{Non-PBR Transmittance/Reflectance}
w_2 --> GlassF(Glass Eta=k)
B[Back Face Root Node] --> |Root isn't really an extra node we point straight to Sum|C{Sum}
C --> w_1
C --> |no extra weights| GlassB(Glass Eta=1/k)
This in reality is an Abstract Syntax Forest, not an Abstract Syntax Tree (and in V2 it will be an Abstract Syntax Directed Acyclic Graph).
The nodes in the diagram are as follows:
Logically (not codegen-wise, for reasons I'll explain farther down), you do the following to evaluate the BxDF:
Based on the sign of `GdotV` or `gl_FrontFacing` you pick the appropriate tree for the root node. This BxDF is always in the form that passes the White Furnace Test, absorption not included as far as possible!
So:
Every BxDF should be able to tell us how much energy $E$ it will lose based on $\omega_{\text{i}}$, as we'll need this for our importance sampling weight!
Makes the whole subexpression directly below it use a different shading normal, either:
If another Normal Modifier appears in the subexpression, it replaces this normal for its own subexpression.
This means that for subexpression elimination, one needs to include the normal used for shading in the hash and comparison. Basically you cannot eliminate a subexpression if it will be using a different shading normal.
This node multiplies the value of its child subexpression by its own value.
It is important to re-order attenuators such that:
This ordering favours the speed of evaluation as opposed to importance sampling.
We divide into several classes (if I'm missing something, let me know):
This will be used to implement the Diffuse BxDF's $\rho$ and the non-PBR Reflectivity/Transmittance.
We'll make it have a 2-bit flag for whether it should be applied to reflective paths, transmissive paths, or both.
Only applied on Transmissive paths; basically it's similar to the Constant Attenuator, but we raise the parameter to the power of $\frac{1}{|\omega \cdot \mathbf n|}$
Also we store a 1-bit flag to tell us which side of the interface (using which $\omega$'s angle) we should apply it to.
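The slab intuition behind that exponent, as a one-liner sketch (assumed Beer-Lambert behaviour; `tau_perp` is the transmittance at normal incidence):

```python
def slant_transmittance(tau_perp, cos_theta):
    """A ray crossing a slab at angle theta travels thickness/|cos(theta)|
    inside it, so Beer-Lambert turns the normal-incidence transmittance
    tau_perp into tau_perp ** (1/|cos(theta)|)."""
    return tau_perp ** (1.0 / abs(cos_theta))
```

For example, a 60-degree crossing (cosine of 0.5) doubles the optical path, squaring the transmittance.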
Left Child: Must be a Cook Torrance Reflector (so we can snoop $\eta$ now, and maybe the total energy loss due to Fresnel later)
Right Child: Must be a Diffuse BxDF
For now, compute the two transmittance factors of 1-Fresnel of the Shading Normal $\mathbf n$ for the incoming and outgoing directions respectively, and their product. Then apply the multi-scattering correction.
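A minimal sketch of those two factors (using Schlick's approximation as a stand-in for the real dielectric Fresnel, and leaving out the multi-scattering correction):

```python
def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation; f0 = reflectance at normal incidence."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def coupled_diffuse_weight(cos_i, cos_o, f0):
    """Product of the (1 - Fresnel) transmittance factors evaluated at the
    shading normal for the incoming and outgoing directions, i.e. the
    attenuation this node applies to its Diffuse right child."""
    return (1.0 - schlick_fresnel(cos_i, f0)) * (1.0 - schlick_fresnel(cos_o, f0))
```

At grazing incidence the Fresnel term grows, so the diffuse layer correctly receives less energy than at normal incidence.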
Future Improvements:
We want to optimize the Material VM or Callable SBT by erasing cases for unused opcodes, and eliminating duplicate materials.
We also want to avoid instruction cache misses. There's also an important situation where the divergence has already been optimized (BxDF types and parameters are identical) but the data fetches have not (there are multiple copies of the BxDF parameters in memory), so two objects/samples/pixels using the same material would be fetching from divergent addresses, causing cache misses.
This allows us to perform optimizations by removing entire branches which do not need to be evaluated knowing the sign of `GdotV`, such as any BRDFs for geometry back faces if the material is not Mitsuba-style "twosided".
Furthermore we can precompute a lot of parameters for BSDFs which change depending on the sign of `GdotV`, such as the Refractive Index for dielectric BSDFs.
They don't really do anything to the accumulated expression, they're more of a "marker" that all BxDF nodes in the branch below (until the next such opcode) will use a normal different to the vanilla shading normal.
If we were to go back to Schussler at some point, this node could change from a 1:1 node to a 1:3 node in case certain subexpressions could be eliminated.
Our current de-duplication only allows for sharing an entire Material Expression, we want to be able to share subexpressions.
We also haven't coded the thing with sharing subexpressions in mind.
Each subexpression needs to have a canonicalized (and therefore optimized, at least in the terms of constant folding, etc.) form.
Right now most of our BRDFs have the factors that forsake 100% Energy Conservation folded into their opcodes, parameters and execution. This is a bad idea, especially when importance sampling mixtures of BxDFs, as we'll cover later.
Therefore all factors that make the BxDF lose energy, such as:
The backend traverses the IR/IL (possibly multiple times) to produce a means of performing the following on the target architecture and environment:
For all of the above, the functions take as parameters:
Whether this is achieved by outputting a constant instruction stream (Assembly/IR or High Level) with a separate entry point per root node, or as a Virtual Instruction stream for a handwritten VM, or any hybrid of approaches, is the choice of the backend.
Instead of having a Material Abstract Syntax Forest, have a Material Abstract Syntax Directed Acyclic Graph.
Right now we only de-duplicate entire expression trees, and it also seems to be happening in the Frontend. The Mitsuba Frontend actually constructs suboptimal IR/IL and then attempts to optimize it later (constant folding etc.).
If we use MerkleTrees and HashConsing in the IR/IL allocator/builder then we can make all Frontends benefit from subexpression de-duplication.
If we can't compute a unique equivalent subgraph for a subexpression, we cannot hope for a constant hash and subexpression graph topology comparison, both of which are needed for hash consing and deduplication of subexpressions.
This means for example:
turning `N>3` subexpression accumulations into chains of binary ops.

A cool side-effect of being able to do subexpression elimination is being able to link/join multiple ASDAGs together by treating them as plain old common subexpressions.
Bringing_Linearly_Transformed_Cosines_to_Anisotrop.pdf
Implement #156
Glossary
BxDF = Either a BRDF or BSDF
LTC = Linearly Transformed Cosines
The implementation of BxDFs in Nabla
For the GLSL source you can browse here: https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/builtin/glsl/bxdf
and for the in-progress HLSL here: https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/builtin/hlsl/bxdf
Unfortunately the DXC integration (#433, #437, #442) is not operational yet.
General Case
Because the rendering equation can be written as
You have a recursive integral which features two distinct factors.
The Projected BxDF:
and the Incoming Radiance (not to be confused with irradiance):
Generally speaking it's impossible to find a constant-time closed-form solution to importance sampling the product of Reflectance and Incoming Radiance, except for the simplest of cases (point light, line light and Lambertian BRDF). This is because $\omega_{\text{o}}$ appears in both factors, hence they are not independent of each other.
Furthermore the entire incoming radiance is not known (NEE is a special case and does not take into account whether the emitter is actually visible, and Path Guiding samples a much smoother approximation of the incoming radiance).
The logical choice is to use MIS or RIS to sample this product of distributions efficiently and split the techniques into Projected BxDF and Incoming Radiance sample generators; this also helps us keep our code modular and free of a combinatorial explosion of specialized code for each light source type and BxDF combination.
This is why all BxDFs have the following functions:
You will often find that you can importance sample the BxDFs in a way such that the $|\omega_{\text{i}}\cdot\mathbf n|$ factor appears in the PDF of the sample generator and therefore disappears from the throughput/quotient hence bringing it closer to a constant.
For a dumb reason the quotient-computing functions contain `rem_` in the name. I cannot explain the brain aneurysm which caused me to call the value divided by the pdf a remainder, and not a quotient. Sorry for that.
Super Interesting Tangent: Relationship of the Jacobian of the sample generator to its PDF
While you might take it as the word of God that in Monte Carlo Integration of $f$ your sample contribution is $\frac{f(g(\xi))}{p_{g}(g(\xi))}$ and not investigate further, it is important to consider the relationship between a trivial change of variables and importance sampling (hint: they're the same thing).
Let us take the original thing we wanted to integrate:
lets now perform a change of variables $\omega = g(\xi)$:
since applying Monte Carlo importance sampling as defined above must yield the same answer as a plain change of variables without importance sampling, we have:
or as given in the Practical Product Importance Sampling via Composing Warps paper:
There is an important caveat: for the above trick to work, the sample generation function must be a bijection. If more than one subdomain maps to the same subimage you start needing to add the probability densities together to get the real one (so the Jacobian determinants combine like the resistances of resistors connected in parallel).
Therefore it's easy to validate whether your importance sampling generator implementation/scheme matches the PDF you think it has; here's an observation and derivation with an intuitive density-based discussion I made, waaay before I read the Practical Product Importance Sampling paper.
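Here's what that validation can look like for a cosine-weighted hemisphere generator: recover the solid-angle pdf purely from finite-difference Jacobians of the warp and compare it to the claimed $\frac{\cos\theta}{\pi}$ (a Python sketch; the diagonal-determinant shortcut only works because $\theta$ depends on $\xi_1$ alone and $\phi$ on $\xi_2$ alone here):

```python
import math

def cos_sample(u1, u2):
    """Uniform disk sample unprojected onto the upper hemisphere."""
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))

def claimed_pdf(w):
    return w[2] / math.pi  # cos(theta) / pi

def pdf_from_jacobian(u1, u2, eps=1e-5):
    """p(g(u)) = 1 / (|det d(theta,phi)/d(u1,u2)| * sin(theta)); the sin
    converts the (theta, phi) density into a solid-angle density."""
    def angles(a, b):
        w = cos_sample(a, b)
        return math.acos(w[2]), math.atan2(w[1], w[0])
    th0, ph0 = angles(u1, u2)
    th1, _ = angles(u1 + eps, u2)
    _, ph1 = angles(u1, u2 + eps)
    # diagonal determinant: theta only depends on u1, phi only on u2
    det = abs((th1 - th0) / eps) * abs((ph1 - ph0) / eps)
    return 1.0 / (det * math.sin(th0))
```

If your generator is not a bijection, the same check has to sum the reciprocal Jacobians over all preimages, as noted above.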
"Smooth" Diffuse BxDF
For the standard Lambertian BRDF we have:
and for the BSDF we have:
To remain sane we pretend that $\mathbf x$ does not depend on $\omega_{\text{o}}$ and therefore $\rho$ is constant.
If we importance sample by generating uniformly distributed points $(r,\theta) \in [0,1]\times[0,2\pi]$ on a disk perpendicular to $\mathbf n$ and then unproject them onto the upper hemisphere, we get the following PDF
but if we then randomly make half of the points go onto the lower hemisphere for a BSDF, we get
This is really nice because the throughput/quotient works out to just $q_{f_{\text{r}}} = \rho$.
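The cancellation is worth seeing explicitly; a toy version of the quotient (hypothetical names, scalar $\rho$):

```python
import math

def lambert_quotient(rho, u1):
    """Throughput f_r * |wo.n| / pdf for cosine-weighted sampling of the
    Lambertian BRDF f_r = rho / pi: the cos/pi factors cancel per-sample,
    so the estimator has zero variance and equals rho exactly."""
    cos_o = math.sqrt(max(0.0, 1.0 - u1))  # z of the unprojected disk sample
    f_times_cos = (rho / math.pi) * cos_o
    pdf = cos_o / math.pi
    return f_times_cos / pdf
```

For the BSDF variant the halved pdf is matched by a halved per-side BSDF value, so the quotient is still $\rho$.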
"Rough" Oren-Nayar BxDF
We use Oren-Nayar for rough-diffuse, as it's important that our rough BxDFs degenerate nicely to smooth versions as $\alpha \to 0$.
For importance sampling we figured it doesn't matter much and just used the same generator as the Lambertian Diffuse; we could, of course, improve that.
Cook-Torrance BxDFs
Equations
Assuming a spatially varying roughness $\alpha(\mathbf x)$ and index of refraction $\eta(\mathbf x)$, the BRDF has the following form:
note the absolute value of the dot products to handle Internal Reflections for the dielectric case.
Also the $\eta$ needs to be properly oriented w.r.t. $\omega_{\text{i}}$: whenever the latter is in the lower hemisphere we need to use $\frac{1}{\eta}$.
While the BTDF has the form:
where $\omega_{\text{m}}$ is the microsurface normal, which is fixed by (and can be worked out from) $\omega_{\text{i}}$, $\omega_{\text{o}}$, and $\eta$.
Upon first glance you might think that this BTDF violates the law of reciprocity $f_{\text{t}}(\omega_{\text{i}}, \omega_{\text{o}}) = f_{\text{t}}(\omega_{\text{o}}, \omega_{\text{i}})$ needed for PBR. However note that if you swap the directions in the transmissive case, the $\eta$ changes too, becoming its own reciprocal.
For certain transmissive configurations of $\omega_{\text{i}}$, $\omega_{\text{o}}$, and $\eta$ the refractive path will have a zero throughput (and therefore we must report a zero pdf) because either:
Conclusions
Most of these I've covered and derived in our recorded Discord call, which you have access to on Google Drive.
$|\omega_{\text{o}}\cdot\mathbf n|$ can be factored out
It should be immediately apparent that when $f$ is a Cook Torrance BRDF or BTDF, the following expression
contains $|\omega_{\text{o}}\cdot\mathbf n|$ in both the numerator and denominator, which can be cancelled out to avoid a lot of headaches.
Sampling the Distribution of Visible Normals (VNDF) will remove almost every factor from the quotient
There are two variants of the Masking and Shadowing function $G$:
As derived by Heitz in his 2014 and 2016 papers, either variant of the masking and shadowing function $G$ for a microfacet cook torrance model is entirely fixed by the distribution of normals $D(\alpha)$, as it defines $\Lambda_{D(\alpha)}(\omega,\alpha)$ which in turn defines:
This allows us to treat $D G$ as a single function. Nabla's $D G_{\text{GGX}}$ is extremely optimized; we pretty much avoid computing $\Lambda_{D_{\text{GGX}}}$ because it contains many of the same terms as $D_{\text{GGX}}$.
Also as shown by Heitz and others, $D$ by itself is not a probability distribution, $D |\omega_{\text{m}}\cdot\mathbf n|$ is. This is because the projected area of the microfacets onto the macrosurface needs to add up to a constant (which is actually the same as the area of the macrosurface), not the area of the microfacets themselves (a more corrugated surface has a greater surface area).
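This normalization is easy to check numerically; the quadrature below (isotropic GGX assumed) confirms that $\int_\Omega D(\omega_{\text{m}})\,|\omega_{\text{m}}\cdot\mathbf n| \operatorname d \omega_{\text{m}} = 1$, whereas $\int D$ alone would not be 1:

```python
import math

def ggx_ndf(mu_m, alpha):
    """Isotropic GGX distribution of normals; mu_m = cos(theta_m)."""
    t = (alpha * alpha - 1.0) * mu_m * mu_m + 1.0
    return alpha * alpha / (math.pi * t * t)

def projected_ndf_integral(alpha, n_th=2048):
    """Midpoint quadrature of D(w_m) |w_m . n| over the hemisphere;
    the azimuth integrates to 2*pi since the NDF is isotropic."""
    total = 0.0
    for j in range(n_th):
        th = (j + 0.5) * 0.5 * math.pi / n_th
        mu = math.cos(th)
        total += ggx_ndf(mu, alpha) * mu * math.sin(th)
    return total * 2.0 * math.pi * (0.5 * math.pi / n_th)
```

The result holds for any roughness, which is exactly the "projected microfacet area sums to the macrosurface area" statement above.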
Note that this formulation makes $\omega_{\text{m}}$ independent of $\omega_{\text{i}}$.
We can also define the Visible Normal Distribution Function for a given fixed $\omega_{\text{i}}$
which tells you what proportion of the macrosurface area projected onto the viewing direction (which is why there is a division by $|\omega_{\text{i}}\cdot\mathbf n|$) is made up of microfacets with normal equal to $\omega_{\text{m}}$ (they also need to be projected onto the viewing direction, hence the dot product in the numerator).
When importance sampling, the $\omega_{\text{i}}$ is a fixed constant, so the best sampling of $\omega_{\text{o}}$ you can hope to achieve is done according to $f |\omega_{\text{o}} \cdot \mathbf n|$, but realistically you can only sample $\omega_{\text{m}}$ according to $D_{\omega_{\text{i}}}$ and then reflect $\omega_{\text{i}}$ about it.
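Sampling $D_{\omega_{\text{i}}}$ is exactly what Heitz's 2018 "Sampling the GGX Distribution of Visible Normals" does; below is a Python transcription of that published algorithm (Nabla's GLSL may differ in details), plus the reflection step:

```python
import math

def normalize(v):
    l = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    return (v[0] / l, v[1] / l, v[2] / l)

def sample_ggx_vndf(wi, ax, ay, u1, u2):
    """Heitz 2018 GGX VNDF sampling: returns a microfacet normal wm
    distributed according to D_wi; wi must satisfy wi.z > 0."""
    # warp to the hemisphere configuration (roughness -> unit)
    vh = normalize((ax * wi[0], ay * wi[1], wi[2]))
    # orthonormal basis around vh
    lensq = vh[0] * vh[0] + vh[1] * vh[1]
    if lensq > 0.0:
        inv = 1.0 / math.sqrt(lensq)
        T1 = (-vh[1] * inv, vh[0] * inv, 0.0)
    else:
        T1 = (1.0, 0.0, 0.0)
    T2 = (vh[1] * T1[2] - vh[2] * T1[1],
          vh[2] * T1[0] - vh[0] * T1[2],
          vh[0] * T1[1] - vh[1] * T1[0])
    # sample a disk, reproject onto the tilted half-disk
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    t1, t2 = r * math.cos(phi), r * math.sin(phi)
    s = 0.5 * (1.0 + vh[2])
    t2 = (1.0 - s) * math.sqrt(max(0.0, 1.0 - t1 * t1)) + s * t2
    t3 = math.sqrt(max(0.0, 1.0 - t1 * t1 - t2 * t2))
    nh = tuple(t1 * T1[k] + t2 * T2[k] + t3 * vh[k] for k in range(3))
    # unwarp back to the ellipsoid configuration
    return normalize((ax * nh[0], ay * nh[1], max(1e-6, nh[2])))

def reflect(wi, wm):
    """wo = 2 (wi . wm) wm - wi"""
    d = wi[0] * wm[0] + wi[1] * wm[1] + wi[2] * wm[2]
    return (2.0 * d * wm[0] - wi[0], 2.0 * d * wm[1] - wi[1], 2.0 * d * wm[2] - wi[2])
```

By construction every returned normal is visible from $\omega_{\text{i}}$, i.e. $\omega_{\text{i}}\cdot\omega_{\text{m}} \geq 0$.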
The rest of the terms in the quotient composed of dot products involving $\omega$ can be factored out by construction
When you reflect $\omega_{\text{i}}$ about $\omega_{\text{m}}$ a change of variables happens and your PDF is
note how when you sample according to VNDF, this becomes:
A similar thing happens with refraction as its PDF is:
which actually makes it so that $p_{\omega_{\text{o}}}$ is exactly the same for the reflective and refractive case.
Therefore when you sample an $\omega_{\text{m}}$ from a VNDF first, and then generate your $\omega_{\text{o}}$ via this reflection or refraction, the same factors arise in your generator's pdf as in the value, so that the throughput simply becomes:
for a BRDF, and
for a BTDF.
This should all intuitively make sense when you consider that as $\alpha \to 0$ both $G_1 \to 1$ and $G_2 \to 1$, so you get the exact same thing as importance sampling an explicitly implemented smooth mirror or glass BSDF.
When you have a White Furnace Test passing BSDF in a Spectral Renderer even the Fresnel disappears
Assuming you're constructing your BSDF by applying a BRDF on reflective paths and a BTDF on refractive ones
If, when importance sampling, you choose whether to reflect $\omega_{\text{i}}$ or refract it about $\omega_{\text{m}}$ based on a probability proportional to $F$, you get a factor of $F$ in the pdf $p_{f}$ when reflecting and $1-F$ when refracting.
This makes it completely drop out of your throughput so that:
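A sketch of that Fresnel-proportional event selection (exact unpolarized dielectric Fresnel; `xi` is a uniform random number in $[0,1)$):

```python
import math

def fresnel_dielectric(cos_i, eta):
    """Exact unpolarized Fresnel reflectance; eta = n_transmitted / n_incident."""
    sin2_t = (1.0 - cos_i * cos_i) / (eta * eta)   # Snell's law
    if sin2_t >= 1.0:
        return 1.0                                  # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    r_s = (cos_i - eta * cos_t) / (cos_i + eta * cos_t)
    r_p = (eta * cos_i - cos_t) / (eta * cos_i + cos_t)
    return 0.5 * (r_s * r_s + r_p * r_p)

def choose_event(cos_i, eta, xi):
    """Reflect with probability F, refract with 1 - F: the event's value and
    its pick probability are identical, so the throughput factor is always 1
    and Fresnel drops out of the quotient."""
    F = fresnel_dielectric(cos_i, eta)
    if xi < F or F == 1.0:
        return "reflect", F / F
    return "refract", (1.0 - F) / (1.0 - F)
```

Note how TIR degenerates gracefully: $F=1$ forces the reflect branch with probability 1 and the throughput factor is still 1.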
There are however important caveats:
Bonus Round: Thindielectric BSDF
This is a great "specialization for known geometry", you basically assume that each surface with this material is in reality two surfaces which are:
Smooth Thindielectric
The BSDF for smooth glass is:
What should be immediately apparent is that any ray that enters the Thindielectric medium will hit its backside with the exact same angle as it left the front face, this means:
When you put these facts together you see that the equations for infinite scattering Thindielectric BSDF (infinite TIR) become
The $(1-F)^{2}$ factor is present on all but the simple single scattering reflection, because you need a transmission event to enter and exit the medium.
It is further possible to incorporate an extinction function (but not a phase function) e.g. simple Beer's Law.
Let us assume that a ray of light orthogonal to the surface will be attenuated by $\tau_{\perp}$ after passing through the medium, then for a ray penetrating the medium at a different angle we have:
We can compute the following factor to account for all chains of Total Internal Reflection which exit at the same side:
then our BSDF becomes:
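Summing those bounce chains is just a geometric series with ratio $F^2\tau^2$; a sketch of the closed forms and a brute-force validation (assuming a scalar Fresnel $F$ per interface hit and one-crossing transmittance $\tau$):

```python
def thindielectric_totals(F, tau):
    """Closed-form total reflectance and transmittance of a smooth thin slab:
       R = F + (1-F)^2 F tau^2 / (1 - F^2 tau^2)
       T = (1-F)^2 tau       / (1 - F^2 tau^2)"""
    g = 1.0 / (1.0 - F * F * tau * tau)
    return F + (1.0 - F) ** 2 * F * tau * tau * g, (1.0 - F) ** 2 * tau * g

def thindielectric_series(F, tau, n=256):
    """Explicitly sums the first n TIR chains for validation: every extra
    internal round trip multiplies a chain by F^2 tau^2."""
    R, T = F, 0.0
    chain = (1.0 - F) ** 2 * tau      # enter, cross once, exit out the back
    for _ in range(n):
        T += chain
        R += chain * F * tau          # reflect at the back, cross, exit front
        chain *= F * F * tau * tau
    return R, T
```

With $\tau = 1$ (no extinction) the totals satisfy $R + T = 1$, which is the energy-conservation sanity check for the infinite-TIR formulation.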
[Perpetual TODO] Rough Thindielectric
This will need the energy conservation fix for regular Rough Dielectrics (Multiple Scattering Cook Torrance correction).
What's important is that our implementation of this will reduce to Smooth Thindielectric as $\alpha \to 0$.
All the BxDFs we didn't cover and which might be needed for Ditt (in order of importance)
Velvet
This is the only one I've had requests for so far.
We should do the same one Mitsuba had, or something more up to date if we can find it.
Hair/Fur
For carpets.
The BSDF is what gives you rim-lighting and anisotropic gloss. https://andrew-pham.blog/2019/09/13/hair-rendering/
But it's the heightfield / volume tracing that gives the fur effect.
Human Skin
For RenderPeople (not requested yet).
True Diffuse
Lambertian (constant irrespective of angle of observation) is completely made up; if you replace it with an actual volume which does uniform scattering and let the mean free path tend to 0, you'll get a different-looking material.
Sony Imageworks has a 2017-ish presentation on that.