Some insights #15 - fraunhoferhhi/neural-deferred-shading

Hi,

You are correct: the details are captured by the shading loss (as shown in Figure 12 in the supplementary material). It does not work perfectly, but it converges to one possible surface that could explain the observation (i.e., the images).

Many local minima (i.e., combinations of vertex positions and shader parameters) fit the observation, and I would argue that two central points help the optimization converge to a reasonable one: (1) The visual hull initialization is already close to the actual surface; where this is not the case, e.g. for non-convex regions, the solution is not optimal (see the failure case in Figure 14, supplementary material). (2) The representational power of the shader is (artificially) limited by the architecture, so there is an incentive to represent details in geometry; if the shader is too expressive, there is a chance that details are "baked" into the appearance (see e.g. the SIREN architecture in Figure 9).
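To make the geometry/appearance ambiguity a bit more concrete, here is a toy sketch in plain Python. It is not the method from the paper: it assumes a simple Lambertian model, a single view, and a fixed light instead of the neural shader, and all names (`lambert`, `tilted_normal`, `tilts`) are made up for illustration. It only shows that a row of pixels can be explained equally well by detail in the geometry or by detail "baked" into the appearance.

```python
import math

def lambert(albedo, normal, light):
    """Toy Lambertian intensity: albedo * max(0, n . l) for unit vectors."""
    dot = sum(n * l for n, l in zip(normal, light))
    return albedo * max(0.0, dot)

def tilted_normal(angle_deg):
    """Unit normal tilted away from the z-axis by angle_deg (in the xz-plane)."""
    a = math.radians(angle_deg)
    return (math.sin(a), 0.0, math.cos(a))

# Assumption: one fixed directional light along +z, one row of five pixels.
light = (0.0, 0.0, 1.0)
tilts = [0.0, 30.0, 60.0, 30.0, 0.0]

# Explanation A: the detail lives in the geometry (varying normals, uniform albedo).
image_a = [lambert(1.0, tilted_normal(t), light) for t in tilts]

# Explanation B: the geometry is flat and the detail is baked into the albedo.
flat_normal = (0.0, 0.0, 1.0)
baked_albedo = [math.cos(math.radians(t)) for t in tilts]
image_b = [lambert(a, flat_normal, light) for a in baked_albedo]

# Both explanations reproduce the same observation.
max_diff = max(abs(a - b) for a, b in zip(image_a, image_b))
print(max_diff)
```

In this toy setting the image alone cannot tell the two explanations apart, which is why the initialization and the limited expressiveness of the shader matter for which solution the optimization ends up in.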

Analysis-by-synthesis techniques are susceptible to visual ambiguities (see Figure 6 in this paper), even more so in our case, where the appearance is modeled by a black-box neural network that does not adhere to the physics of light.

As for the question of (moving) light: since we are training a single shader that makes no distinction between the cameras and only considers viewing angles, the appearance should be consistent between views. Therefore, I wouldn't expect meaningful results for images captured with a co-located camera and light, where the lighting changes with every view.
