bmild / nerf

Code release for NeRF (Neural Radiance Fields)
http://tancik.com/nerf
MIT License

Semantic object and material segmentation in nerf #23

Open Misterdudeman opened 4 years ago

Misterdudeman commented 4 years ago

I am wondering how well semantic segmentation into object and material passes, along with reflection/refraction passes, would work in this rendering pipeline. If it were part of the ray-marching process, it would get progressively more accurate as the volume gets resolved.

Perhaps eventually being able to extract material albedo/lighting/shadows/refraction and BRDFs into their own output passes.

cfoster0 commented 4 years ago

Just a thought I've had recently:

You could think of factoring the color component c(r, d) into at least two parts: one, let's call it diff(r) for diffuse color, that depends only on ray position, and one, let's call it spec(r, d), for specular color. The color at a given point would then be c(r, d) = diff(r) + spec(r, d). Each would be a separate sub-network, and their outputs would be summed.

Note that these wouldn't be true diffuse and specular, since the diffuse component would include shadows and ambient lighting.
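A minimal sketch of that two-branch split, assuming a simplified NeRF-style MLP; the module name, feature dimensions, and activations here are illustrative, not from this repo:

```python
import torch
import torch.nn as nn

class TwoBranchColorHead(nn.Module):
    """Illustrative head that splits c(r, d) = diff(r) + spec(r, d)."""
    def __init__(self, feat_dim=256, dir_dim=27):
        super().__init__()
        # diff(r): depends only on the position feature, not the view direction
        self.diffuse = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 3), nn.Sigmoid())
        # spec(r, d): depends on both the position feature and the view direction
        self.specular = nn.Sequential(nn.Linear(feat_dim + dir_dim, 128), nn.ReLU(),
                                      nn.Linear(128, 3), nn.Sigmoid())

    def forward(self, pos_feat, dir_enc):
        diff = self.diffuse(pos_feat)                                 # view-independent part
        spec = self.specular(torch.cat([pos_feat, dir_enc], dim=-1))  # view-dependent part
        return diff + spec, diff, spec  # summed color plus the two separate "passes"
```

Compositing `diff` and `spec` separately with the same volume-rendering weights would then give rough diffuse and specular passes, with the caveat above that they are not physically true diffuse/specular terms.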

Misterdudeman commented 4 years ago

That is brilliant! I see where you are going with spec depth. The outputs would indeed be diffuseRaw and specRaw, which would still be quite usable. I had thoughts on either taking the spec + normal direction as input to estimate the lighting environment/direction, and then dividing the estimated lighting pass out of the RGB to get a diffuseCol map.

Or, inversely, you could sample lit, unlit, or partially lit color points for each material in the scene and have a generator build a diffCol map based on the average luminosity.
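A rough sketch of the first idea, assuming an estimated per-pixel lighting/shading pass is already available (the `lighting` input is a hypothetical quantity, not something the current pipeline produces):

```python
import torch

def estimate_diffuse_color(rgb, lighting, eps=1e-3):
    """Divide an estimated lighting pass out of the rendered RGB
    to get an approximate diffuse-color (albedo-like) map.
    rgb, lighting: (..., 3) tensors in [0, 1]; lighting is an assumption."""
    return (rgb / lighting.clamp(min=eps)).clamp(0.0, 1.0)
```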

kwea123 commented 4 years ago

Just some experiments I have run. I tried to estimate the normal direction using the derivative of sigma w.r.t. xyz; it works so-so for indoor scenes but poorly for outdoor scenes. In the cases where it works, it is possible to generate different lighting conditions using e.g. Phong shading and the result looks quite decent, but it is only doable as a post-processing step.
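A sketch of that post-process, assuming a trained, differentiable `sigma_fn(xyz)` density network; the function names and Phong parameters here are illustrative, not the exact setup used above:

```python
import torch
import torch.nn.functional as F

def estimate_normals(sigma_fn, xyz):
    """Surface normal ~ negative normalized gradient of density w.r.t. position."""
    xyz = xyz.detach().clone().requires_grad_(True)
    sigma = sigma_fn(xyz)                            # (N,) density at each sample point
    grad = torch.autograd.grad(sigma.sum(), xyz)[0]  # d(sigma)/d(xyz), shape (N, 3)
    return F.normalize(-grad, dim=-1)

def phong_shade(albedo, normals, light_dir, view_dir,
                ambient=0.1, k_d=0.8, k_s=0.2, shininess=32.0):
    """Re-light points under a new directional light (post-processing only)."""
    l = F.normalize(light_dir, dim=-1)
    v = F.normalize(view_dir, dim=-1)
    n_dot_l = (normals * l).sum(-1, keepdim=True).clamp(min=0.0)
    # reflect the light direction about the normal: r = 2 (n.l) n - l
    r = 2.0 * n_dot_l * normals - l
    spec = (r * v).sum(-1, keepdim=True).clamp(min=0.0) ** shininess
    return (ambient + k_d * n_dot_l) * albedo + k_s * spec
```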

As @cfoster0 pointed out, we can indeed decompose c(r, d) = diff(r) + spec(r, d) and let the network learn the two components, but I don't think it opens up more possibilities than the current setup. In my opinion, the point of the decomposition is to keep the diffuse color fixed while changing the specular color with the light direction, but we have no control over what the network actually learns. For example, if we have spec(r, d) and spec(r, d') as training data, what is spec(r, (d+d')/2)? There is no guarantee that the result follows any physical lighting model; the network learns something governed by a law we don't know. So having diff and spec decomposed gives us nearly nothing on its own.

If anyone has other thoughts, please correct me if you think it works differently.

cfoster0 commented 3 years ago

Follow-up paper that pursues ideas in this vein: https://people.eecs.berkeley.edu/~pratul/nerv/