facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

Is there a differentiable inverse render function? #1074

Closed samedii closed 2 years ago

samedii commented 2 years ago

I think this could be a very useful module. It should already be possible, but I'm not familiar enough with the "internals" of the renderer.

My thought is that I need to:

  1. Render the image with the current mesh + texture
  2. Get the tensor that maps each image pixel to the face it was rendered from (pix_to_face?)
  3. Gather color information and update texture

Do you have any tips for me on what functions/variables I should use to do this?

I've previously done inverse rendering via backprop + optimization, but now I'd like to differentiate through the process itself.
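Steps 2 and 3 above can be sketched without lighting, in plain NumPy as a stand-in for the PyTorch3D tensors (the toy shapes and the `backproject_colors` helper are assumptions for illustration, not PyTorch3D API; in the real renderer `pix_to_face` comes from the rasterizer's fragments and has shape `(N, H, W, K)`):

```python
import numpy as np

def backproject_colors(pix_to_face, image, num_faces):
    """Naive inverse of steps 2-3: average the colors of all pixels
    that rasterize to each face (no lighting, faces_per_pixel = 1)."""
    face_colors = np.zeros((num_faces, 3))
    counts = np.zeros(num_faces)
    flat_faces = pix_to_face.reshape(-1)          # -1 marks background
    flat_colors = image.reshape(-1, 3)
    for f, c in zip(flat_faces, flat_colors):
        if f >= 0:                                # skip background pixels
            face_colors[f] += c
            counts[f] += 1
    mask = counts > 0
    face_colors[mask] /= counts[mask, None]
    return face_colors

# Toy 2x2 image where faces 0 and 1 are visible; one background pixel.
pix_to_face = np.array([[0, 0], [1, -1]])
image = np.array([[[1., 0., 0.], [0., 1., 0.]],
                  [[0., 0., 1.], [9., 9., 9.]]])  # background color ignored
colors = backproject_colors(pix_to_face, image, num_faces=2)
```

Each face ends up with the mean color of the pixels it covers; this ignores shading and conflicts between overlapping views, which the rest of the thread digs into.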

zhifanzhu commented 2 years ago

Do you mean gathering information on the 2D image plane and back-projecting it onto 3D vertices/faces? I'm also interested in that, but why can't we use pix_to_face directly and update the face color with that pixel's color?

I think the shading after rasterization makes things harder, because a given color can result from multiple lighting configurations.

samedii commented 2 years ago

Good point. I would be happy to start out with a naive implementation that doesn't take lighting etc into consideration though.

nikhilaravi commented 2 years ago

@samedii as @ktw361 mentioned you can use the pix_to_face to get info on which faces contribute to each image pixel (there will be multiple faces if faces_per_pixel > 1).

> Gather color information and update texture

What is the format of the texture? vertex colors or a texture map?

samedii commented 2 years ago

Thanks for the advice! I'm currently using a TexturesVertex, but I've heard that TexturesAtlas might look better. It just used a lot of GPU memory when I tried to get it to work.

samedii commented 2 years ago

I'm trying to implement this with lighting included (hard Gouraud shading). It was pretty straightforward, but now I'm a little stuck on a good way to invert this step:

pixel_vals = (barycentric_coords[..., None] * pixel_face_vals).sum(dim=-2)

from interpolate_face_attributes_python. I guess if faces_per_pixel = 1 it is fine, but otherwise it becomes a system of equations (given pix_to_face). I may have misunderstood the dimensions of the tensors, though.
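The forward interpolation step quoted above can be reproduced on toy tensors to make the shapes concrete (the `(N, H, W, K, 3, D)` convention follows the thread; the random values are made up for illustration):

```python
import numpy as np

N, H, W, K, D = 1, 2, 2, 2, 3                     # toy sizes
rng = np.random.default_rng(0)
barycentric_coords = rng.random((N, H, W, K, 3))
barycentric_coords /= barycentric_coords.sum(-1, keepdims=True)
pixel_face_vals = rng.random((N, H, W, K, 3, D))  # per-vertex attributes

# The quoted step: weight the 3 vertex values of each face by the
# barycentric coordinates and sum that axis out.
pixel_vals = (barycentric_coords[..., None] * pixel_face_vals).sum(axis=-2)

# Inverting this per pixel is underdetermined even for K = 1:
# each of the D channels gives 1 equation with 3 unknown vertex values.
```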

Do you have any suggestions for faces_per_pixel > 1 @nikhilaravi ?

Edit: I misunderstood the purpose; aggregation of the K neighbours happens in hard_rgb_blend or sigmoid_alpha_blend.

Edit 2: I'm now doing this (keeping it proportional to the results from the forward render pass). Does that seem reasonable?

eps = 1e-6
new_pixel_face_vals = pixel_face_vals * ((new_pixel_vals + eps) / (pixel_vals + eps))[..., None, :]
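One sanity check on that rescaling, on toy NumPy tensors with the shapes assumed as in the thread: because the ratio is constant over the per-face vertex axis and the interpolation is linear, re-interpolating the rescaled values reproduces new_pixel_vals up to the epsilon terms. (This only verifies forward consistency for a single render; it says nothing about pixels that share vertices, which is where the thread later finds it breaks down.)

```python
import numpy as np

rng = np.random.default_rng(1)
bary = rng.random((2, 2, 1, 3))                   # (H, W, K, 3) toy shapes
bary /= bary.sum(-1, keepdims=True)
pixel_face_vals = rng.random((2, 2, 1, 3, 3))     # (H, W, K, 3, D)
pixel_vals = (bary[..., None] * pixel_face_vals).sum(-2)
new_pixel_vals = rng.random(pixel_vals.shape)     # target pixel colors

# The rescaling from the comment above.
eps = 1e-6
ratio = (new_pixel_vals + eps) / (pixel_vals + eps)
new_pixel_face_vals = pixel_face_vals * ratio[..., None, :]

# Linearity: sum_k b_k * (f_k * r) = r * sum_k b_k * f_k = r * pixel_vals,
# so interpolating the rescaled values recovers the target (up to eps).
reconstructed = (bary[..., None] * new_pixel_face_vals).sum(-2)
```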
samedii commented 2 years ago

In hindsight, with a little more understanding, using the proportions of the previous texture obviously doesn't work. I haven't figured out how to do this nicely, so I use a starting guess + optimization to calculate new_pixel_face_vals. It's still very fast at least.

I optimize over this block to find some solution:

pixel_face_vals = face_colors.gather(0, idx).view(N, H, W, K, 3, D)
pixel_vals = (fragments.bary_coords[..., None] * pixel_face_vals).sum(dim=-2)

Making this differentiable rests on the assumption that the expression has the following form and that constant does not depend on the pixel values.

constant = optimized_pixel_face_vals / new_pixel_vals.detach()
new_pixel_face_vals = constant * new_pixel_vals

If that's actually true then it should be testable: I should only need to calculate the "constant" once per camera.
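A minimal version of "optimize over this block", as a sketch: because the interpolation is linear in the per-vertex values, plain gradient descent on a least-squares loss recovers values whose interpolation matches the target pixels. This uses toy NumPy tensors with an analytic gradient; the actual code in the thread optimizes through PyTorch3D's fragments, and all shapes here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
bary = rng.random((4, 4, 1, 3))                   # (H, W, K, 3)
bary /= bary.sum(-1, keepdims=True)
target = rng.random((4, 4, 1, 3))                 # desired pixel_vals, D = 3

# Start from a guess for the per-pixel vertex values and descend on
# || interp(vals) - target ||^2. The problem is linear, so the
# gradient w.r.t. vals is simply 2 * b_k * residual.
vals = rng.random((4, 4, 1, 3, 3))                # (H, W, K, 3, D)
lr = 0.5
for _ in range(200):
    pred = (bary[..., None] * vals).sum(-2)       # forward interpolation
    residual = pred - target
    grad = 2.0 * bary[..., None] * residual[..., None, :]
    vals -= lr * grad
```

After the loop the interpolated prediction matches the target; the system is underdetermined, so this finds one of many solutions (the one reachable from the initial guess).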

samedii commented 2 years ago

Got this working finally. It was a lot more work than I anticipated. I was initially thinking about creating a PR for it, but I've already spent too much time on this.

sainatarajan commented 2 years ago

@samedii Could you share a working notebook on how you solved it? I'm very interested. Thanks!