autonomousvision / differentiable_volumetric_rendering

This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"
http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
MIT License

Why is the focal length different in your rendering and img_choy2016? #64

Closed BostonLobster closed 3 years ago

BostonLobster commented 3 years ago

I downloaded the "ShapeNet for 2.5D supervised models" dataset and found there are two cameras.npz files: one in the obj_ID folder, the other in the img_choy2016 folder.

In the paper, you wrote "While we use the renderings from Choy et al. [13] as input, we additionally render 24 images of resolution 256² with depth maps and object masks per object which we use for supervision." So, I guess one cameras.npz is for your rendering and the other for Choy's.

But the intrinsics in the two cameras.npz files are different. In yours, the camera matrix is

array([[2.1875, 0.    , 0.    , 0.    ],
       [0.    , 2.1875, 0.    , 0.    ],
       [0.    , 0.    , 1.    , 0.    ],
       [0.    , 0.    , 0.    , 1.    ]])

but in Choy's, the camera matrix is

array([[149.84375,   0.     ,  68.5    ],
       [  0.     , 149.84375,  68.5    ],
       [  0.     ,   0.     ,   1.     ]])

I think the focal length should be the same, because you only changed the camera pose for the additional renderings, right?

m-niemeyer commented 3 years ago

Hi @BostonLobster , thanks for your question!

Yes, that is correct, the focal length and the principal point are different. If you check our code (e.g. the arange_pixels function), you will see that we assume the image plane to be in [-1, 1] with the center at 0. The format used by Choy et al. is [0, H-1] x [0, W-1] with the center point at (H/2, W/2). (As a side note: we use the Choy et al. renderings only as input for the encoder, so we never need to use their camera intrinsics / extrinsics in our repo.)
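To make the two conventions concrete, here is a minimal sketch of the pixel grids each one implies. Note these are illustrative stand-ins, not the repo's actual `arange_pixels` implementation:

```python
import numpy as np

def pixels_normalized(H, W):
    # DVR-style convention: pixel coordinates span [-1, 1],
    # with the image center at 0.
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, H),
                         np.linspace(-1.0, 1.0, W), indexing="ij")
    return np.stack([xs, ys], axis=-1)  # shape (H, W, 2)

def pixels_choy(H, W):
    # Choy et al. convention: integer grid [0, H-1] x [0, W-1],
    # with the center at (W/2, H/2).
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return np.stack([xs, ys], axis=-1)  # shape (H, W, 2)
```

For a 5x5 image, `pixels_normalized` places the top-left pixel at (-1, -1) and the center pixel at (0, 0), while `pixels_choy` places them at (0, 0) and (2, 2).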

BostonLobster commented 3 years ago

@m-niemeyer Thanks for your reply! I understand the difference now. An additional question: how do I convert the Choy et al. format to yours, where the image plane resides in [-1, 1]? I'm wondering whether, if I use the Choy et al. renderings for both input and supervision, I have to modify the camera intrinsics.

m-niemeyer commented 3 years ago

I would suggest doing either of the following:

  1. Find the field of view of Choy et al.; for this, you need the focal length and the sensor size (= image size). From these two values, you can calculate the FoV. With this, you can then calculate the new focal length for a sensor of size 2 (as our image plane spans [-1, 1]), and from that you can build your new camera matrix.
  2. You can multiply the Choy et al. camera matrix with another matrix S from the left (K_new = S @ K_choy). This matrix needs to scale the pixels from [0, H-1] x [0, W-1] to [-1, 1]. If I am not mistaken, this should be:

    S = [[s, 0, -1],
         [0, s, -1],
         [0, 0,  1]]

    where s = 2 / (H - 1); if H and W differ, you need two different scale values, but this is not the case for Choy et al. (the images are square).
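Option 2 can be sketched as follows. The intrinsics are the values quoted earlier in this thread; H = W = 137 is an assumption about the Choy et al. image size (consistent with the principal point of 68.5 = 137/2 in the matrix above):

```python
import numpy as np

# Assumption: Choy et al. images are 137 x 137.
H = W = 137
K_choy = np.array([[149.84375, 0.0, 68.5],
                   [0.0, 149.84375, 68.5],
                   [0.0, 0.0, 1.0]])

# Rescaling matrix S mapping pixel coordinates [0, H-1] to [-1, 1].
s = 2.0 / (H - 1)
S = np.array([[s, 0.0, -1.0],
              [0.0, s, -1.0],
              [0.0, 0.0, 1.0]])
K_new = S @ K_choy

# Sanity check: the pixel corners map to the corners of [-1, 1].
assert np.allclose(S @ np.array([0.0, 0.0, 1.0]), [-1.0, -1.0, 1.0])
assert np.allclose(S @ np.array([H - 1.0, W - 1.0, 1.0]), [1.0, 1.0, 1.0])
```

Option 1 would instead give f_new = 2 * f / W for a sensor of size 2, which differs from s * f = 2 * f / (W - 1) only by the usual pixel-edge vs. pixel-center off-by-one; at this resolution the discrepancy is negligible.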

BostonLobster commented 3 years ago

I'll try your suggestions! Many thanks!!