NVlabs / neuralrgbd

Neural RGB→D Sensing: Per-pixel depth and its uncertainty estimation from a monocular RGB video

Could you provide the preprocessing code for KITTI like the one you provide for ScanNet? #12

Closed · zenithfang closed this issue 4 years ago

cxlcl commented 4 years ago

Could you be more specific about which types of pre-processing are needed?

zenithfang commented 4 years ago

The preprocessing that arranges the depth maps in the order expected by mdataloader.kitti.

TruongKhang commented 4 years ago

@cxlcl, could you provide the code and the results of the camera pose extraction from the raw KITTI dataset?

cxlcl commented 4 years ago

If you mean reading the camera poses from the raw KITTI dataset, we use the pykitti package to read them: https://github.com/NVlabs/neuralrgbd/blob/d560bd96126cb3a8d300bc866911d93929f7932a/code/mdataloader/kitti.py#L160 https://github.com/NVlabs/neuralrgbd/blob/d560bd96126cb3a8d300bc866911d93929f7932a/code/mdataloader/kitti.py#L168
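
For illustration, a minimal sketch of reading those poses with pykitti (the paths, the choice of cam2, and the world-to-camera convention below are assumptions of mine; the kitti.py linked above is the authoritative version):

```python
import numpy as np
import pykitti  # pip install pykitti

# Placeholder paths for the raw KITTI layout (adjust to your setup).
basedir = '/path/to/kitti_raw'
date, drive = '2011_09_26', '0001'
data = pykitti.raw(basedir, date, drive)

# IMU -> left color camera (cam2): compose the calibration transforms.
T_cam2_imu = data.calib.T_cam2_velo.dot(data.calib.T_velo_imu)

# oxts[i].T_w_imu is the IMU -> world pose for frame i,
# so the camera-to-world pose is T_w_imu @ inv(T_cam2_imu).
poses_w_cam2 = [o.T_w_imu.dot(np.linalg.inv(T_cam2_imu)) for o in data.oxts]

# A world-to-camera extrinsic (the 'extM' convention used later in this thread)
# is then the inverse of each pose.
extMs = [np.linalg.inv(T) for T in poses_w_cam2]
```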

TruongKhang commented 4 years ago

@cxlcl, thank you. I saw it. I briefly read your code and have one more question. In the file warping/homography.py, the get_rel_extrinsicM function is defined as:

def get_rel_extrinsicM(ext_ref, ext_src):
    ''' Get the extrinsic matrix from ref_view to src_view '''
    return ext_src.dot(np.linalg.inv(ext_ref))

I don't understand why you compute the transformation matrix from ref_view to src_view. From my understanding of your paper, we need the transformation from src_view to ref_view; then we can compute the cost volume between the reference image and the warped image. Could you clarify this for me?
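
As a sanity check on the direction, a tiny numpy sketch, assuming each extM is a 4x4 world-to-camera matrix (my reading of the code, not something the repo states): ext_src.dot(inv(ext_ref)) maps ref-camera coordinates to src-camera coordinates.

```python
import numpy as np

# Assumed convention: extM is world-to-camera, X_cam = extM @ X_world.
# Then ext_src @ inv(ext_ref) maps X_ref_cam -> X_world -> X_src_cam,
# i.e. the relative transform from the ref view to the src view.
ext_ref = np.eye(4)                       # ref camera at the world origin
ext_src = np.eye(4)
ext_src[0, 3] = -1.0                      # src camera 1 m along +x in the world

rel = ext_src.dot(np.linalg.inv(ext_ref))
X_ref = np.array([0.0, 0.0, 5.0, 1.0])    # point 5 m in front of the ref camera
print(rel.dot(X_ref))                     # [-1.  0.  5.  1.], the same point seen from src
```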

cxlcl commented 4 years ago

Yes, your understanding of the cost volume is correct. But we still need the transformation from the ref. view to the src. view so that we can do the 3D re-sampling for the prediction step: p(d_t) -> p(d_{t+1})
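
If it helps, a rough nearest-neighbour sketch of that re-sampling; the shapes, names, and the direction of T_new_to_old are my own assumptions for illustration, not the repo's implementation:

```python
import numpy as np

def warp_dpv(dpv_t, K, T_new_to_old, depths):
    """Re-sample a depth-probability volume p(d_t) (D x H x W, defined in the camera
    of frame t) into the camera of frame t+1 (nearest-neighbour only).
    T_new_to_old: 4x4 transform taking frame-(t+1) camera coords into frame-t camera coords.
    depths: the D depth hypotheses shared by both volumes.
    Voxels that fall outside the old view keep a uniform prior 1/D.
    """
    depths = np.asarray(depths, dtype=np.float64)
    D, H, W = dpv_t.shape
    K_inv = np.linalg.inv(K)
    u, v = np.meshgrid(np.arange(W), np.arange(H))                  # pixel grid of frame t+1
    pix = np.stack([u, v, np.ones_like(u)], 0).reshape(3, -1).astype(np.float64)

    dpv_new = np.full_like(dpv_t, 1.0 / D)
    for i, d in enumerate(depths):
        pts_new = K_inv @ pix * d                                   # back-project at depth d
        pts_old = T_new_to_old @ np.vstack([pts_new, np.ones((1, pts_new.shape[1]))])
        z = pts_old[2]
        proj = K @ pts_old[:3]
        z_safe = np.where(np.abs(z) > 1e-9, z, 1e-9)
        u_old = np.rint(proj[0] / z_safe).astype(int)
        v_old = np.rint(proj[1] / z_safe).astype(int)
        d_old = np.abs(depths[:, None] - z[None, :]).argmin(axis=0) # nearest depth plane
        ok = (z > 0) & (u_old >= 0) & (u_old < W) & (v_old >= 0) & (v_old < H)
        plane = dpv_new[i].reshape(-1)
        plane[ok] = dpv_t[d_old[ok], v_old[ok], u_old[ok]]
        dpv_new[i] = plane.reshape(H, W)
    return dpv_new
```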

TruongKhang commented 4 years ago

@cxlcl, I agree. But at line 271 of the file batchloader.py:

src_cam_pose_ = [ warp_homo.get_rel_extrinsicM(ref_dat_['extM'], src_cam_extM_) for src_cam_extM_ in src_cam_extMs ]

This line gets the relative transformation matrix from ref_view to src_view. When I look into your code for building the cost volume after feature extraction from D-Net, you use the function est_swp_volume_v4 in homography.py, right? Basically, this function implements the formula:

warped_src_at_depth_d = K*R*P_ref_cuda * d + K*t

But the rotation R and translation t are still from ref_view to src_view, so this function cannot warp the src image to the ref image. That's my current understanding of your code. Am I missing something?

cxlcl commented 4 years ago

I think what you might have missed is that, in order to warp the src. view to the ref. view, we should: (1) for the grid pixel locations in the ref view, calculate their corresponding locations in the src view; (2) do the interpolation in the src view: https://github.com/NVlabs/neuralrgbd/blob/c8071a0bcbd4c4e7ef95c44e7de9c51353ab9764/code/warping/homography.py#L447

See the discussion in slides 12-14 here
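
A minimal sketch of those two steps using torch.nn.functional.grid_sample (assumed shapes and names, not the repo's est_swp_volume_v4): for each ref-view pixel, back-project at one depth hypothesis, map into the src view with the ref-to-src R, t, and bilinearly sample the src features there.

```python
import torch
import torch.nn.functional as F

def backward_warp_src_to_ref(src_feat, K, R, t, depth, eps=1e-6):
    """Warp a src-view feature map into the ref view at one depth hypothesis.
    src_feat: (B, C, H, W) features in the src view.
    K: (3, 3) intrinsics; R, t: rotation/translation from ref view to src view.
    depth: scalar depth of the fronto-parallel plane in the ref view.
    Step (1): for every ref-view pixel, find its location in the src view.
    Step (2): bilinearly sample the src features there (grid_sample).
    """
    B, C, H, W = src_feat.shape
    device = src_feat.device
    # homogeneous pixel grid of the ref view
    v, u = torch.meshgrid(torch.arange(H, device=device, dtype=torch.float32),
                          torch.arange(W, device=device, dtype=torch.float32),
                          indexing='ij')
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(3, -1)   # (3, H*W)

    # back-project ref pixels to 3D at 'depth', then map into the src camera
    K_inv = torch.inverse(K)
    pts_ref = K_inv @ pix * depth                                         # (3, H*W)
    pts_src = R @ pts_ref + t.reshape(3, 1)                               # (3, H*W)
    proj = K @ pts_src
    z = proj[2].clamp(min=eps)
    u_src, v_src = proj[0] / z, proj[1] / z

    # normalize to [-1, 1] for grid_sample, then sample the src features
    grid = torch.stack([2.0 * u_src / (W - 1) - 1.0,
                        2.0 * v_src / (H - 1) - 1.0], dim=-1)             # (H*W, 2)
    grid = grid.reshape(1, H, W, 2).expand(B, H, W, 2)
    return F.grid_sample(src_feat, grid, mode='bilinear',
                         padding_mode='zeros', align_corners=True)
```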

TruongKhang commented 4 years ago

Thank you so much @cxlcl, I finally understand it.