Brummi / BehindTheScenes

Official implementation of the paper: Behind the Scenes: Density Fields for Single View Reconstruction (CVPR 2023)
https://fwmb.github.io/bts/
BSD 2-Clause "Simplified" License

question about visualizing ground truth depth #13

Closed Torment123 closed 1 year ago

Torment123 commented 1 year ago

Hi, I'm currently running tests on the KITTI-360 dataset and trying to visualize the ground truth depth map of a data sample (the output returned by the load_depth function of Kitti360Dataset). However, it doesn't show the proper depth profile of the corresponding scene image, as shown below:

real image: (image attachment)

visualized gt_depth map: (image attachment)

Is there any additional processing I need to do on the depth maps in order to visualize them properly? Thanks

Brummi commented 1 year ago

Hi! When I do the same, everything works without issues for me. Could you give some more details on the dataset configuration and the script you used to generate these visualizations? Best, Felix

Torment123 commented 1 year ago

Hi, thanks for your fast response. Below is the code I used for visualization:

import matplotlib.pyplot as plt

# assuming the repo's module layout for the dataset class
from datasets.kitti_360.kitti_360_dataset import Kitti360Dataset

train_dataset = Kitti360Dataset(
    data_path="/home/jshen27/data/KITTI-360",
    pose_path="/home/jshen27/data/KITTI-360/data_poses",
    split_path=None, target_image_size=(192, 640),
    frame_count=2, return_depth=True, return_stereo=True,
    return_fisheye=False, return_3d_bboxes=False, return_segmentation=False,
    keyframe_offset=0, dilation=1, fisheye_rotation=0, fisheye_offset=1,
    color_aug=False, is_preprocessed=False,
)

out = train_dataset[1]  # select a sample

plt.imshow(out["imgs"][0].permute(1, 2, 0))  # visualize the first image in the sample
plt.show()

plt.imshow(out["depths"][0][0])  # visualize the corresponding depth map
plt.show()

I'd appreciate it if you could help me point out where I went wrong, thanks.

Brummi commented 1 year ago

Could you try using one of the provided split files? Maybe the split causes the misalignment between lidar and image data.

Torment123 commented 1 year ago

Thanks for your reply. I have figured out that because the GT depth values of KITTI-360 are very sparse, I cannot see complete object profiles by merely visualizing the GT.
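For reference, sparse lidar depth is much easier to inspect if the pixels without a measurement are masked out before plotting. A minimal matplotlib sketch, assuming missing depth is stored as 0 in the returned map:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_sparse_depth(depth, cmap="plasma"):
    """Plot a sparse depth map, hiding pixels without a lidar return (assumed to be 0)."""
    depth = np.asarray(depth, dtype=np.float32)
    masked = np.ma.masked_where(depth <= 0, depth)  # masked pixels are left blank by imshow
    plt.imshow(masked, cmap=cmap)
    plt.colorbar(label="depth [m]")
    plt.show()

# e.g. show_sparse_depth(out["depths"][0][0])
```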

Currently I'm trying to train the BTS model under a stereo setting (two rectified left/right images) and compare its performance with SOTA disparity prediction methods like ACV.

My new question is: since there is no GT disparity available for KITTI-360, what I do is convert the GT depth values via the equation disparity = focal_length x baseline / depth to get the GT disparity, and use it to compute the disparity metric (EPE) for a pretrained ACV net and the trained BTS. Is this the correct way to obtain accurate GT disparity for KITTI-360? Although the visualization of the predicted disparity map shows the correct object profiles, the EPE computed this way is very large (around 20), so I wonder if I missed an intermediate processing step in terms of scaling. Thanks.
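In case it helps narrow down the scaling issue, here is a rough sketch of how I would compute EPE against GT disparity derived from the sparse depth. The focal length and baseline below are placeholders that would have to come from the KITTI-360 calibration (with fx rescaled to the resolution at which the disparities are compared), and gt_depth / pred_disp stand for the sparse GT depth map and the network's disparity prediction:

```python
import numpy as np

def depth_to_disparity(depth, fx, baseline):
    """Convert z-depth in meters to disparity in pixels: d = fx * B / z."""
    disp = np.zeros_like(depth, dtype=np.float32)
    valid = depth > 0                       # sparse lidar GT: 0 means no measurement
    disp[valid] = fx * baseline / depth[valid]
    return disp, valid

def epe(pred_disp, gt_disp, valid):
    """End-point error, evaluated only on pixels that have a GT measurement."""
    return float(np.abs(pred_disp[valid] - gt_disp[valid]).mean())

# Placeholder values: take fx from the (rescaled) intrinsics and the stereo baseline
# from the calibration files; a mismatch here scales the EPE by a constant factor.
fx, baseline = 500.0, 0.6
gt_disp, valid = depth_to_disparity(gt_depth, fx, baseline)
print("EPE:", epe(pred_disp, gt_disp, valid))
```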

Brummi commented 1 year ago

Hi! Sorry for the late reply! Regarding the disparity: I would advise projecting the points into 3D and then backprojecting them into the stereo frame based on the depth. Note that, by default, the network returns the L2 norm of the ray (i.e. the distance) as depth, rather than the z value that depth prediction networks use.

utils/projection_operations.py provides the distance_to_z function, which converts the output of the NeRF renderer into proper depth maps that can be compared.
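For completeness, the underlying conversion is straightforward: the renderer returns the Euclidean distance along each pixel ray, so dividing by the norm of the unnormalized ray direction recovers the z-depth. A self-contained sketch of that math (not the signature of the repo's distance_to_z; fx, fy, cx, cy are assumed pixel-space intrinsics):

```python
import numpy as np

def distance_map_to_z(dist, fx, fy, cx, cy):
    """Convert per-pixel ray distances to z-depth.

    For pixel (u, v) the ray direction is d = ((u - cx)/fx, (v - cy)/fy, 1)
    and distance = z * ||d||, so z = distance / ||d||.
    """
    h, w = dist.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # u: column index, v: row index
    ray = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones_like(dist)], axis=-1)
    return dist / np.linalg.norm(ray, axis=-1)
```

The z-depth obtained this way (or from the repo's distance_to_z) is what should go into the disparity formula above.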