Hi! When I do the same, everything works without issues for me. Could you give some more details on the dataset configuration and the script you used to generate these visualizations? Best, Felix
Hi, thanks for your quick response. Below is the code I used for the visualization:
import matplotlib.pyplot as plt

# Kitti360Dataset is the repository's KITTI-360 dataset class
# (import it from the corresponding dataset module before running this).
train_dataset = Kitti360Dataset(
    data_path="/home/jshen27/data/KITTI-360",
    pose_path="/home/jshen27/data/KITTI-360/data_poses",
    split_path=None,
    target_image_size=(192, 640),
    frame_count=2,
    return_depth=True, return_stereo=True, return_fisheye=False,
    return_3d_bboxes=False, return_segmentation=False,
    keyframe_offset=0, dilation=1,
    fisheye_rotation=0, fisheye_offset=1,
    color_aug=False, is_preprocessed=False,
)

out = train_dataset[1]  # pick a single sample (index 1)

plt.imshow(out["imgs"][0].permute(1, 2, 0))  # first image in the sample
plt.show()
plt.imshow(out["depths"][0][0])  # corresponding ground-truth depth map
plt.show()
I'd appreciate it if you could help me figure out where I went wrong, thanks.
Could you try using one of the provided split files? Maybe the split causes the misalignment between lidar and image data.
Thanks for your reply. I have figured it out: because the ground-truth depth values of KITTI-360 are very sparse, I cannot see the complete object contours by merely visualizing the GT.
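A plain imshow of such a sparse map is hard to read; overlaying only the valid lidar pixels on top of the image works better, e.g. (a rough sketch, reusing out and plt from the snippet above, and assuming zeros mark missing depth and that the images are normalized to [-1, 1]):

import numpy as np

depth = out["depths"][0][0].numpy()   # (H, W) sparse ground-truth depth, zeros mean no lidar return
v, u = np.nonzero(depth)              # pixel coordinates of the valid measurements

plt.figure(figsize=(10, 3))
plt.imshow(out["imgs"][0].permute(1, 2, 0) * 0.5 + 0.5)    # undo [-1, 1] normalization; drop this if imgs are already in [0, 1]
plt.scatter(u, v, c=depth[depth > 0], s=1, cmap="magma")   # color the valid pixels by depth
plt.colorbar(label="depth [m]")
plt.show()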
Currently I'm trying to train the BTS model in the stereo setting (two rectified left/right images) and compare its performance against SOTA disparity prediction methods like ACV. My new question is: since there is no GT disparity available for KITTI-360, what I do is convert the GT depth values via the equation disparity = focal_length x baseline / depth to get the GT disparity, and then use it to compute the disparity metric (EPE) for a pretrained ACV net and the trained BTS. I wonder if this is the correct way to obtain accurate GT disparity for KITTI-360? Although the visualization of the predicted disparity map shows the correct object profile, the EPE computed this way is very large (around 20), so I wonder if I missed an intermediate scaling step, thanks.
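Concretely, the conversion and metric look roughly like this (just a sketch: fx and the baseline below are placeholder values that have to come from the KITTI-360 calibration files, with fx rescaled to the 192x640 image size):

import numpy as np

# Placeholder calibration values; take fx (in pixels, at the resized resolution)
# and the stereo baseline (in meters) from the KITTI-360 calibration files.
fx = 550.0        # example only, NOT the real calibration value
baseline = 0.6    # example only, NOT the real calibration value

def depth_to_disparity(depth, fx, baseline):
    # disparity = fx * baseline / depth; invalid (zero) depths stay zero
    disp = np.zeros_like(depth)
    valid = depth > 0
    disp[valid] = fx * baseline / depth[valid]
    return disp

def end_point_error(pred_disp, gt_disp):
    # mean absolute disparity error over pixels with valid ground truth
    valid = gt_disp > 0
    return float(np.abs(pred_disp[valid] - gt_disp[valid]).mean())

(One thing I still need to double-check is that fx is rescaled to the resized resolution; if it is taken from the original-resolution calibration, the GT disparity would be off by roughly the resize factor.)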
Hi! Sorry for the late reply! Regarding the disparity: I would advise projecting the points into 3D and then backprojecting them into the stereo frame based on the depth. Note that by default, the network returns the L2 norm of the ray (i.e. the distance) as depth, rather than the z value that is used in depth prediction networks.
utils/projection_operations.py provides the distance_to_z function, which converts the output of the NeRF renderer to proper depth maps that can be compared.
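For intuition, that conversion just scales each per-pixel ray distance by the z-component of the normalized viewing ray. A self-contained sketch of the math (not the repository implementation; K is assumed to be the pinhole intrinsics in pixels):

import torch

def distance_to_z_depth(distance, K):
    # distance: (H, W) per-pixel ray lengths, K: (3, 3) pinhole intrinsics in pixels
    H, W = distance.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Unproject pixel centers onto the z = 1 plane in camera coordinates
    x = (u + 0.5 - K[0, 2]) / K[0, 0]
    y = (v + 0.5 - K[1, 2]) / K[1, 1]
    # z = distance / ||(x, y, 1)||, i.e. distance times the cosine of the ray angle
    return distance / torch.sqrt(x ** 2 + y ** 2 + 1.0)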
Hi, I'm currently running tests on the KITTI-360 dataset and trying to visualize the ground-truth depth map of a data sample (the output returned by the load_depth function of Kitti360Dataset). However, it doesn't show the proper depth profile of the corresponding scene image, as shown below:
real image:
visualized gt_depth map:
Is there any additional processing I need to apply to the depth maps in order to visualize them properly? Thanks