Megvii-BaseDetection / BEVDepth

Official code for BEVDepth.
MIT License

How to deal with camera pixels without lidar point? #175

Open klong121 opened 1 year ago

klong121 commented 1 year ago

Thanks for your great work! Your work has been very inspiring to me!

I have a small point of confusion. For example, in the figure below, the camera pixel indicated by the arrow cannot be matched to any lidar point. In that case, how is depth supervision provided? A second question: if a camera pixel matches no lidar point, how is the "ground truth" in Table 1 of your paper "BEVDepth" obtained?

[image]

Thanks!

sidiangongyuan commented 1 year ago

For the second question: every camera pixel ends up with a ground-truth value. In the dataset-building step there is an operation `depth_map = torch.zeros(resize_dims)`, and during training `gt_depths_tmp = torch.where(lidar_depth == 0.0, lidar_depth.max(), lidar_depth)` pads the unmatched pixels.
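The padding step above can be sketched in NumPy (a re-creation of the quoted torch snippet, not the repository's actual code): pixels with no lidar return keep a depth of 0 in the map, and the `where` call swaps those zeros for the maximum observed depth so every pixel has a finite supervision target.

```python
import numpy as np

# Sparse ground-truth depth map: 0.0 marks pixels with no lidar return.
lidar_depth = np.array([
    [0.0, 12.5, 0.0],
    [0.0,  0.0, 7.2],
    [3.1,  0.0, 0.0],
])

# NumPy equivalent of:
#   torch.where(lidar_depth == 0.0, lidar_depth.max(), lidar_depth)
# Unmatched pixels are padded with the maximum depth in the map.
gt_depths_tmp = np.where(lidar_depth == 0.0, lidar_depth.max(), lidar_depth)
```

After this, `gt_depths_tmp` contains no zeros: every previously unmatched pixel holds the map's maximum depth (here 12.5), while pixels with real lidar hits keep their measured values.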

Wolfybox commented 5 months ago

> For the second question: every camera pixel ends up with a ground-truth value. In the dataset-building step there is an operation `depth_map = torch.zeros(resize_dims)`, and during training `gt_depths_tmp = torch.where(lidar_depth == 0.0, lidar_depth.max(), lidar_depth)` pads the unmatched pixels.

Lidar points are quite sparse, so the resulting depth map is sparse as well; especially after min-pooling, most grid cells contain zero or the padding value. Running depth completion on the generated gt depth map might improve its quality.
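To illustrate the sparsity point above, here is a minimal sketch (a hypothetical helper, not BEVDepth's code) of min-pooling a sparse depth map while treating 0.0 as "no lidar point". Even with the zeros masked out, any output cell whose window contains no valid return stays empty, which is what depth completion would try to fill in.

```python
import numpy as np

def masked_min_pool(depth, k):
    """Min-pool a sparse depth map with window size k x k.
    0.0 means 'no lidar point'; cells whose window holds no
    valid depth stay 0.0 in the output."""
    h, w = depth.shape
    out = np.zeros((h // k, w // k))
    for i in range(h // k):
        for j in range(w // k):
            window = depth[i * k:(i + 1) * k, j * k:(j + 1) * k]
            valid = window[window > 0]  # ignore missing returns
            if valid.size:
                out[i, j] = valid.min()
    return out

# Toy example: only two lidar hits in a 4x4 patch.
sparse = np.zeros((4, 4))
sparse[0, 1] = 10.0
sparse[3, 3] = 5.5

pooled = masked_min_pool(sparse, 2)
# Half of the 2x2 output cells still contain no depth at all.
```

Even after pooling, two of the four output cells remain zero here, which mirrors the observation that most grid cells end up holding no real depth.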