ShreyasSkandanS / stereo_sparse_depth_fusion

ICRA 2019 | Repository for "Real Time Dense Depth Estimation by Fusing Stereo with Sparse Depth Measurements" | OpenCV, C++
GNU General Public License v3.0

Creation of gt_disparity.png image #1

Closed dattadebrup closed 5 years ago

dattadebrup commented 5 years ago

Hi, can you please share the script you used to produce the gt_disparity.png image? I tried to produce gt_disparity.png with my own script, but the resulting fusion images are very poor. Thanks in advance.

ShreyasSkandanS commented 5 years ago

Hi @dattadebrup,

gt_disparity.png is the ground truth image from the KITTI Stereo 2015 dataset (http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo). You shouldn't need to generate this image. Can you provide more details about the code you're trying to run?
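For reference, the KITTI Stereo 2015 ground truth disparity maps are stored as 16-bit PNGs scaled by 256, with a value of 0 marking pixels without ground truth. A minimal Python sketch for reading one (not part of this repository) would be:

```python
# Minimal sketch: read a KITTI Stereo 2015 ground truth disparity PNG.
# KITTI stores disparities as uint16 scaled by 256; 0 means "no ground truth".
import cv2
import numpy as np

disp_raw = cv2.imread("gt_disparity.png", cv2.IMREAD_UNCHANGED)  # uint16, HxW
disparity = disp_raw.astype(np.float32) / 256.0                  # disparity in pixels
valid_mask = disp_raw > 0                                        # pixels with ground truth
```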

Best regards, SS

dattadebrup commented 5 years ago

Hi @ShreyasSkandanS, thanks for the response. I am sorry for not explaining the problem in detail. I am trying to apply your fusion algorithm to the KITTI raw data (http://www.cvlibs.net/datasets/kitti/raw_data.php), so I am trying to create the gt_disparity.png image with a script of my own, implementing the step described below:

Additionally, using the calibrated intrinsics and extrinsics, we convert the depth sensor’s range measurements into a depth image in the left camera’s reference frame with matching focal length.

I have also tried my best to shift the depth values of the depth image produced from the 3D LiDAR point cloud so that they are consistent with the stereo-only depth image, but the resulting fusion images are still poor. How should I properly create the gt_disparity.png image from the 3D LiDAR point cloud data?
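For reference, my projection script roughly does the following (a simplified sketch; `points_velo`, `T_velo_to_cam` and `K` are placeholders for the Velodyne points, LiDAR-to-camera extrinsics and left-camera intrinsics loaded from the KITTI calibration files):

```python
import numpy as np

def lidar_to_depth_image(points_velo, T_velo_to_cam, K, height, width):
    """Project Velodyne points (N x 3) into a depth image in the left camera
    frame, using the 4x4 extrinsics T_velo_to_cam and 3x3 intrinsics K."""
    # Transform the points into the camera frame
    pts_h = np.hstack([points_velo, np.ones((points_velo.shape[0], 1))])
    pts_cam = (T_velo_to_cam @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]  # keep only points in front of the camera

    # Perspective projection with the left camera intrinsics
    uvw = (K @ pts_cam.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2]

    depth = np.zeros((height, width), dtype=np.float32)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
        # When several points land in the same pixel, keep the nearest one
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```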

ShreyasSkandanS commented 5 years ago

Hi @dattadebrup, creating ground truth LiDAR depth from the raw KITTI data is outside the scope of this paper / repository, but I will try to address some of your concerns anyway.

  1. The KITTI ground truth was generated by accumulating a sequence of individual LiDAR scans and then filtering this accumulated set of depth points. The LiDAR points are then projected onto the respective camera frame. The filtering step is not described in great detail, but I believe the gist of it is that the accumulated points are compared with stereo depth from Semi-Global Matching and any points that disagree heavily are discarded. The authors also mention a fair amount of manual clean-up of the ground truth data.

  2. I will assume that the script you have written handles the projection of N different, sequential LiDAR frames onto a single camera frame, and that you're seeing a lot of noise in the resulting "ground truth" image. Yes, this is to be expected: it comes from mild mis-calibrations and partial/half occlusions etc., which is why the authors post-process the resulting image. A rough sketch of the kind of consistency filtering involved follows below.
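To make point 2 concrete, the consistency check I have in mind looks roughly like the sketch below. This is not the exact KITTI pipeline; the 10% threshold is arbitrary and `lidar_depth` / `sgm_depth` are assumed to be aligned depth images of the same size, with 0 marking missing values:

```python
import numpy as np

def filter_against_sgm(lidar_depth, sgm_depth, rel_thresh=0.10):
    """Discard accumulated LiDAR depths that disagree strongly with the
    stereo (SGM) depth. Both inputs are HxW depth images in metres."""
    filtered = lidar_depth.copy()
    both_valid = (lidar_depth > 0) & (sgm_depth > 0)
    rel_err = np.zeros_like(lidar_depth)
    rel_err[both_valid] = (
        np.abs(lidar_depth[both_valid] - sgm_depth[both_valid]) / sgm_depth[both_valid]
    )
    filtered[both_valid & (rel_err > rel_thresh)] = 0  # drop inconsistent points
    return filtered
```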

In case you haven't read these papers, I would recommend the following:

1. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite - paper

To obtain a high stereo and optical flow ground truth density, we register a set of consecutive frames (5 before and 5 after the frame of interest) using ICP. We project the accumulated point clouds onto the image and automatically remove points falling outside the image. We then manually remove all ambiguous image regions such as windows and fences. Given the camera calibration, the corresponding disparity maps are readily computed.

2. Object Scene Flow for Autonomous Vehicles - paper

In absence of appropriate public datasets we annotated 400 dynamic scenes from the KITTI raw dataset with optical flow and disparity ground truth in two consecutive frames. The process of ground truth generation is especially challenging in the presence of individually moving objects since they cannot be easily recovered from laser scanner data alone due to the rolling shutter of the Velodyne and the low frame rate (10 fps). Our annotation work-flow consists of two major steps: First, we recover the static background of the scene by removing all dynamic objects and compensating for the vehicle’s ego motion. Second, we re-insert the dynamic objects by fitting detailed CAD models to the point clouds in each frame.
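Regarding the last sentence of the first excerpt: once the accumulated points are expressed as depth in the rectified left camera frame, the conversion to disparity is just d = fx * B / Z. A tiny sketch (fx is the rectified focal length in pixels and baseline is the stereo baseline in metres; both come from the calibration files, and the names here are illustrative):

```python
import numpy as np

def depth_to_disparity(depth, fx, baseline):
    """Convert a depth image in metres to a disparity image in pixels,
    d = fx * B / Z. Zero depth marks missing values."""
    disparity = np.zeros_like(depth, dtype=np.float32)
    valid = depth > 0
    disparity[valid] = fx * baseline / depth[valid]
    return disparity
```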

  3. More importantly, for this paper and repository I assume that ground truth data already exists, and that is the case for both KITTI and Middlebury. For the PMD Monstar dataset in this paper, I provide only qualitative results and do not have ground truth available.

  4. If you're interested in pre-processed KITTI raw data, I would also take a look at the KITTI Depth Completion benchmark data - Depth Completion

I'm closing this issue since it isn't relevant to this repository or paper, but feel free to raise another issue if you have any questions related to this work.