
Alignment of image and depth map #7

kirkscheper opened this issue 4 years ago (status: Open)

kirkscheper commented 4 years ago

This is more of a question than an issue.

I was taking a look at the ground truth optical flow but noticed that it doesn't quite line up with the images (or event frames) from the DAVIS.

I tried with the h5py ground truth datasets, the precomputed ground truth in the npz files, and ground truth computed using the script in this repo, all with no luck. I also tried rectifying the images using the calibration data from the yaml files.

The image below is the overlay of the ground truth from the h5py file with an undistorted image:

[image: raw_overlay]

This shows the overlay of an undistorted image with the ground truth computed by this repo:

[image: undistorted_rectified_overlay]

I noticed that the dist_rect images are simply the dist_raw images passed through the same calibration pipeline as the left image, which would suggest that dist_raw should be from the same viewpoint as the left image, but they do not line up.

It seems I am just missing something. Could you explain (here and in the documentation) how I can align these frames correctly? Any help is greatly appreciated.

alexzzhu commented 4 years ago

Thanks for the question! I'm not sure if I completely understand the exact terms here, so I'd like to clarify the terms I'm familiar with. There is an undistorted image, which is generated by inverting the lens distortion model on the distorted image so that the pinhole projection equation holds. There is then the rectified image, which applies an additional 3D rotation and potential scaling to the image, such that the horizontal stereo assumption holds.
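In OpenCV terms, the difference is just whether the rectifying rotation is passed when building the remap. A minimal sketch (the calibration values below are placeholders; take the real K, D, rectification rotation and projection matrix for cam0 from the dataset's yaml files):

```python
import cv2
import numpy as np

# Placeholder calibration; take K, D, R_rect and P_rect for cam0 (left
# DAVIS) from the dataset's camchain yaml.
K = np.array([[226.4, 0.0, 173.6],
              [0.0, 226.2, 133.7],
              [0.0, 0.0, 1.0]])
D = np.array([-0.048, 0.011, -0.055, 0.021])  # equidistant (fisheye) coeffs
R_rect = np.eye(3)   # stereo rectification rotation (placeholder)
P_rect = K.copy()    # rectified projection matrix (placeholder)
size = (346, 260)    # DAVIS resolution as (width, height)

img = np.full((260, 346), 128, np.uint8)  # stand-in for a DAVIS frame

# Undistortion only: the lens model is removed, no rotation (R = identity).
m1u, m2u = cv2.fisheye.initUndistortRectifyMap(K, D, np.eye(3), K, size,
                                               cv2.CV_32FC1)
undistorted = cv2.remap(img, m1u, m2u, cv2.INTER_LINEAR)

# Rectification: same lens model plus the 3D rotation/scaling so the
# horizontal stereo assumption holds.
m1r, m2r = cv2.fisheye.initUndistortRectifyMap(K, D, R_rect, P_rect, size,
                                               cv2.CV_32FC1)
rectified = cv2.remap(img, m1r, m2r, cv2.INTER_LINEAR)
```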

For the ground truth, I believe that the flow was computed for the rectified images, although it has been a while and I might be mistaken. In the images you are showing, are these the rectified images, or the undistorted ones?

kirkscheper commented 4 years ago

Apologies, I just noticed a typo in my original comment.

The first image is an overlay of the raw image with the precomputed ground truth provided in the hdf5 file. That ground truth appears to have been made with an older version of the code in this repo, as it seems to be distorted (straight lines are not straight); the previous version of the code doesn't appear to perform any rectification (please correct me if I am wrong). Here the raw depth map from the rosbag and the optical flow line up exactly, so there appears to be no viewpoint change during the ground truth generation.
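For reference, this is roughly how I produce that first overlay (a minimal sketch; the hdf5 key names are what I see in my copies of the files and may differ, and I omit the nearest-timestamp matching between the image and flow streams for brevity):

```python
import cv2
import h5py
import numpy as np

# Key names as found in my copies of the files; verify with f.visit(print).
with h5py.File('indoor_flying1_data.hdf5', 'r') as f:
    img = f['davis/left/image_raw'][100]       # grayscale DAVIS frame, uint8
with h5py.File('indoor_flying1_gt.hdf5', 'r') as f:
    flow = f['davis/left/flow_dist'][100]      # 2 x H x W ground truth flow

# Color-code the flow magnitude and blend it over the grayscale frame.
mag = np.nan_to_num(np.linalg.norm(flow, axis=0))
mag8 = (255.0 * mag / (mag.max() + 1e-6)).astype(np.uint8)
heat = cv2.applyColorMap(mag8, cv2.COLORMAP_JET)
gray3 = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
overlay = cv2.addWeighted(gray3, 0.5, heat, 0.5, 0.0)
cv2.imwrite('raw_overlay.png', overlay)
```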

The second image is an overlay of an image that I undistorted and rectified using the calibration parameters in the dataset (and the cv2.fisheye functions) with the ground truth computed by the current master of this repo, which appears to use the pre-rectified depth maps from the ROS bag and the projection matrix of the left camera.

The depth maps in the rosbag look like they were generated from a slightly different viewpoint than the left camera; it looks like I am just missing a small correction.

For reference, I forked the repo and added a branch which shows what I am doing: https://github.com/kirkscheper/mvsec/blob/4b7199d88c9110e0d0353a889138e6b5e854d0ba/tools/gt_flow/compute_flow.py#L356

alexzzhu commented 4 years ago

Hmm, could you share the separate ground truth depth and grayscale images? Also, does this occur throughout the entire bag? It might also just be that there was some smearing in the global map for this time instance.

kirkscheper commented 4 years ago

Sorry for the delayed response.

As far as I can see, this offset is persistent throughout the datasets. It seems to depend on the depth and on the position in the scene, which is why I think it is some kind of viewpoint issue (the projected viewpoint of the depth map / ground truth flow is not the same as the left camera) or a calibration issue (or I am just making an error somewhere along the way).

The documentation states that the lidar frames should be from the viewpoint of the left image, but does the T_cam0_lidar from the camchain-imucam need to be applied somehow?
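For concreteness, applying it would look something like this (a sketch; the transform and intrinsics are placeholders for the values in the yaml, and the point cloud is a stand-in):

```python
import numpy as np

# 4x4 lidar-to-cam0 extrinsic; placeholder for T_cam0_lidar from the
# camchain-imucam yaml.
T_cam0_lidar = np.eye(4)

# Left camera intrinsics (placeholder values).
K = np.array([[226.4, 0.0, 173.6],
              [0.0, 226.2, 133.7],
              [0.0, 0.0, 1.0]])

def lidar_to_cam0(points_lidar):
    """Transform an Nx3 array of lidar points into the cam0 frame."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    return (T_cam0_lidar @ pts_h.T).T[:, :3]

def project(points_cam):
    """Pinhole projection of Nx3 camera-frame points to pixel coordinates."""
    uvw = (K @ points_cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

points_lidar = np.random.rand(1000, 3) * 10.0   # stand-in for a lidar scan
uv = project(lidar_to_cam0(points_lidar))
```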

See the attached for some images of the ground truth optical flow, raw images and rectified images for sequence 1 of the indoor scene.

imgs_sample.zip

As you can imagine, since the DVS only perceives contrast changes, the optical flow can only be accurately estimated at locations with significant contrast, i.e. edges, so having a correctly aligned ground truth is very important.

Thanks in advance for any help you can give me.

alexzzhu commented 4 years ago

It's expected if you see that the depth map extends beyond the events (e.g. the objects appear fatter in the depth map). This is because of the way we generate the local map, which is liable to have errors in the pose. However, this extension is typically on the order of only a few pixels, and there are usually no events right beyond the boundaries of each object. Is this what you're seeing? It might be easier to visualize if you could generate a video, to see whether the effect is consistent over time.

kirkscheper commented 4 years ago

You can find a video of the flight here: https://youtu.be/73iEJfZGGmw

kirkscheper commented 4 years ago

And here's one with the overlay (to make it easier to see): https://youtu.be/dRWscigkEGg

alexzzhu commented 4 years ago

This seems like it's to be expected, unfortunately. As we accumulate depth points over multiple frames, errors are introduced by the odometry. The typical effect is that objects in the depth/flow appear inflated compared to the original versions in the image/event space. Evaluating errors only over points with events should alleviate these issues somewhat, as the points immediately beyond an object usually do not contain events.
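A minimal sketch of that masking (the event array layout is an assumption; adjust the column order to match your loader):

```python
import numpy as np

def event_mask(events, shape):
    """Boolean HxW mask of pixels that received at least one event.

    `events` is assumed to be an Nx4 array of (x, y, t, polarity);
    adjust the column order to whatever your loader produces.
    """
    mask = np.zeros(shape, dtype=bool)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    mask[ys, xs] = True
    return mask

# Evaluate the flow error only where events occurred; the inflated
# object boundaries in the accumulated depth/flow usually carry no events.
H, W = 260, 346
flow_gt = np.zeros((2, H, W))                   # stand-in ground truth flow
flow_est = np.zeros((2, H, W))                  # stand-in estimated flow
events = np.array([[10.0, 20.0, 0.0, 1.0],
                   [11.0, 20.0, 0.001, -1.0]])  # stand-in events
mask = event_mask(events, (H, W))
epe = np.linalg.norm(flow_gt - flow_est, axis=0)[mask].mean()
print(epe)
```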

JiahangWu commented 3 weeks ago


Hi @kirkscheper, when I tried to overlay the events and the depth map, I also found that they are not aligned. Did you manage to solve this problem?