TOPO-EPFL / CrossLoc-Benchmark-Datasets

[CVPR'22] CrossLoc benchmark datasets setup and helper scripts.
MIT License

About the depth label of real images #4

Closed tdd233 closed 1 year ago

tdd233 commented 1 year ago

Hi Yan Qi,

Thanks for your outstanding work. I wonder how you obtained the depth labels for the real images and how accurate they are. Could you please describe the general process? Looking forward to your reply.

Yours sincerely, Tu.

qiyan98 commented 1 year ago

Hi Tu,

Thanks for your interest in our work. Here are the key steps to obtain depth or other 3D labels for real images.

During data collection, we used a DJI drone equipped with a high-precision real-time kinematic (RTK) kit to capture images, and then extracted the geotags of the captured photos, including longitude, latitude, and height. After this step, we have both camera extrinsic and intrinsic parameters for each real image. We aimed to acquire high-quality data by: 1) using RTK drones known for centimeter-level positioning accuracy; 2) validating the geotags through photogrammetry reconstruction, e.g., bundle adjustment. A hedged sketch of the coordinate conversion involved is shown after this paragraph.
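For illustration only, here is a minimal sketch of converting WGS84 geotags (longitude, latitude, ellipsoidal height) to Earth-centered Earth-fixed (ECEF) coordinates with pyproj. The actual pipeline may target a different reference frame (e.g. a local Swiss projection), and the values below are made up.

```python
from pyproj import Transformer

# WGS84 geodetic (EPSG:4979, lon/lat/height) -> WGS84 geocentric ECEF (EPSG:4978).
# always_xy=True means the input order is (longitude, latitude, height).
geodetic_to_ecef = Transformer.from_crs("EPSG:4979", "EPSG:4978", always_xy=True)

lon, lat, height = 6.5668, 46.5191, 500.0  # example geotag: degrees, degrees, meters
x, y, z = geodetic_to_ecef.transform(lon, lat, height)
print(x, y, z)  # Earth-centered coordinates in meters
```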

Next, we imported the camera locations into the Cesium environment along with the 3D assets. Thanks to the high-quality 3D assets provided by swisstopo, we could extract WGS84 world coordinates per pixel via ray-tracing, establishing a pixel-wise correspondence with each real image. We validated the accuracy of the coordinate extraction by computing the reprojection error from 3D to 2D space; a hedged sketch of this check follows. For details on data quality control, please refer to Appendix A of our paper.
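As a sketch of that validation step (not the exact code in the repo, and assuming world-to-camera extrinsics and pixel coordinates at integer grid positions), the reprojection error can be computed roughly like this:

```python
import numpy as np

def reprojection_error(world_xyz, K, R, t):
    """Mean reprojection error (in pixels) of per-pixel world coordinates.

    world_xyz : (H, W, 3) world coordinates extracted via ray-tracing
    K         : (3, 3) camera intrinsics
    R, t      : world-to-camera rotation (3, 3) and translation (3,)
    """
    h, w = world_xyz.shape[:2]
    pts = world_xyz.reshape(-1, 3)          # (H*W, 3) world points
    cam = pts @ R.T + t                     # world frame -> camera frame
    proj = cam @ K.T                        # apply intrinsics
    uv = proj[:, :2] / proj[:, 2:3]         # perspective divide -> pixel coords
    # Reference grid: each world point should reproject onto its own pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    ref = np.stack([u.ravel(), v.ravel()], axis=1)
    return np.linalg.norm(uv - ref, axis=1).mean()
```

A small mean error (well below a pixel) indicates that the geotags, intrinsics, and ray-traced coordinates are mutually consistent.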

Finally, we can extract depth via the pinhole camera model, given the camera extrinsic and intrinsic parameters and the per-pixel world coordinates of the image. See here for the code.
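In essence (again a minimal sketch under the same world-to-camera convention, not the repository's exact implementation), depth is simply the z-coordinate of each world point expressed in the camera frame:

```python
import numpy as np

def depth_from_world_coords(world_xyz, R, t):
    """Per-pixel depth (z in the camera frame) from world coordinates.

    world_xyz : (H, W, 3) per-pixel world coordinates
    R, t      : world-to-camera rotation (3, 3) and translation (3,)
    """
    cam = world_xyz.reshape(-1, 3) @ R.T + t       # world frame -> camera frame
    return cam[:, 2].reshape(world_xyz.shape[:2])  # depth map of shape (H, W)
```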

Best, Qi