LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0
6.73k stars, 516 forks

Optimal image size and aspect ratio for metric depth estimation #54

Open eldar opened 7 months ago

eldar commented 7 months ago

Thank you all for the really awesome code and research! I am interested in metric depth estimation on KITTI. The metric_depth/evaluate.py script drastically changes the aspect ratio of the KITTI input images: the output depth map is 518x392 pixels, whereas the input images are 1216x352. This behavior is enabled by passing mode="eval" to the get_config() function call. Evaluating the model with the changed aspect ratio results in higher accuracy:

{'a1': 0.987, 'a2': 0.998, 'a3': 0.999, 'abs_rel': 0.04, 'rmse': 1.824, 'log_10': 0.017, 'rmse_log': 0.062, 'silog': 5.806, 'sq_rel': 0.107}

compared to running depth estimation on full resolution and original aspect ratio (mode="infer"):

{'a1': 0.982, 'a2': 0.997, 'a3': 0.999, 'abs_rel': 0.056, 'rmse': 2.153, 'log_10': 0.025, 'rmse_log': 0.078, 'silog': 6.881, 'sq_rel': 0.152}
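To make the gap between the two runs easier to read, here is a minimal sketch that computes the per-metric relative change when going from mode="eval" to mode="infer". The numbers are copied from the two result dictionaries above. The helper name `compare` and the set `HIGHER_IS_BETTER` are hypothetical and not part of the repository; the only assumption is the usual convention that a1, a2, a3 are threshold accuracies (higher is better) and the remaining entries are errors (lower is better).

```python
# Results quoted above: resized input (mode="eval") vs. native resolution (mode="infer").
eval_mode = {'a1': 0.987, 'a2': 0.998, 'a3': 0.999, 'abs_rel': 0.04, 'rmse': 1.824,
             'log_10': 0.017, 'rmse_log': 0.062, 'silog': 5.806, 'sq_rel': 0.107}
infer_mode = {'a1': 0.982, 'a2': 0.997, 'a3': 0.999, 'abs_rel': 0.056, 'rmse': 2.153,
              'log_10': 0.025, 'rmse_log': 0.078, 'silog': 6.881, 'sq_rel': 0.152}

# Assumption: threshold accuracies improve upward, every other metric is an error.
HIGHER_IS_BETTER = {'a1', 'a2', 'a3'}


def compare(resized, native):
    """Return {metric: (relative change in percent, True if native is worse)}."""
    rows = {}
    for name, base in resized.items():
        diff = native[name] - base
        rel_percent = diff / base * 100.0
        if name in HIGHER_IS_BETTER:
            worse = diff < 0
        else:
            worse = diff > 0
        rows[name] = (round(rel_percent, 1), worse)
    return rows


rows = compare(eval_mode, infer_mode)
for name, (rel_percent, worse) in rows.items():
    label = "worse" if worse else "not worse"
    print(f"{name:>9}: {rel_percent:+6.1f}%  ({label} at native resolution)")
```

On these numbers every error metric is worse at native resolution, while a3 is unchanged.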

I would like to confirm what the optimal image size and aspect ratio are for metric depth estimation.

Thanks, Eldar.

LiheYoung commented 7 months ago

Hi, we did not tune the aspect ratio, in order to keep the comparison fair. I have also noticed that some other aspect ratios give better results. It is hard to say which aspect ratio or image resolution is optimal. It may depend on the specific dataset.
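To put numbers on the distortion discussed in this thread, the following sketch computes the aspect ratios and per-axis scale factors implied by the sizes quoted in the original question. It assumes that both "1216x352" and "518x392" are given as width x height, which the thread does not state explicitly. The function name `stretch` is hypothetical and not part of the repository.

```python
def stretch(src_wh, dst_wh):
    """Describe how resizing from src (w, h) to dst (w, h) distorts the image."""
    src_w, src_h = src_wh
    dst_w, dst_h = dst_wh
    scale_x = dst_w / src_w  # horizontal scale factor
    scale_y = dst_h / src_h  # vertical scale factor
    return {
        'src_aspect': src_w / src_h,
        'dst_aspect': dst_w / dst_h,
        'scale_x': scale_x,
        'scale_y': scale_y,
        # Ratio of vertical to horizontal scaling; 1.0 means no distortion.
        'anisotropy': scale_y / scale_x,
    }


# Sizes quoted in the question, assumed to be (width, height).
info = stretch((1216, 352), (518, 392))
for key, value in info.items():
    print(f"{key:>10}: {value:.3f}")
```

Under that assumption, the image is shrunk horizontally to well under half its width while being slightly enlarged vertically, so objects appear considerably narrower to the model than in the original frame.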