hgolestaniii opened 6 months ago
I have the same question. I would like to keep the original KITTI resolution as much as possible, but I found that the image is resized before being fed into MiDaS. What size should I set for KITTI during metric depth inference?
The software resizes your input KITTI images to 392x518 and then feeds them into the model to create "pred". Even if you use a different input size, it resizes it to 392x518 anyway.
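A minimal sketch of that behavior (my assumption of what the preprocessing does, not the repo's exact code; nearest-neighbor stands in for the real interpolation):

```python
# Assumed behavior: whatever size you feed in, the transform resizes the
# image to the model's fixed 392x518 resolution before the forward pass.
NATIVE_H, NATIVE_W = 392, 518

def resize_nearest(img, out_h=NATIVE_H, out_w=NATIVE_W):
    """Resize a row-major 2D list (H x W) to out_h x out_w, nearest neighbor."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

# A KITTI-sized frame (375 x 1242) ends up at 392 x 518 regardless.
kitti = [[0] * 1242 for _ in range(375)]
out = resize_nearest(kitti)
print(len(out), len(out[0]))  # 392 518
```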
Hi, @hgolestaniii, I encountered the same problem as you. It seems like the wrong config file was used. The correct one should be Depth-Anything/metric_depth/zoedepth/models/zoedepth/config_zoedepth_kitti.json. Can you achieve the results presented in the paper by resizing the images to 392x518 for evaluation? Does resizing the images to different sizes have a big impact on the outcome?
I have not tried to reproduce the results in the paper. However, I ran some experiments resizing the input images before feeding them into the network (the code always performs an additional internal resize to 392x518, as that is the native resolution of the encoder). The output metric depth values differ by up to a few meters.
My conclusion: cropping and resizing DO have an impact on the final depth values.
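One back-of-the-envelope way to see why this happens (a hedged illustration, not the model's actual mechanism): under a pinhole camera model, depth implied by apparent object size is Z = f·X/x. Resizing the image changes the object's pixel size x while any scale assumptions the network learned stay fixed, so the implied metric depth shifts. The focal length and object sizes below are illustrative values only:

```python
# Hedged illustration (not the repo's code): resizing rescales apparent
# object size, which changes the depth a "size -> depth" cue would imply.
def pinhole_depth(focal_px: float, object_m: float, object_px: float) -> float:
    """Depth implied by apparent size under a pinhole model: Z = f * X / x."""
    return focal_px * object_m / object_px

f = 721.5          # KITTI-like focal length in pixels (assumed value)
car_width_m = 1.8  # illustrative real-world object width

# Same car, before and after resizing width 1242 -> 518 (scale ~0.417)
z_full = pinhole_depth(f, car_width_m, object_px=60.0)
z_resized = pinhole_depth(f, car_width_m, object_px=60.0 * 518 / 1242)
print(z_full, z_resized)  # the implied depth more than doubles
```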
Hi @LiheYoung,
As you know, resizing input images has a big impact on the metric depths. As far as I know, in ZoeDepth and in your algorithm, input images are resized to match the training resolution and the patch size of the encoder. For example, in ZoeDepth, this is said:
In your paper, it's written that
Q1: Does this mean that, for outdoor metric depth estimation evaluation or inference, we should resize input images to 384x768 before feeding them into the network? And your code does this for us, right?
Q2: In the paper, you evaluated the "metric outdoor" model on kitti, vkitti2, and diode_outdoor. When I run them (evaluate.py), I see the software resizes the images to 392x518 instead of 384x768. Is there a mismatch between the code and the paper?
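A quick check that may explain part of the mismatch (my own guess, not confirmed against the repo): if the encoder is a ViT with patch size 14, valid inference side lengths must be multiples of 14. 392 and 518 satisfy that constraint, while the paper's 384 and 768 do not:

```python
# Guess at the cause (assumption): a patch-14 ViT encoder requires side
# lengths divisible by 14, so 392x518 is a valid resolution but 384x768 is not.
PATCH = 14

def valid_for_patch14(side: int) -> bool:
    """True if `side` can be tiled exactly by 14-pixel patches."""
    return side % PATCH == 0

print({s: valid_for_patch14(s) for s in (392, 518, 384, 768)})
# -> {392: True, 518: True, 384: False, 768: False}
```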