Image resize during inference/training

hgolestaniii commented 6 months ago

Hi @LiheYoung,

As you know, resizing input images has a big impact on the predicted metric depths. As far as I know, in both ZoeDepth and your algorithm, input images are resized to match the training resolution and the patch size of the encoder. For example, ZoeDepth states:

All ZoeDepth architectures and prior works are evaluated by resizing the input to the training resolution. Zoe-*-N, Zoe-*-NK, and Zoe-*-K models are trained at resolutions 384 × 512, 384 × 512 and 384 × 768 respectively.

In your paper, it's written that

Concretely, the training resolution is 392×518 on NYUv2 [54] and 384×768 on KITTI [18] to match the patch size of our encoder.

Q1: Does it mean, for outdoor metric depth estimation evaluation or inference, we should resize input images to 384x768 before feeding them into the network? Your code does it for us, right?

Q2: In the paper, you evaluated the "metric outdoor" model on kitti, vkitti2, and diode_outdoor. When I run them through evaluate.py, I see that the code resizes the images to "392x518" instead of "384x768". Is there a mismatch between the code and the paper?
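To make the "match the patch size" point concrete, here is a minimal sketch that checks whether a resolution is an exact multiple of the encoder patch size. It assumes a 14x14 patch (the value used by DINOv2-style ViT encoders); that number is not stated in this thread, and `is_patch_aligned` and `snap_to_patch` are hypothetical helper names, not functions from the repository.

```python
# Assumption: the encoder uses 14x14 patches (DINOv2-style ViT).
# This value is NOT stated in the thread; change it if your encoder differs.
PATCH = 14


def is_patch_aligned(height, width, patch=PATCH):
    """Return True if both sides are exact multiples of the patch size."""
    return height % patch == 0 and width % patch == 0


def snap_to_patch(size, patch=PATCH):
    """Round a side length to the nearest multiple of the patch size."""
    return max(patch, round(size / patch) * patch)


if __name__ == "__main__":
    # Resolutions mentioned in the two papers quoted above.
    for h, w in [(392, 518), (384, 512), (384, 768)]:
        aligned = is_patch_aligned(h, w)
        snapped = (snap_to_patch(h), snap_to_patch(w))
        print(f"{h}x{w}: aligned={aligned}, nearest aligned={snapped[0]}x{snapped[1]}")
```

Under that assumption, 392x518 divides evenly (28 x 37 patches) while 384x768 does not, so a 384x768 input would have to be adjusted somewhere before reaching a 14-pixel-patch encoder.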

tlxhlll commented 6 months ago

I have the same question. I hope to maintain the original size of KITTI as much as possible, but I found that the image will be resized before entering Midas. What size should I set for KITTI during metric depth inference?

hgolestaniii commented 6 months ago

The software resizes your input KITTI images to 392x518 and then feeds them into the model to create "pred". If you use an image of a different size, it resizes it to 392x518 anyway.
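A rough way to see how much such a resize distorts a KITTI frame is to compare the horizontal and vertical scale factors. The sketch below assumes a typical KITTI frame of about 375x1242 (height x width); actual frames vary slightly, and `resize_scales` and `aspect_distortion` are hypothetical helpers written for illustration only.

```python
def resize_scales(src_hw, dst_hw):
    """Return (vertical_scale, horizontal_scale) for a plain resize."""
    src_h, src_w = src_hw
    dst_h, dst_w = dst_hw
    return dst_h / src_h, dst_w / src_w


def aspect_distortion(src_hw, dst_hw):
    """Horizontal scale divided by vertical scale.

    1.0 means the aspect ratio is preserved; values below 1.0 mean the
    image is squeezed horizontally relative to its height.
    """
    scale_y, scale_x = resize_scales(src_hw, dst_hw)
    return scale_x / scale_y


if __name__ == "__main__":
    # Assumption: a typical KITTI frame is roughly 375 x 1242 (H x W).
    kitti_hw = (375, 1242)
    for target_hw in [(392, 518), (384, 768)]:
        sy, sx = resize_scales(kitti_hw, target_hw)
        d = aspect_distortion(kitti_hw, target_hw)
        print(f"{kitti_hw} -> {target_hw}: sy={sy:.3f}, sx={sx:.3f}, distortion={d:.3f}")
```

With these assumed sizes, both targets squeeze the frame horizontally, and 392x518 does so more strongly than 384x768.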

star9988rr commented 5 months ago

Hi, @hgolestaniii, I encountered the same problem as you. It seems like the wrong config file was used. The correct one should be Depth-Anything/metric_depth/zoedepth/models/zoedepth/config_zoedepth_kitti.json. Can you achieve the results presented in the paper by resizing the images to 392x518 for evaluation? Does resizing the images to different sizes have a big impact on the outcome?

hgolestaniii commented 5 months ago

I have not tried to reproduce the results in the paper. However, I ran some experiments in which I resized the input images before feeding them into the network (the code always does an additional internal resize to 392x518, as it's the native resolution of the encoder). The output metric depth values differed by up to a few meters.

My conclusion: cropping and resizing DO HAVE an impact on the final depth values.
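One way to quantify this kind of comparison is to bring two predictions to a common resolution and report the mean absolute difference in meters. The sketch below is pure Python on nested lists with nearest-neighbor resampling; it is not the evaluation code from the repository, and `resize_nearest` and `mean_abs_diff` are hypothetical names. The tiny depth maps are made-up placeholders, not measured results.

```python
def resize_nearest(depth, out_h, out_w):
    """Nearest-neighbor resize of a 2D depth map stored as a list of rows."""
    in_h, in_w = len(depth), len(depth[0])
    return [
        [depth[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def mean_abs_diff(a, b):
    """Mean absolute per-pixel difference between two same-sized depth maps."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("depth maps must have the same shape")
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


if __name__ == "__main__":
    # Placeholder predictions (meters) for the same scene at two input sizes.
    pred_small = [[10.0, 20.0], [30.0, 40.0]]
    pred_large = [
        [11.0, 11.0, 19.0, 19.0],
        [11.0, 11.0, 19.0, 19.0],
        [32.0, 32.0, 41.0, 41.0],
        [32.0, 32.0, 41.0, 41.0],
    ]
    upsampled = resize_nearest(pred_small, 4, 4)
    print("mean |diff| in meters:", mean_abs_diff(upsampled, pred_large))
```

For real predictions, bilinear interpolation and a valid-depth mask would likely be more appropriate than this nearest-neighbor sketch.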

LiheYoung / Depth-Anything

Image resize during inference/training #102