LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

Clarification on zero-shot relative depth evaluation #174

Open manuelknott opened 3 months ago

manuelknott commented 3 months ago

Hi,

It is unclear to me how you calculate $\delta_1$ and $AbsRel$ metrics in the zero-shot relative depth setting.

Are you using the normalized disparity maps $d$, the affine-invariant disparity maps $\hat{d}$, or are you converting back to depth maps for evaluation?

Thanks in advance.
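For concreteness, these are the metric definitions I am assuming (a minimal NumPy sketch of the standard formulas, not code from this repository):

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return np.mean(np.abs(pred - gt) / gt)

def delta1(pred, gt):
    """Fraction of pixels with max(pred/gt, gt/pred) < 1.25."""
    ratio = np.maximum(pred / gt, gt / pred)
    return np.mean(ratio < 1.25)
```

The question is which quantity plays the role of `pred` and `gt` here: normalized disparity, affine-invariant disparity, or depth.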

LiheYoung commented 3 months ago

Hi, we evaluate the model prediction in the disparity space, without converting it back to the depth space.

manuelknott commented 3 months ago

Thank you for your reply! So just to be sure: I assume you use $d$ for metric calculation while $\hat{d}$ is only used for the labeled loss, correct?

Could you also comment on whether you scale model predictions to 0~1 or take them as they are (presumably coming from a final sigmoid activation) before comparing to groundtruth disparity maps?

Thank you.

LiheYoung commented 3 months ago

During evaluation, given a groundtruth depth map, we first convert it to disparity space to obtain $d_{gt}$. Then we perform alignment between our raw predicted disparity map $d_{pred}$ and $d_{gt}$ by fitting the scale and shift. We do not first scale our predictions to 0~1 during evaluation.
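One way this procedure could look in NumPy (a sketch of least-squares scale-shift alignment in the style of MiDaS, not the exact evaluation code; the function names are illustrative):

```python
import numpy as np

def align_scale_shift(pred, gt, mask):
    """Least-squares scale s and shift t minimizing ||s*pred + t - gt||^2 over masked pixels."""
    p = pred[mask].astype(np.float64)
    g = gt[mask].astype(np.float64)
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s, t

def evaluate_in_disparity(pred_disp, gt_depth, eps=1e-6):
    """Convert GT depth to disparity, align the raw prediction, then compute
    AbsRel and delta1 in disparity space (sketch; assumes aligned values stay positive)."""
    mask = gt_depth > eps
    gt_disp = np.zeros_like(gt_depth)
    gt_disp[mask] = 1.0 / gt_depth[mask]
    s, t = align_scale_shift(pred_disp, gt_disp, mask)
    aligned = s * pred_disp + t
    p, g = aligned[mask], gt_disp[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return abs_rel, delta1
```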

manuelknott commented 3 months ago

Thank you for clarification.

manuelknott commented 3 months ago

Dear @LiheYoung ,

I tried to reproduce the reported $\delta_1$ and $AbsRel$ metrics using one of your checkpoints. Since there is no dedicated evaluation code for relative depth, I used the code in `metric_depth/evaluate.py`. I tried three variations:

A) Scale the model output to 0~1
B) Normalize the model output to $\hat{d}$ as described in the "Labeled Loss" section of your paper (which is what I guess you suggested)
C) Convert disparity back to relative depth (even though you mentioned that you do not do that for evaluation)

Of course I masked out all invalid regions.
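For reference, this is how I implemented the normalization in variant B (my own sketch of the paper's median/mean-absolute-deviation normalization; the function name is mine):

```python
import numpy as np

def normalize_affine_invariant(d, mask):
    """Affine-invariant disparity: shift by the median, scale by the
    mean absolute deviation, computed over valid (masked) pixels."""
    v = d[mask]
    t = np.median(v)
    s = np.mean(np.abs(v - t))
    return (d - t) / s
```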

The averaged $\delta_1$ scores I get for the NYU test set are: 0.58 (A), 0.60 (B), 0.84 (C). These are far off from your reported metrics, so I am clearly missing something. Could you provide evaluation code for zero-shot relative depth, or elaborate on the procedure?

Thank you very much.

bimsarapathiraja commented 3 months ago

@LiheYoung Could you please provide more information or the code for reproducing the $AbsRel$ metric values mentioned in the paper?

lambertwx commented 2 months ago

I second this request. Having the code that you used to perform the evaluation would clear up a lot of questions.

> @LiheYoung Could you please provide more information or the code for reproducing the $AbsRel$ metric values mentioned in the paper?

Brummi commented 1 month ago

I am also unable to reproduce the reported numbers following the instructions above. Could the authors please clarify the exact evaluation procedure and, ideally, provide the code for their evaluations?