facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Unable to replicate downstream depth result on KITTI #227

Open zshn25 opened 11 months ago

zshn25 commented 11 months ago

I evaluated the pretrained DINOv2 models with different decoder heads on the KITTI Eigen split in order to replicate the paper's numbers, but the results I got were much worse.

Here's what I did. I load the models as shown in the notebook, load the corresponding KITTI weights, and check on an example KITTI image; the prediction looks good. I then modified this evaluation script so that it does not convert disparity to depth and does not scale the output, and ran the numbers.
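
To make the setup concrete, the loading step looks roughly like this (a minimal sketch: the backbone comes from torch.hub, while the DPT head and its KITTI checkpoint are built with the mmcv-based code from the depth notebook, which I only indicate with a placeholder here):

```python
import torch

# Backbone via torch.hub, as in the DINOv2 README / depth notebook.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()

# The DPT depth head and its KITTI weights come from the notebook's
# mmcv config + checkpoint loading; the call below is only a placeholder
# for that code, not a function that exists in the repo.
# depther = build_kitti_dpt_depther(backbone, head_checkpoint_path)
```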

(attached image: example_input, KITTI)
| model | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| small+dpt | 0.378 | 2.788 | 7.372 | 0.336 | 0.218 | 0.866 | 0.983 |
| base+dpt | 0.392 | 3.085 | 7.963 | 0.345 | 0.179 | 0.852 | 0.986 |
| large+dpt | 0.276 | 1.938 | 6.378 | 0.267 | 0.536 | 0.927 | 0.991 |

The reported RMSEs are 2.34, 2.23 and 2.14 for the small, base and large models with DPT, respectively. Am I missing something?
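
For reference, the columns above are the standard Eigen-style depth metrics; a minimal NumPy sketch of their usual definitions (not necessarily the exact evaluation code I ran):

```python
import numpy as np

def eigen_depth_metrics(gt, pred, min_depth=1e-3, max_depth=80.0):
    # Evaluate only on valid ground-truth pixels, as is standard for KITTI.
    mask = (gt > min_depth) & (gt < max_depth)
    gt, pred = gt[mask], pred[mask]

    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, a1=a1, a2=a2, a3=a3)
```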

zshn25 commented 11 months ago

I realized I was missing the input * 255 scaling followed by the normalization transform. Now I'm able to replicate the paper's results.
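
Concretely, the preprocessing I was missing looks roughly like this (my reading of the transform used in the repo's depth notebook; treat the exact composition as an approximation):

```python
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),                     # PIL image -> float tensor in [0, 1]
    T.Lambda(lambda x: x * 255.0),    # the rescale to [0, 255] I was missing
    T.Normalize(mean=(123.675, 116.28, 103.53),  # ImageNet mean/std in 0-255 range
                std=(58.395, 57.12, 57.375)),
])
```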

zshn25 commented 11 months ago

I was too quick in my previous comment; I wasn't actually able to replicate the results. I now have:

| model | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| small+dpt | 0.309 | 2.213 | 6.970 | 0.303 | 0.403 | 0.910 | 0.983 |
TheoMoutakanni commented 11 months ago

Hello @zshn25, we are using this repository to evaluate our models: https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/blob/main/depth/datasets/kitti.py

You can look at the functions pre_eval and evaluate that are used in https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/blob/main/depth/apis/test.py#L213C25-L213C41

There are some subtleties that can change the results by a lot. Did you look at the range of the predictions and of the ground truth just before computing the metrics, to make sure they are approximately the same? (Between 0 and 80 in the case of KITTI, if I remember correctly.) Plotting a histogram of both can help to understand which scalings need to be removed or added. We were using the Eigen crop, by the way.
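
Something along these lines is enough for that check (a sketch only; the helper name and plotting choices are mine, not code from our evaluation):

```python
import numpy as np
import matplotlib.pyplot as plt

def compare_depth_ranges(pred, gt, max_depth=80.0):
    # Compare prediction and ground-truth depth distributions on valid GT pixels.
    mask = (gt > 0) & (gt < max_depth)
    print("pred range:", float(pred[mask].min()), float(pred[mask].max()))
    print("gt   range:", float(gt[mask].min()), float(gt[mask].max()))

    plt.hist(pred[mask].ravel(), bins=100, alpha=0.5, density=True, label="pred")
    plt.hist(gt[mask].ravel(), bins=100, alpha=0.5, density=True, label="gt")
    plt.xlabel("depth (m)")
    plt.legend()
    plt.show()
```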

Feel free to continue the discussion; I will stay available.

zshn25 commented 11 months ago

Thanks for the reply. I've checked the predictions: they are in a similar range to the GT (0.1 - 80). The eval code I used even scales the predictions to match the GT range, and still the results were very different. I also use the Eigen crop. I will now try the same eval library you mentioned and report back.
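
One common form of such rescaling is median scaling over the valid ground-truth pixels; a sketch for illustration (not necessarily exactly what the script I used does):

```python
import numpy as np

def median_scale(pred, gt, min_depth=1e-3, max_depth=80.0):
    # Rescale predictions so their median matches the GT median
    # over valid ground-truth pixels.
    mask = (gt > min_depth) & (gt < max_depth)
    return pred * np.median(gt[mask]) / np.median(pred[mask])
```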