facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.

Unable to replicate downstream depth result on KITTI #227

Open zshn25 opened 11 months ago

zshn25 commented 11 months ago

I evaluated the pretrained DINOv2 models with different decoder heads on the KITTI Eigen split in order to replicate the paper's numbers, but the results I got were much worse.

Here's what I did. I load the models as shown in the notebook, load the corresponding KITTI weights, and check on an example KITTI image; the prediction looks good. I then modified this evaluation script so that it does not convert disparity to depth and does not scale the output, and ran the numbers.
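
To make the setup concrete, the loading step looks roughly like this (a minimal sketch: the backbone comes from torch.hub, while the DPT head and its KITTI checkpoint are built with the mmcv-based code from the depth notebook, which I only indicate with a placeholder here):

```python
import torch

# Backbone via torch.hub, as in the DINOv2 README / depth notebook.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()

# The DPT depth head and its KITTI weights come from the notebook's
# mmcv config + checkpoint loading; the call below is only a placeholder
# for that code, not a function that exists in the repo.
# depther = build_kitti_dpt_depther(backbone, head_checkpoint_path)
```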

(attached image: example_input, KITTI)
| model | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| small+dpt | 0.378 | 2.788 | 7.372 | 0.336 | 0.218 | 0.866 | 0.983 |
| base+dpt | 0.392 | 3.085 | 7.963 | 0.345 | 0.179 | 0.852 | 0.986 |
| large+dpt | 0.276 | 1.938 | 6.378 | 0.267 | 0.536 | 0.927 | 0.991 |

The reported RMSEs are 2.34, 2.23 and 2.14 for the small, base and large models with DPT, respectively. Am I missing something?
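
For reference, the columns above are the standard Eigen-style depth metrics; a minimal NumPy sketch of their usual definitions (not necessarily the exact evaluation code I ran):

```python
import numpy as np

def eigen_depth_metrics(gt, pred, min_depth=1e-3, max_depth=80.0):
    # Evaluate only on valid ground-truth pixels, as is standard for KITTI.
    mask = (gt > min_depth) & (gt < max_depth)
    gt, pred = gt[mask], pred[mask]

    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))

    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, a1=a1, a2=a2, a3=a3)
```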

zshn25 commented 11 months ago

I realized I was missing the input * 255 scaling followed by the normalization transform. Now I'm able to replicate the paper's results.
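
Concretely, the preprocessing I was missing looks roughly like this (my reading of the transform used in the repo's depth notebook; treat the exact composition as an approximation):

```python
import torchvision.transforms as T

transform = T.Compose([
    T.ToTensor(),                     # PIL image -> float tensor in [0, 1]
    T.Lambda(lambda x: x * 255.0),    # the rescale to [0, 255] I was missing
    T.Normalize(mean=(123.675, 116.28, 103.53),  # ImageNet mean/std in 0-255 range
                std=(58.395, 57.12, 57.375)),
])
```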

zshn25 commented 11 months ago

I was too quick in my previous comment; I wasn't actually able to replicate the results. I now have:

| model | abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|---|
| small+dpt | 0.309 | 2.213 | 6.970 | 0.303 | 0.403 | 0.910 | 0.983 |
TheoMoutakanni commented 11 months ago

Hello @zshn25, we are using this repository to evaluate our models: https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/blob/main/depth/datasets/kitti.py

You can look at the functions pre_eval and evaluate that are used in https://github.com/zhyever/Monocular-Depth-Estimation-Toolbox/blob/main/depth/apis/test.py#L213C25-L213C41

There are some subtleties that can change the results by a lot. Did you look at the range of the predictions and of the ground truth just before computing the metrics, to make sure they are approximately the same? (Between 0 and 80 in the case of KITTI, if I remember correctly.) Plotting a histogram of both can help to understand which scalings need to be removed or added. We were using the Eigen crop, by the way.
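
Something along these lines is enough for that check (a sketch only; the helper name and plotting choices are mine, not code from our evaluation):

```python
import numpy as np
import matplotlib.pyplot as plt

def compare_depth_ranges(pred, gt, max_depth=80.0):
    # Compare prediction and ground-truth depth distributions on valid GT pixels.
    mask = (gt > 0) & (gt < max_depth)
    print("pred range:", float(pred[mask].min()), float(pred[mask].max()))
    print("gt   range:", float(gt[mask].min()), float(gt[mask].max()))

    plt.hist(pred[mask].ravel(), bins=100, alpha=0.5, density=True, label="pred")
    plt.hist(gt[mask].ravel(), bins=100, alpha=0.5, density=True, label="gt")
    plt.xlabel("depth (m)")
    plt.legend()
    plt.show()
```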

Feel free to continue the discussion; I will stay available.

zshn25 commented 11 months ago

Thanks for the reply. I've checked the predictions: they are in a similar range to the GT (0.1 - 80). The eval code I used even scales the predictions to match the GT range, and still the results were very different. I also use the Eigen crop. I will now try the same eval library you mentioned and report back.
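
One common form of such rescaling is median scaling over the valid ground-truth pixels; a sketch for illustration (not necessarily exactly what the script I used does):

```python
import numpy as np

def median_scale(pred, gt, min_depth=1e-3, max_depth=80.0):
    # Rescale predictions so their median matches the GT median
    # over valid ground-truth pixels.
    mask = (gt > min_depth) & (gt < max_depth)
    return pred * np.median(gt[mask]) / np.median(pred[mask])
```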