LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

Color Normalization #51

Closed · choyingw closed this 7 months ago

choyingw commented 9 months ago

I tried both the foundation model and the fine-tuned metric depth model. I noticed a slight difference: the foundation models are trained and run inference with color normalization applied to the input images, while the fine-tuned metric depth models are tuned and run inference without color normalization (i.e., pixel values are only divided by 255.0). Why is there such a difference?

Thanks for the help!
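
For concreteness, the difference described above would look roughly like this in torchvision (a minimal sketch using the standard ImageNet constants; not the repository's actual preprocessing code):

```python
from torchvision import transforms

# Foundation (relative depth) pipeline: scale to [0, 1],
# then normalize with ImageNet statistics.
foundation_preprocess = transforms.Compose([
    transforms.ToTensor(),  # HWC uint8 -> CHW float32 in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Fine-tuned metric-depth pipeline as described above: only divide
# pixel values by 255 (which ToTensor already does), no mean/std step.
metric_preprocess = transforms.ToTensor()
```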

LiheYoung commented 9 months ago

For our foundation models, we normalize the images with the ImageNet mean/std because the pre-trained encoder also normalizes in this manner. As for the fine-tuned models, honestly, I did not pay special attention to the mean and std. According to my experiments, both the ImageNet mean/std and the 0.5 mean/std (used by MiDaS and ZoeDepth) work well. The only important thing is to use the same mean/std across training and inference.
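
The two normalization conventions mentioned here are, concretely (a sketch; the 0.5 values are the MiDaS/ZoeDepth convention and the others are the ImageNet statistics):

```python
from torchvision import transforms

# ImageNet statistics, matching what the pre-trained encoder saw.
imagenet_norm = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])

# The 0.5 convention used by MiDaS and ZoeDepth; maps [0, 1]
# inputs to [-1, 1].
half_norm = transforms.Normalize(mean=[0.5, 0.5, 0.5],
                                 std=[0.5, 0.5, 0.5])

# Either works for fine-tuning; the point above is that the same
# transform must be applied at both training and inference time.
```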