LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

Depth-Anything Small encoder for metric depth estimation? #86

Open Denny-kef opened 5 months ago

Denny-kef commented 5 months ago

What steps would you recommend for using the small Depth Anything relative depth estimation model with the metric depth estimation pipeline? Will it need to be re-trained, or should I be able to swap it in and have it still work reasonably well?

mvish7 commented 5 months ago

I fine-tuned a metric depth model on a custom dataset. From my tinkering with the Depth Anything codebase:

To switch the encoder, you need to modify the build function as shown below:

https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/metric_depth/zoedepth/models/base_models/depth_anything.py#L334

Replace `depth_anything = DPT_DINOv2(encoder='vitl', out_channels=[256, 512, 1024, 1024], use_clstoken=False)` with `depth_anything = DPT_DINOv2(encoder='vits', features=64, out_channels=[48, 96, 192, 384], use_clstoken=False)`.
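For reference, here is a minimal sketch of making that switch configurable rather than hard-coded. The import path and the helper name are assumptions for illustration; the `vits` and `vitl` settings are the two quoted above, with `vitl` keeping the repo's default `features` value.

```python
# Sketch only: the import path below is an assumption -- reuse whatever
# metric_depth/zoedepth/models/base_models/depth_anything.py already imports.
from zoedepth.models.base_models.dpt_dinov2.dpt import DPT_DINOv2

# Per-encoder settings taken from this thread; 'vitl' keeps the default `features`.
ENCODER_CONFIGS = {
    "vits": dict(features=64, out_channels=[48, 96, 192, 384]),
    "vitl": dict(out_channels=[256, 512, 1024, 1024]),
}

def build_depth_anything(encoder: str = "vits") -> DPT_DINOv2:
    # Build the DPT backbone for the requested ViT size.
    return DPT_DINOv2(encoder=encoder, use_clstoken=False, **ENCODER_CONFIGS[encoder])
```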

On the dataset side:

You need to adapt your dataset to follow the preprocessing and augmentation done in https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/metric_depth/zoedepth/data/data_mono.py#L292

If your depth GT needs custom scaling, as done at the line below, apply that scaling and you should be good to go: https://github.com/LiheYoung/Depth-Anything/blob/e7ef4b4b7a0afd8a05ce9564f04c1e5b68268516/metric_depth/zoedepth/data/data_mono.py#L353
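As a concrete (hypothetical) example of such scaling: many datasets store depth GT as 16-bit PNGs in millimetres, which have to be divided by 1000 to get metres (KITTI-style PNGs use a factor of 256 instead). The loader below is only a sketch; `DEPTH_SCALE` and the invalid-pixel convention depend entirely on how your data was exported.

```python
import numpy as np
from PIL import Image

DEPTH_SCALE = 1000.0  # assumed: 16-bit PNG in millimetres -> metres; KITTI-style would be 256.0

def load_depth_gt(path: str) -> np.ndarray:
    # Read the raw 16-bit depth map and convert it to float metres.
    depth_png = np.asarray(Image.open(path), dtype=np.float32)
    depth_m = depth_png / DEPTH_SCALE
    depth_m[depth_png == 0] = 0.0  # keep invalid pixels at 0 so they can be masked in the loss
    return depth_m
```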

Denny-kef commented 5 months ago

@mvish7 Can you just swap out the encoder without fine-tuning the model, and will it still work?

mvish7 commented 5 months ago

Hi, as the base Depth Anything model is trained for relative depth, just swapping the encoder won't produce metric depth. In my experience, within 3 epochs of fine-tuning the model had learned the depth scale of our custom dataset.

cosmosmosco commented 4 months ago

@mvish7 Hi, I want to ask how many RGB-depth pairs you used for fine-tuning. I used 100 pairs of my own data and trained for 50 epochs, but the results don't seem accurate. I also want to know whether the --pretrained_resource="" argument in train_mono.py should point to depth_anything_metric_depth_outdoor.pt; if I use depth_anything_vitl14.pth, I get a state_dict mismatch error.

LiheYoung commented 4 months ago

Hi @cosmosmosco, the argument --pretrained_resource="" is for loading a pre-trained checkpoint (containing the entire model's parameters). Please just set it to an empty string when launching your training script.
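(For anyone following along: that means launching something like `python train_mono.py -m zoedepth -d <your_dataset> --pretrained_resource=""`; the `-m`/`-d` flags here follow ZoeDepth's train_mono.py interface, so double-check them against your local script. The state_dict mismatch seen with depth_anything_vitl14.pth is expected, since that checkpoint holds the relative-depth model's weights, whose keys don't match the ZoeDepth-wrapped metric model.)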

cosmosmosco commented 4 months ago

> Hi @cosmosmosco, the argument --pretrained_resource="" is for loading a pre-trained checkpoint (containing the entire model's parameters). Please just set it to an empty string when launching your training script.

Thank you. It really helps.

abhishek0696 commented 3 months ago

@mvish7 @cosmosmosco I am trying to fine-tune the ZoeDepth + Depth Anything model on a custom outdoor dataset for which I do have pixel-wise GT. Can you please shed some light on data preparation, which config to use, and how best to run the script, including the changes to the script and the specific arguments for a custom dataset? Thanks in advance!