LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0
6.83k stars 523 forks

I trained on my dataset with metric_depth and obtained the model file. How can I use my model to predict image depth? #149

Open HaosenZ opened 5 months ago

HaosenZ commented 5 months ago

I trained on my dataset with metric_depth and obtained the model file. How can I use my model to predict image depth? I have modified the model-loading code in run.py, but an error message says that my model is incorrect. This is the error message:

RuntimeError: Error(s) in loading state_dict for DepthAnything: Missing key(s) in state_dict: "pretrained.cls_token", "pretrained.pos_embed", "pretrained.mask_token", "pretrained.patch_embed.proj.weight", "pretrained.patch_embed.proj.bias", "pretrained.blocks.0.norm1.weight", "pretrained.blocks.0.norm1.bias", "pretrained.blocks.0.attn.qkv.weight", "pretrained.blocks.0.attn.qkv.bias", "pretrained.blocks.0.attn.proj.weight", "pretrained.blocks.0.attn.proj.bias", "pretrained.blocks.0.ls1.gamma", "pretrained.blocks.0.norm2.weight", "pretrained.blocks.0.norm2.bias", "pretrained.blocks.0.mlp.fc1.weight", "pretrained.blocks.0.mlp.fc1.bias", "pretrained.blocks.0.mlp.fc2.weight", "pretrained.blocks.0.mlp.fc2.bias", "pretrained.blocks.0.ls2.gamma", "pretrained.blocks.1.norm1.weight", "pretrained.blocks.1.norm1.bias", "pretrained.blocks.1.attn.qkv.weight", "pretrained.blocks.1.attn.qkv.bias", "pretrained.blocks.1.attn.proj.weight", "pretrained.blocks.1.attn.proj.bias", "pretrained.blocks.1.ls1.gamma", "pretrained.blocks.1.norm2.weight", "pretrained.blocks.1.norm2.bias", "pretrained.blocks.1.mlp.fc1.weight", "pretrained.blocks.1.mlp.fc1.bias", "pretrained.blocks.1.mlp.fc2.weight", "pretrained.blocks.1.mlp.fc2.bias", "pretrained.blocks.1.ls2.gamma", "pretrained.blocks.2.norm1.weight", "pretrained.blocks.2.norm1.bias", "pretrained.blocks.2.attn.qkv.weight", "pretrained.blocks.2.attn.qkv.bias", "pretrained.blocks.2.attn.proj.weight", "pretrained.blocks.2.attn.proj.bias", "pretrained.blocks.2.ls1.gamma", "pretrained.blocks.2.norm2.weight", "pretrained.blocks.2.norm2.bias", "pretrained.blocks.2.mlp.fc1.weight", "pretrained.blocks.2.mlp.fc1.bias", "pretrained.blocks.2.mlp.fc2.weight",
"pretrained.blocks.2.mlp.fc2.bias", "pretrained.blocks.2.ls2.gamma", "pretrained.blocks.3.norm1.weight", "pretrained.blocks.3.norm1.bias", "pretrained.blocks.3.attn.qkv.weight", "pretrained.blocks.3.attn.qkv.bias", "pretrained.blocks.3.attn.proj.weight", "pretrained.blocks.3.attn.proj.bias", "pretrained.blocks.3.ls1.gamma", "pretrained.blocks.3.norm2.weight", "pretrained.blocks.3.norm2.bias", "pretrained.blocks.3.mlp.fc1.weight", "pretrained.blocks.3.mlp.fc1.bias", "pretrained.blocks.3.mlp.fc2.weight", "pretrained.blocks.3.mlp.fc2.bias", "pretrained.blocks.3.ls2.gamma", "pretrained.blocks.4.norm1.weight", "pretrained.blocks.4.norm1.bias", "pretrained.blocks.4.attn.qkv.weight", "pretrained.blocks.4.attn.qkv.bias", "pretrained.blocks.4.attn.proj.weight", "pretrained.blocks.4.attn.proj.bias", "pretrained.blocks.4.ls1.gamma", "pretrained.blocks.4.norm2.weight", "pretrained.blocks.4.norm2.bias", "pretrained.blocks.4.mlp.fc1.weight", "pretrained.blocks.4.mlp.fc1.bias", "pretrained.blocks.4.mlp.fc2.weight", "pretrained.blocks.4.mlp.fc2.bias", "pretrained.blocks.4.ls2.gamma", "pretrained.blocks.5.norm1.weight", "pretrained.blocks.5.norm1.bias", "pretrained.blocks.5.attn.qkv.weight", "pretrained.blocks.5.attn.qkv.bias", "pretrained.blocks.5.attn.proj.weight", "pretrained.blocks.5.attn.proj.bias", "pretrained.blocks.5.ls1.gamma", "pretrained.blocks.5.norm2.weight", "pretrained.blocks.5.norm2.bias", "pretrained.blocks.5.mlp.fc1.weight", "pretrained.blocks.5.mlp.fc1.bias", "pretrained.blocks.5.mlp.fc2.weight", "pretrained.blocks.5.mlp.fc2.bias", "pretrained.blocks.5.ls2.gamma", "pretrained.blocks.6.norm1.weight", "pretrained.blocks.6.norm1.bias", "pretrained.blocks.6.attn.qkv.weight", "pretrained.blocks.6.attn.qkv.bias", "pretrained.blocks.6.attn.proj.weight", "pretrained.blocks.6.attn.proj.bias", "pretrained.blocks.6.ls1.gamma", "pretrained.blocks.6.norm2.weight", "pretrained.blocks.6.norm2.bias", "pretrained.blocks.6.mlp.fc1.weight", "pretrained.blocks.6.mlp.fc1.bias", 
"pretrained.blocks.6.mlp.fc2.weight", "pretrained.blocks.6.mlp.fc2.bias", "pretrained.blocks.6.ls2.gamma", "pretrained.blocks.7.norm1.weight", "pretrained.blocks.7.norm1.bias", "pretrained.blocks.7.attn.qkv.weight", "pretrained.blocks.7.attn.qkv.bias", "pretrained.blocks.7.attn.proj.weight", "pretrained.blocks.7.attn.proj.bias", "pretrained.blocks.7.ls1.gamma", "pretrained.blocks.7.norm2.weight", "pretrained.blocks.7.norm2.bias", "pretrained.blocks.7.mlp.fc1.weight", "pretrained.blocks.7.mlp.fc1.bias", "pretrained.blocks.7.mlp.fc2.weight", "pretrained.blocks.7.mlp.fc2.bias", "pretrained.blocks.7.ls2.gamma", "pretrained.blocks.8.norm1.weight", "pretrained.blocks.8.norm1.bias", "pretrained.blocks.8.attn.qkv.weight", "pretrained.blocks.8.attn.qkv.bias", "pretrained.blocks.8.attn.proj.weight", "pretrained.blocks.8.attn.proj.bias", "pretrained.blocks.8.ls1.gamma", "pretrained.blocks.8.norm2.weight", "pretrained.blocks.8.norm2.bias", "pretrained.blocks.8.mlp.fc1.weight", "pretrained.blocks.8.mlp.fc1.bias", "pretrained.blocks.8.mlp.fc2.weight", "pretrained.blocks.8.mlp.fc2.bias", "pretrained.blocks.8.ls2.gamma", "pretrained.blocks.9.norm1.weight", "pretrained.blocks.9.norm1.bias", "pretrained.blocks.9.attn.qkv.weight", "pretrained.blocks.9.attn.qkv.bias", "pretrained.blocks.9.attn.proj.weight", "pretrained.blocks.9.attn.proj.bias", "pretrained.blocks.9.ls1.gamma", "pretrained.blocks.9.norm2.weight", "pretrained.blocks.9.norm2.bias", "pretrained.blocks.9.mlp.fc1.weight", "pretrained.blocks.9.mlp.fc1.bias", "pretrained.blocks.9.mlp.fc2.weight", "pretrained.blocks.9.mlp.fc2.bias", "pretrained.blocks.9.ls2.gamma", "pretrained.blocks.10.norm1.weight", "pretrained.blocks.10.norm1.bias", "pretrained.blocks.10.attn.qkv.weight", "pretrained.blocks.10.attn.qkv.bias", "pretrained.blocks.10.attn.proj.weight", "pretrained.blocks.10.attn.proj.bias", "pretrained.blocks.10.ls1.gamma", "pretrained.blocks.10.norm2.weight", "pretrained.blocks.10.norm2.bias", 
"pretrained.blocks.10.mlp.fc1.weight", "pretrained.blocks.10.mlp.fc1.bias", "pretrained.blocks.10.mlp.fc2.weight", "pretrained.blocks.10.mlp.fc2.bias", "pretrained.blocks.10.ls2.gamma", "pretrained.blocks.11.norm1.weight", "pretrained.blocks.11.norm1.bias", "pretrained.blocks.11.attn.qkv.weight", "pretrained.blocks.11.attn.qkv.bias", "pretrained.blocks.11.attn.proj.weight", "pretrained.blocks.11.attn.proj.bias", "pretrained.blocks.11.ls1.gamma", "pretrained.blocks.11.norm2.weight", "pretrained.blocks.11.norm2.bias", "pretrained.blocks.11.mlp.fc1.weight", "pretrained.blocks.11.mlp.fc1.bias", "pretrained.blocks.11.mlp.fc2.weight", "pretrained.blocks.11.mlp.fc2.bias", "pretrained.blocks.11.ls2.gamma", "pretrained.blocks.12.norm1.weight", "pretrained.blocks.12.norm1.bias", "pretrained.blocks.12.attn.qkv.weight", "pretrained.blocks.12.attn.qkv.bias", "pretrained.blocks.12.attn.proj.weight", "pretrained.blocks.12.attn.proj.bias", "pretrained.blocks.12.ls1.gamma", "pretrained.blocks.12.norm2.weight", "pretrained.blocks.12.norm2.bias", "pretrained.blocks.12.mlp.fc1.weight", "pretrained.blocks.12.mlp.fc1.bias", "pretrained.blocks.12.mlp.fc2.weight", "pretrained.blocks.12.mlp.fc2.bias", "pretrained.blocks.12.ls2.gamma", "pretrained.blocks.13.norm1.weight", "pretrained.blocks.13.norm1.bias", "pretrained.blocks.13.attn.qkv.weight", "pretrained.blocks.13.attn.qkv.bias", "pretrained.blocks.13.attn.proj.weight", "pretrained.blocks.13.attn.proj.bias", "pretrained.blocks.13.ls1.gamma", "pretrained.blocks.13.norm2.weight", "pretrained.blocks.13.norm2.bias", "pretrained.blocks.13.mlp.fc1.weight", "pretrained.blocks.13.mlp.fc1.bias", "pretrained.blocks.13.mlp.fc2.weight", "pretrained.blocks.13.mlp.fc2.bias", "pretrained.blocks.13.ls2.gamma", "pretrained.blocks.14.norm1.weight", "pretrained.blocks.14.norm1.bias", "pretrained.blocks.14.attn.qkv.weight", "pretrained.blocks.14.attn.qkv.bias", "pretrained.blocks.14.attn.proj.weight", "pretrained.blocks.14.attn.proj.bias", 
"pretrained.blocks.14.ls1.gamma", "pretrained.blocks.14.norm2.weight", "pretrained.blocks.14.norm2.bias", "pretrained.blocks.14.mlp.fc1.weight", "pretrained.blocks.14.mlp.fc1.bias", "pretrained.blocks.14.mlp.fc2.weight", "pretrained.blocks.14.mlp.fc2.bias", "pretrained.blocks.14.ls2.gamma", "pretrained.blocks.15.norm1.weight", "pretrained.blocks.15.norm1.bias", "pretrained.blocks.15.attn.qkv.weight", "pretrained.blocks.15.attn.qkv.bias", "pretrained.blocks.15.attn.proj.weight", "pretrained.blocks.15.attn.proj.bias", "pretrained.blocks.15.ls1.gamma", "pretrained.blocks.15.norm2.weight", "pretrained.blocks.15.norm2.bias", "pretrained.blocks.15.mlp.fc1.weight", "pretrained.blocks.15.mlp.fc1.bias", "pretrained.blocks.15.mlp.fc2.weight", "pretrained.blocks.15.mlp.fc2.bias", "pretrained.blocks.15.ls2.gamma", "pretrained.blocks.16.norm1.weight", "pretrained.blocks.16.norm1.bias", "pretrained.blocks.16.attn.qkv.weight", "pretrained.blocks.16.attn.qkv.bias", "pretrained.blocks.16.attn.proj.weight", "pretrained.blocks.16.attn.proj.bias", "pretrained.blocks.16.ls1.gamma", "pretrained.blocks.16.norm2.weight", "pretrained.blocks.16.norm2.bias", "pretrained.blocks.16.mlp.fc1.weight", "pretrained.blocks.16.mlp.fc1.bias", "pretrained.blocks.16.mlp.fc2.weight", "pretrained.blocks.16.mlp.fc2.bias", "pretrained.blocks.16.ls2.gamma", "pretrained.blocks.17.norm1.weight", "pretrained.blocks.17.norm1.bias", "pretrained.blocks.17.attn.qkv.weight", "pretrained.blocks.17.attn.qkv.bias", "pretrained.blocks.17.attn.proj.weight", "pretrained.blocks.17.attn.proj.bias", "pretrained.blocks.17.ls1.gamma", "pretrained.blocks.17.norm2.weight", "pretrained.blocks.17.norm2.bias", "pretrained.blocks.17.mlp.fc1.weight", "pretrained.blocks.17.mlp.fc1.bias", "pretrained.blocks.17.mlp.fc2.weight", "pretrained.blocks.17.mlp.fc2.bias", "pretrained.blocks.17.ls2.gamma", "pretrained.blocks.18.norm1.weight", "pretrained.blocks.18.norm1.bias", "pretrained.blocks.18.attn.qkv.weight", 
"pretrained.blocks.18.attn.qkv.bias", "pretrained.blocks.18.attn.proj.weight", "pretrained.blocks.18.attn.proj.bias", "pretrained.blocks.18.ls1.gamma", "pretrained.blocks.18.norm2.weight", "pretrained.blocks.18.norm2.bias", "pretrained.blocks.18.mlp.fc1.weight", "pretrained.blocks.18.mlp.fc1.bias", "pretrained.blocks.18.mlp.fc2.weight", "pretrained.blocks.18.mlp.fc2.bias", "pretrained.blocks.18.ls2.gamma", "pretrained.blocks.19.norm1.weight", "pretrained.blocks.19.norm1.bias", "pretrained.blocks.19.attn.qkv.weight", "pretrained.blocks.19.attn.qkv.bias", "pretrained.blocks.19.attn.proj.weight", "pretrained.blocks.19.attn.proj.bias", "pretrained.blocks.19.ls1.gamma", "pretrained.blocks.19.norm2.weight", "pretrained.blocks.19.norm2.bias", "pretrained.blocks.19.mlp.fc1.weight", "pretrained.blocks.19.mlp.fc1.bias", "pretrained.blocks.19.mlp.fc2.weight", "pretrained.blocks.19.mlp.fc2.bias", "pretrained.blocks.19.ls2.gamma", "pretrained.blocks.20.norm1.weight", "pretrained.blocks.20.norm1.bias", "pretrained.blocks.20.attn.qkv.weight", "pretrained.blocks.20.attn.qkv.bias", "pretrained.blocks.20.attn.proj.weight", "pretrained.blocks.20.attn.proj.bias", "pretrained.blocks.20.ls1.gamma", "pretrained.blocks.20.norm2.weight", "pretrained.blocks.20.norm2.bias", "pretrained.blocks.20.mlp.fc1.weight", "pretrained.blocks.20.mlp.fc1.bias", "pretrained.blocks.20.mlp.fc2.weight", "pretrained.blocks.20.mlp.fc2.bias", "pretrained.blocks.20.ls2.gamma", "pretrained.blocks.21.norm1.weight", "pretrained.blocks.21.norm1.bias", "pretrained.blocks.21.attn.qkv.weight", "pretrained.blocks.21.attn.qkv.bias", "pretrained.blocks.21.attn.proj.weight", "pretrained.blocks.21.attn.proj.bias", "pretrained.blocks.21.ls1.gamma", "pretrained.blocks.21.norm2.weight", "pretrained.blocks.21.norm2.bias", "pretrained.blocks.21.mlp.fc1.weight", "pretrained.blocks.21.mlp.fc1.bias", "pretrained.blocks.21.mlp.fc2.weight", "pretrained.blocks.21.mlp.fc2.bias", "pretrained.blocks.21.ls2.gamma", 
"pretrained.blocks.22.norm1.weight", "pretrained.blocks.22.norm1.bias", "pretrained.blocks.22.attn.qkv.weight", "pretrained.blocks.22.attn.qkv.bias", "pretrained.blocks.22.attn.proj.weight", "pretrained.blocks.22.attn.proj.bias", "pretrained.blocks.22.ls1.gamma", "pretrained.blocks.22.norm2.weight", "pretrained.blocks.22.norm2.bias", "pretrained.blocks.22.mlp.fc1.weight", "pretrained.blocks.22.mlp.fc1.bias", "pretrained.blocks.22.mlp.fc2.weight", "pretrained.blocks.22.mlp.fc2.bias", "pretrained.blocks.22.ls2.gamma", "pretrained.blocks.23.norm1.weight", "pretrained.blocks.23.norm1.bias", "pretrained.blocks.23.attn.qkv.weight", "pretrained.blocks.23.attn.qkv.bias", "pretrained.blocks.23.attn.proj.weight", "pretrained.blocks.23.attn.proj.bias", "pretrained.blocks.23.ls1.gamma", "pretrained.blocks.23.norm2.weight", "pretrained.blocks.23.norm2.bias", "pretrained.blocks.23.mlp.fc1.weight", "pretrained.blocks.23.mlp.fc1.bias", "pretrained.blocks.23.mlp.fc2.weight", "pretrained.blocks.23.mlp.fc2.bias", "pretrained.blocks.23.ls2.gamma", "pretrained.norm.weight", "pretrained.norm.bias", "depth_head.projects.0.weight", "depth_head.projects.0.bias", "depth_head.projects.1.weight", "depth_head.projects.1.bias", "depth_head.projects.2.weight", "depth_head.projects.2.bias", "depth_head.projects.3.weight", "depth_head.projects.3.bias", "depth_head.resize_layers.0.weight", "depth_head.resize_layers.0.bias", "depth_head.resize_layers.1.weight", "depth_head.resize_layers.1.bias", "depth_head.resize_layers.3.weight", "depth_head.resize_layers.3.bias", "depth_head.scratch.layer1_rn.weight", "depth_head.scratch.layer2_rn.weight", "depth_head.scratch.layer3_rn.weight", "depth_head.scratch.layer4_rn.weight", "depth_head.scratch.refinenet1.out_conv.weight", "depth_head.scratch.refinenet1.out_conv.bias", "depth_head.scratch.refinenet1.resConfUnit1.conv1.weight", "depth_head.scratch.refinenet1.resConfUnit1.conv1.bias", "depth_head.scratch.refinenet1.resConfUnit1.conv2.weight", 
"depth_head.scratch.refinenet1.resConfUnit1.conv2.bias", "depth_head.scratch.refinenet1.resConfUnit2.conv1.weight", "depth_head.scratch.refinenet1.resConfUnit2.conv1.bias", "depth_head.scratch.refinenet1.resConfUnit2.conv2.weight", "depth_head.scratch.refinenet1.resConfUnit2.conv2.bias", "depth_head.scratch.refinenet2.out_conv.weight", "depth_head.scratch.refinenet2.out_conv.bias", "depth_head.scratch.refinenet2.resConfUnit1.conv1.weight", "depth_head.scratch.refinenet2.resConfUnit1.conv1.bias", "depth_head.scratch.refinenet2.resConfUnit1.conv2.weight", "depth_head.scratch.refinenet2.resConfUnit1.conv2.bias", "depth_head.scratch.refinenet2.resConfUnit2.conv1.weight", "depth_head.scratch.refinenet2.resConfUnit2.conv1.bias", "depth_head.scratch.refinenet2.resConfUnit2.conv2.weight", "depth_head.scratch.refinenet2.resConfUnit2.conv2.bias", "depth_head.scratch.refinenet3.out_conv.weight", "depth_head.scratch.refinenet3.out_conv.bias", "depth_head.scratch.refinenet3.resConfUnit1.conv1.weight", "depth_head.scratch.refinenet3.resConfUnit1.conv1.bias", "depth_head.scratch.refinenet3.resConfUnit1.conv2.weight", "depth_head.scratch.refinenet3.resConfUnit1.conv2.bias", "depth_head.scratch.refinenet3.resConfUnit2.conv1.weight", "depth_head.scratch.refinenet3.resConfUnit2.conv1.bias", "depth_head.scratch.refinenet3.resConfUnit2.conv2.weight", "depth_head.scratch.refinenet3.resConfUnit2.conv2.bias", "depth_head.scratch.refinenet4.out_conv.weight", "depth_head.scratch.refinenet4.out_conv.bias", "depth_head.scratch.refinenet4.resConfUnit1.conv1.weight", "depth_head.scratch.refinenet4.resConfUnit1.conv1.bias", "depth_head.scratch.refinenet4.resConfUnit1.conv2.weight", "depth_head.scratch.refinenet4.resConfUnit1.conv2.bias", "depth_head.scratch.refinenet4.resConfUnit2.conv1.weight", "depth_head.scratch.refinenet4.resConfUnit2.conv1.bias", "depth_head.scratch.refinenet4.resConfUnit2.conv2.weight", "depth_head.scratch.refinenet4.resConfUnit2.conv2.bias", 
"depth_head.scratch.output_conv1.weight", "depth_head.scratch.output_conv1.bias", "depth_head.scratch.output_conv2.0.weight", "depth_head.scratch.output_conv2.0.bias", "depth_head.scratch.output_conv2.2.weight", "depth_head.scratch.output_conv2.2.bias". Unexpected key(s) in state_dict: "model", "optimizer", "epoch".

Please help me. Thank you very much.
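The "Unexpected key(s)" part of the error suggests the saved file is a training checkpoint, i.e. a dict with "model", "optimizer" and "epoch" entries, rather than a bare state_dict, so the weights presumably sit under "model". The sketch below unwraps such a file. The helper names `extract_state_dict` and `load_checkpoint` are hypothetical, and stripping a "module." prefix assumes training may have used DataParallel/DistributedDataParallel. Even after unwrapping, the metric-depth model is built on ZoeDepth, so its keys probably will not line up with the relative-depth DepthAnything class that run.py builds; loading through the metric_depth code is likely the safer route.

```python
from collections import OrderedDict


def extract_state_dict(checkpoint):
    """Return the model weights from a training checkpoint.

    Assumes the layout implied by the error message: a dict with
    "model", "optimizer" and "epoch" entries. A bare state_dict
    (no "model" key) is passed through unchanged.
    """
    state_dict = checkpoint.get("model", checkpoint)
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Assumption: weights saved from a DataParallel/DDP wrapper carry
        # a "module." prefix that an unwrapped model does not expect.
        if key.startswith("module."):
            key = key[len("module."):]
        cleaned[key] = value
    return cleaned


def load_checkpoint(path):
    """Load a checkpoint file onto the CPU and unwrap it (needs PyTorch)."""
    import torch  # imported lazily so extract_state_dict works without torch

    checkpoint = torch.load(path, map_location="cpu")
    return extract_state_dict(checkpoint)
```

Printing `list(load_checkpoint("/path/to/local/ckpt.pt"))[:5]` before calling `model.load_state_dict(...)` shows which prefix the saved keys actually use, which tells you whether they can match the model you are building.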

jannehlamin commented 5 months ago

If training was successful, change into the metric_depth directory and run evaluate.py as follows:

python evaluate.py -m zoedepth --pretrained_resource="local::/path/to/local/ckpt.pt" -d nyu

where:

  1. -m is the model to evaluate (zoedepth for the metric-depth models)
  2. --pretrained_resource is the location of the checkpoint to load (your own trained checkpoint, or a pre-trained model from Depth Anything, e.g. --pretrained_resource="local::./checkpoints/depth_anything_metric_depth_indoor.pt"). For the pre-trained models, create a checkpoints folder inside metric_depth and put the following files in it: https://huggingface.co/spaces/LiheYoung/Depth-Anything/tree/main/checkpoints_metric_depth https://huggingface.co/spaces/LiheYoung/Depth-Anything/blob/main/checkpoints/depth_anything_vitl14.pth
  3. -d is your test dataset

I hope this small effort helps!
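A small helper that assembles the evaluate.py invocation described above can help avoid quoting and prefix mistakes. This is a minimal sketch: `build_eval_command` is a hypothetical name, and it assumes evaluate.py accepts exactly the -m, --pretrained_resource and -d flags from the reply, with the "local::" prefix marking a checkpoint read from disk.

```python
import shlex


def build_eval_command(ckpt_path, dataset="nyu", model="zoedepth"):
    """Build the argv list for metric_depth/evaluate.py (hypothetical helper).

    ckpt_path is a local checkpoint file; it is passed with the "local::"
    prefix used in the command from the reply.
    """
    return [
        "python", "evaluate.py",
        "-m", model,
        f"--pretrained_resource=local::{ckpt_path}",
        "-d", dataset,
    ]


cmd = build_eval_command("./checkpoints/depth_anything_metric_depth_indoor.pt")
# shlex.join quotes any argument that needs it, so the printed line can be
# pasted into a shell as-is.
print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run(cmd, cwd="metric_depth", check=True)` avoids shell quoting altogether; swap in your own checkpoint path and dataset name as needed.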

swing2331 commented 4 months ago

@HaosenZ Hello, how did you train on a custom dataset? I would appreciate some guidance on which configuration changes are needed. Thank you.

X-zy-0816 commented 4 months ago

@swing2331 Hi, I have been working on the same problem recently. Have you made any progress I could learn from? Or maybe we could discuss it and work on it together?

HaosenZ commented 4 months ago

I have divided my dataset into folders following the format of the KITTI/NYU datasets and made the corresponding changes to the kitti_eigen_test_file_with_gt.txt and kitti_eigen_train_file_with_gt.txt files. I also modified the relevant code in train_mono.py in the metric_depth folder (plus some changes in other files). Finally, I ran the training command from the README.

X-zy-0816 commented 4 months ago

@HaosenZ Thanks for your time. The approach you describe is similar to what I had in mind, but I still have a small question. In the kitti_eigen_test_file_with_gt.txt file, the first and second columns are the images and their corresponding ground-truth depth maps, so what is the third column?

Thank you very much for your answer.

swing2331 commented 4 months ago

I think the third column is the camera focal length f (in pixels).
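When building split files for a custom dataset, each line can be treated as three whitespace-separated fields: image path, ground-truth depth path and focal length. The sketch below parses and writes such lines. `SplitEntry`, `parse_split_line` and `format_split_line` are hypothetical names, the paths in the usage note are illustrative, and it assumes paths contain no spaces; for your own data the third field would presumably be your camera's focal length.

```python
from typing import NamedTuple


class SplitEntry(NamedTuple):
    rgb_path: str    # first column: input image
    depth_path: str  # second column: ground-truth depth map
    focal: float     # third column: camera focal length in pixels


def parse_split_line(line):
    """Parse one line of a KITTI/NYU-style *_with_gt.txt split file."""
    rgb_path, depth_path, focal = line.split()
    return SplitEntry(rgb_path, depth_path, float(focal))


def format_split_line(entry):
    """Write an entry back in the same whitespace-separated layout."""
    return f"{entry.rgb_path} {entry.depth_path} {entry.focal}"
```

For example, `parse_split_line("scene_01/rgb_0001.png scene_01/depth_0001.png 518.8579")` yields an entry whose `focal` is the float `518.8579`, and `format_split_line` turns it back into the same text line.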