ZiyangYan opened this issue 8 months ago
`DepthAnything.from_pretrained` with `local_files_only=True` tries to find a checkpoint in your local cache, previously downloaded from the web. To load a local checkpoint instead, you can use the following code:
```python
import torch

from depth_anything.dpt import DPT_DINOv2

# The ViT-S hyper-parameters (features, out_channels) must match the checkpoint
depth_anything = DPT_DINOv2('vits', features=64, out_channels=[48, 96, 192, 384])
ckpt = torch.load('models/checkpoints/depth_anything_vits14.pth')
depth_anything.load_state_dict(ckpt)
```
This method works if you know the hyper-parameters of the encoder that you want to load (`features` and `out_channels`). To find the ones for the ViT-S encoder, I compared the model declaration in `depth_anything.dpt` with the model `state_dict`. Maybe the original authors can suggest a better way to do it without explicitly specifying hyper-parameters.
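For reference, a minimal sketch of that comparison: dump the tensor names and shapes stored in the checkpoint, then match them against the `DPT_DINOv2` constructor arguments. Only the checkpoint path from the snippet above is assumed; the key names you will see depend on the checkpoint itself.

```python
import torch

# Load the checkpoint on CPU and list every parameter with its shape; the
# decoder's projection/refinement layers encode `features` and `out_channels`.
ckpt = torch.load('models/checkpoints/depth_anything_vits14.pth', map_location='cpu')
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```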
Hi, thank you for your reply. I changed my code following your guidance, but it raised another error (the error screenshot failed to upload).
Hi @ZiyangYan, did you move your model to the GPU with `depth_anything.cuda()` after initializing it? The error is reporting that your input is a CUDA tensor, while the model weights are not.
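For completeness, a minimal sketch of the fix, continuing from the loading snippet earlier in this thread:

```python
# Move the model weights to the GPU so they sit on the same device as the input.
depth_anything = depth_anything.cuda()

# Or, device-agnostic:
# device = 'cuda' if torch.cuda.is_available() else 'cpu'
# depth_anything = depth_anything.to(device)
```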
Thanks, I can run it now. But do you know how to set the metric model as the pre-trained model and run it?
Hi @ZiyangYan, to load our pre-trained models without an Internet connection, you are suggested to follow the instructions here. You need to download both the config file and the checkpoint file. Then put them under the same directory, and specify this directory in `DepthAnything.from_pretrained('the-directory-storing-config-and-ckp', local_files_only=True)`.
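A sketch of that layout; the directory and file names below are illustrative placeholders, not the exact names from the Hub:

```python
from depth_anything.dpt import DepthAnything

# Hypothetical local directory holding the two downloaded files side by side:
#   ./local_ckpt/config.json        <- the config file
#   ./local_ckpt/pytorch_model.bin  <- the checkpoint file
depth_anything = DepthAnything.from_pretrained('./local_ckpt', local_files_only=True)
```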
Yes, I saw it and changed the code following the instructions, and then this problem occurred, because the function searches for the pre-trained model in the .cache directory rather than the path set by me.
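One possible workaround, assuming `DepthAnything.from_pretrained` is backed by huggingface_hub's `ModelHubMixin` (an assumption, not confirmed in this thread): pass `cache_dir` so the lookup resolves files under your own directory instead of the default `~/.cache`:

```python
from depth_anything.dpt import DepthAnything

# cache_dir is forwarded to the Hub file-resolution machinery, so together
# with local_files_only=True the checkpoint is searched under this path
# rather than ~/.cache/huggingface (the path below is hypothetical).
depth_anything = DepthAnything.from_pretrained(
    'LiheYoung/depth_anything_vits14',
    cache_dir='./models/hub_cache',
    local_files_only=True,
)
```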
Hi @ZiyangYan, sorry for the late reply. If you have not solved this problem, please refer to our latest instructions for loading the models without an Internet connection.
@ZiyangYan Hello, how did you download the metric-depth pre-trained weight file? I've been having trouble connecting to huggingface.co. Could you send me a copy, if it's convenient?
Hello! Today I followed your latest instructions and modified some code; however, I got the errors below. I made sure my conda and PyTorch GPU environment is normal, and I tried my best to search for answers on the web, but unfortunately I still have not solved the problem. I hope you can provide some advice, thanks a lot.
Errors:

```
xFormers not available
xFormers not available
Total parameters: 24.79M
  0%|          | 0/19 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 87, in <module>
    depth = depth_anything(image)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kevin/deep_learning_collection/Depth-Anything/depth_anything/dpt.py", line 162, in forward
    features = self.pretrained.get_intermediate_layers(x, 4, return_class_token=True)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 308, in get_intermediate_layers
    outputs = self._get_intermediate_layers_not_chunked(x, n)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 272, in _get_intermediate_layers_not_chunked
    x = self.prepare_tokens_with_masks(x)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 214, in prepare_tokens_with_masks
    x = self.patch_embed(x)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/dinov2/layers/patch_embed.py", line 76, in forward
    x = self.proj(x)  # B C H W
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
```
My code:
```python
import argparse
import cv2
import numpy as np
import os
import torch
import torch.nn.functional as F
from torchvision.transforms import Compose
from tqdm import tqdm

from depth_anything.dpt import DepthAnything
from depth_anything.dpt import DPT_DINOv2
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--img-path', type=str)
    parser.add_argument('--outdir', type=str, default='./vis_depth')
    parser.add_argument('--encoder', type=str, default='vits', choices=['vits', 'vitb', 'vitl'])
    parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
    parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
    args = parser.parse_args()

    margin_width = 50
    caption_height = 60

    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 1
    font_thickness = 2

    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

    # depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{}14'.format(args.encoder)).to(
    #     DEVICE).eval()
    model_configs = {
        'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
        'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
        'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]}
    }
    encoder = 'vits'  # or 'vitb', 'vitl'
    depth_anything = DepthAnything(model_configs[encoder])
    depth_anything.load_state_dict(torch.load(f'./checkpoints/depth_anything_{encoder}14.pth'))

    total_params = sum(param.numel() for param in depth_anything.parameters())
    print('Total parameters: {:.2f}M'.format(total_params / 1e6))

    transform = Compose([
        Resize(
            width=518,
            height=518,
            resize_target=False,
            keep_aspect_ratio=True,
            ensure_multiple_of=14,
            resize_method='lower_bound',
            image_interpolation_method=cv2.INTER_CUBIC,
        ),
        NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        PrepareForNet(),
    ])

    if os.path.isfile(args.img_path):
        if args.img_path.endswith('txt'):
            with open(args.img_path, 'r') as f:
                filenames = f.read().splitlines()
        else:
            filenames = [args.img_path]
    else:
        filenames = os.listdir(args.img_path)
        filenames = [os.path.join(args.img_path, filename) for filename in filenames if not filename.startswith('.')]
        filenames.sort()

    os.makedirs(args.outdir, exist_ok=True)

    for filename in tqdm(filenames):
        raw_image = cv2.imread(filename)
        image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0

        h, w = image.shape[:2]

        image = transform({'image': image})['image']
        image = torch.from_numpy(image).unsqueeze(0).to(DEVICE)

        with torch.no_grad():
            depth = depth_anything(image)

        depth = F.interpolate(depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]
        depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0

        depth = depth.cpu().numpy().astype(np.uint8)

        if args.grayscale:
            depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
        else:
            depth = cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO)

        filename = os.path.basename(filename)

        if args.pred_only:
            cv2.imwrite(os.path.join(args.outdir, filename[:filename.rfind('.')] + '_depth.png'), depth)
        else:
            split_region = np.ones((raw_image.shape[0], margin_width, 3), dtype=np.uint8) * 255
            combined_results = cv2.hconcat([raw_image, split_region, depth])

            caption_space = np.ones((caption_height, combined_results.shape[1], 3), dtype=np.uint8) * 255
            captions = ['Raw image', 'Depth Anything']
            segment_width = w + margin_width

            for i, caption in enumerate(captions):
                # Calculate text size
                text_size = cv2.getTextSize(caption, font, font_scale, font_thickness)[0]

                # Calculate x-coordinate to center the text
                text_x = int((segment_width * i) + (w - text_size[0]) / 2)

                # Add text caption
                cv2.putText(caption_space, caption, (text_x, 40), font, font_scale, (0, 0, 0), font_thickness)

            final_result = cv2.vconcat([caption_space, combined_results])

            cv2.imwrite(os.path.join(args.outdir, filename[:filename.rfind('.')] + '_img_depth.png'), final_result)
```
My environment is below:
```
Package                       Version              Editable project location
----------------------------- -------------------- ------------------------------------------
absl-py 1.4.0
actionlib 1.14.0
aiofiles 23.2.1
altair 5.2.0
altgraph 0.17.4
angles 1.9.13
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.2.0
apex 0.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
astunparse 1.6.3
async-lru 2.0.4
attrs 23.1.0
autopep8 2.0.2
Babel 2.11.0
backcall 0.2.0
beautifulsoup4 4.12.2
bidict 0.22.1
bleach 4.1.0
bondpy 1.8.6
cachetools 5.3.1
camera-calibration 1.17.0
camera-calibration-parsers 1.12.0
catkin 0.8.10
certifi 2023.7.22
cffi 1.16.0
chardet 5.2.0
charset-normalizer 2.0.4
chromedriver-autoinstaller 0.4.0
click 8.1.7
cmake 3.27.0
colorama 0.4.6
coloredlogs 15.0.1
comm 0.2.1
contourpy 1.1.0
controller-manager 0.20.0
controller-manager-msgs 0.20.0
controlnet-aux 0.0.3
cv-bridge 1.16.2
cycler 0.11.0
Cython 3.0.2
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
diagnostic-analysis 1.11.0
diagnostic-common-diagnostics 1.11.0
diagnostic-updater 1.11.0
diffusers 0.16.1
dynamic-reconfigure 1.7.3
einops 0.7.0
et-xmlfile 1.1.0
exceptiongroup 1.1.2
executing 0.8.3
expecttest 0.1.4
fastapi 0.109.0
fastjsonschema 2.16.2
ffmpy 0.3.1
filelock 3.12.2
Flask 2.2.3
Flask-Cors 4.0.0
Flask-SocketIO 5.3.6
flaskwebgui 0.3.5
flatbuffers 23.5.26
fonttools 4.41.1
fsspec 2023.6.0
gazebo_plugins 2.9.2
gazebo_ros 2.9.2
gencpp 0.7.0
geneus 3.0.0
genlisp 0.4.18
genmsg 0.6.0
gennodejs 2.0.2
genpy 0.6.15
gitdb 4.0.10
GitPython 3.1.32
google-auth 2.22.0
google-auth-oauthlib 1.0.0
gradio 4.14.0
gradio_client 0.8.0
gradio_imageslider 0.0.18
grpcio 1.56.2
h11 0.14.0
httpcore 1.0.2
httpx 0.26.0
huggingface-hub 0.20.3
humanfriendly 10.0
hypothesis 6.82.0
idna 3.4
image-geometry 1.16.2
imageio 2.33.1
importlib-metadata 6.8.0
importlib-resources 6.0.0
interactive-markers 1.12.0
ipykernel 6.28.0
ipython 8.12.2
ipywidgets 8.1.2
itsdangerous 2.1.2
jedi 0.18.1
Jinja2 3.1.2
joblib 1.3.1
joint-state-publisher 1.15.1
joint-state-publisher-gui 1.15.1
json5 0.9.6
jsonschema 4.19.2
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 8.6.0
jupyter-console 6.6.3
jupyter_core 5.5.0
jupyter-events 0.8.0
jupyter-lsp 2.2.0
jupyter_server 2.10.0
jupyter_server_terminals 0.4.4
jupyterlab 4.0.11
jupyterlab-pygments 0.1.2
jupyterlab_server 2.25.1
jupyterlab-widgets 3.0.10
kiwisolver 1.4.4
lama-cleaner 1.2.5
laser_geometry 1.6.7
lazy_loader 0.3
lit 16.0.6
loguru 0.7.1
lxml 4.9.3
Markdown 3.4.4
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.2
matplotlib-inline 0.1.6
mdurl 0.1.2
message-filters 1.16.0
mistune 2.0.4
mpmath 1.3.0
nbclient 0.8.0
nbconvert 7.10.0
nbformat 5.9.2
nest-asyncio 1.6.0
networkx 3.1
ninja 1.11.1
notebook 7.0.8
notebook_shim 0.2.3
numpy 1.23.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
omegaconf 2.3.0
onnx 1.14.0
onnx-simplifier 0.4.10
onnxruntime 1.15.1
opencv-contrib-python 4.8.0.74
opencv-python 4.8.0.74
openpyxl 3.1.2
orjson 3.9.12
outcome 1.3.0.post0
overrides 7.4.0
packaging 23.1
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
piexif 1.1.3
Pillow 10.0.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
ply 3.11
progress 1.6
prometheus-client 0.14.1
prompt-toolkit 3.0.43
protobuf 4.24.0
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycocotools 2.0.7
pycodestyle 2.10.0
pycparser 2.21
pydantic 2.5.3
pydantic_core 2.14.6
pydub 0.25.1
Pygments 2.15.1
pyinstaller 5.9.0
pyinstaller-hooks-contrib 2024.1
pyparsing 3.0.9
PyQt5 5.15.10
PyQt5-Qt5 5.15.2
PyQt5-sip 12.13.0
PySocks 1.7.1
python-dateutil 2.8.2
python-engineio 4.8.2
python-json-logger 2.0.7
python-multipart 0.0.6
python-qt-binding 0.4.4
python-resize-image 1.1.20
python-socketio 5.11.0
pytils 0.4.1
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0.1
pyzmq 25.1.2
qt-dotgraph 0.4.2
qt-gui 0.4.2
qt-gui-cpp 0.4.2
qt-gui-py-common 0.4.2
qtconsole 5.5.1
QtPy 2.4.1
referencing 0.33.0
regex 2023.12.25
requests 2.31.0
requests-oauthlib 1.3.1
resource_retriever 1.12.7
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.5.2
rosbag 1.16.0
rosboost-cfg 1.15.8
rosclean 1.15.8
roscreate 1.15.8
rosgraph 1.16.0
roslaunch 1.16.0
roslib 1.15.8
roslint 0.12.0
roslz4 1.16.0
rosmake 1.15.8
rosmaster 1.16.0
rosmsg 1.16.0
rosnode 1.16.0
rosparam 1.16.0
rospy 1.16.0
rosservice 1.16.0
rostest 1.16.0
rostopic 1.16.0
rosunit 1.15.8
roswtf 1.16.0
rpds-py 0.10.6
rqt_action 0.4.9
rqt_bag 0.5.1
rqt_bag_plugins 0.5.1
rqt-console 0.4.12
rqt_dep 0.4.12
rqt_graph 0.4.14
rqt_gui 0.5.3
rqt_gui_py 0.5.3
rqt-image-view 0.4.17
rqt_launch 0.4.9
rqt-logger-level 0.4.12
rqt-moveit 0.5.11
rqt_msg 0.4.10
rqt_nav_view 0.5.7
rqt_plot 0.4.13
rqt_pose_view 0.5.11
rqt_publisher 0.4.10
rqt_py_common 0.5.3
rqt_py_console 0.4.10
rqt-reconfigure 0.5.5
rqt-robot-dashboard 0.5.8
rqt-robot-monitor 0.5.15
rqt_robot_steering 0.5.12
rqt-runtime-monitor 0.5.10
rqt-rviz 0.7.0
rqt_service_caller 0.4.10
rqt_shell 0.4.11
rqt_srv 0.4.9
rqt-tf-tree 0.6.4
rqt_top 0.4.10
rqt_topic 0.4.13
rqt_web 0.4.10
rsa 4.9
ruff 0.1.14
rviz 1.14.20
safetensors 0.4.2
scikit-image 0.21.0
scikit-learn 1.3.0
scipy 1.10.1
seaborn 0.12.2
selenium 4.8.3
semantic-version 2.10.0
Send2Trash 1.8.2
sensor-msgs 1.13.1
setuptools 67.8.0
shellingham 1.5.4
simple-websocket 1.0.0
sip 6.7.12
six 1.16.0
smach 2.5.2
smach-ros 2.5.2
smclib 1.8.6
smmap 5.0.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.5
stack-data 0.2.0
starlette 0.35.1
sympy 1.12
tabulate 0.9.0
tensorboard 2.14.0
tensorboard-data-server 0.7.1
terminado 0.17.1
tf 1.13.2
tf-conversions 1.13.2
tf2-geometry-msgs 0.7.7
tf2-kdl 0.7.7
tf2-py 0.7.7
tf2-ros 0.7.7
thop 0.1.1.post2209072238
threadpoolctl 3.2.0
tifffile 2023.7.10
timm 0.9.12
tinycss2 1.2.1
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.12.0
tools 0.1.9
toolz 0.12.1
topic-tools 1.16.0
torch 2.0.1
torchaudio 2.0.2
torchsummary 1.5.1
torchvision 0.15.2
tornado 6.3.3
tqdm 4.65.0
traitlets 5.7.1
transformers 4.27.4
trio 0.24.0
trio-websocket 0.11.1
triton 2.0.0
typer 0.9.0
types-dataclasses 0.6.6
typing_extensions 4.9.0
tzdata 2023.3
ultralytics 8.0.143
urllib3 1.26.16
utils 1.0.1
uvicorn 0.27.0.post1
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
websockets 11.0.3
Werkzeug 2.2.2
wheel 0.38.4
whichcraft 0.6.1
widgetsnbextension 4.0.10
wsproto 1.2.0
xacro 1.14.17
xmltodict 0.13.0
yacs 0.1.8
yolox 0.3.0 /home/kevin/deep_learning_collection/YOLOX
zipp 3.16.2
```
The error: "Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same" is just saying that the input (the image) is on the GPU (CUDA) whereas the model weights are not. To fix it, you just need to move the model to the GPU, which you can do by adding an extra line after you load the model:

```python
depth_anything = DepthAnything(model_configs[encoder])
depth_anything.load_state_dict(torch.load(f'./checkpoints/depth_anything_{encoder}14.pth'))
depth_anything.to(DEVICE)  # <---- This line moves the model to the GPU
```
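A small, hedged addition to the reply above: since the commented-out `from_pretrained` line in your script also chained `.eval()`, you may want to do the same here so dropout is disabled and batch-norm statistics are frozen during inference:

```python
depth_anything.to(DEVICE).eval()  # same device fix, plus inference mode
```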
Thank you so much! I can run them successfully now!
I changed the code as shown on the webpage, but it raised a strange error. How can I fix it?
And if I want to use metric_depth as the pre-trained model, can I also use the `from_pretrained` function to load it?