ZiyangYan opened this issue 8 months ago
`DepthAnything.from_pretrained` with `local_files_only=True` tries to find a checkpoint in your local cache, previously downloaded from the web. To load a local checkpoint instead, you can use the following code:
```python
import torch

from depth_anything.dpt import DPT_DINOv2

# The ViT-S hyper-parameters (features, out_channels) must match the checkpoint
depth_anything = DPT_DINOv2('vits', features=64, out_channels=[48, 96, 192, 384])
ckpt = torch.load('models/checkpoints/depth_anything_vits14.pth')
depth_anything.load_state_dict(ckpt)
```
This method works if you know the hyper-parameters of the encoder that you want to load (`features` and `out_channels`). To find the ones for the ViT-S encoder, I compared the model declaration in `depth_anything.dpt` with the model `state_dict`. Maybe the original authors can suggest a better way to do it without explicitly specifying hyper-parameters.
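For reference, a minimal sketch of that comparison: dump the tensor names and shapes stored in the checkpoint, then match them against the `DPT_DINOv2` constructor arguments. Only the checkpoint path from the snippet above is assumed; the key names you will see depend on the checkpoint itself.

```python
import torch

# Load the checkpoint on CPU and list every parameter with its shape; the
# decoder's projection/refinement layers encode `features` and `out_channels`.
ckpt = torch.load('models/checkpoints/depth_anything_vits14.pth', map_location='cpu')
for name, tensor in ckpt.items():
    print(name, tuple(tensor.shape))
```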
Hi, thank you for your reply. I changed my code following your guidance, but it raised another error (the error screenshot failed to upload).
Hi @ZiyangYan, did you move your model to the GPU with `depth_anything.cuda()` after initializing it? The error is reporting that your input is a CUDA tensor, while the model weights are not.
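For completeness, a minimal sketch of the fix, continuing from the loading snippet earlier in this thread:

```python
# Move the model weights to the GPU so they sit on the same device as the input.
depth_anything = depth_anything.cuda()

# Or, device-agnostic:
# device = 'cuda' if torch.cuda.is_available() else 'cpu'
# depth_anything = depth_anything.to(device)
```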
Thanks, I can run it now. But do you know how to set the metric model as the pre-trained model and run it?
Hi @ZiyangYan, to load our pre-trained models without an Internet connection, you are suggested to follow the instructions here. You need to download both the config file and the checkpoint file. Then put them under the same directory, and specify this directory in `DepthAnything.from_pretrained('the-directory-storing-config-and-ckp', local_files_only=True)`.
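A sketch of that layout; the directory and file names below are illustrative placeholders, not the exact names from the Hub:

```python
from depth_anything.dpt import DepthAnything

# Hypothetical local directory holding the two downloaded files side by side:
#   ./local_ckpt/config.json        <- the config file
#   ./local_ckpt/pytorch_model.bin  <- the checkpoint file
depth_anything = DepthAnything.from_pretrained('./local_ckpt', local_files_only=True)
```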
Yes, I saw it and changed the code following the instructions, and then this problem occurred, because the function searches for the pre-trained model in the .cache directory rather than the path set by me.
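One possible workaround, assuming `DepthAnything.from_pretrained` is backed by huggingface_hub's `ModelHubMixin` (an assumption, not confirmed in this thread): pass `cache_dir` so the lookup resolves files under your own directory instead of the default `~/.cache`:

```python
from depth_anything.dpt import DepthAnything

# cache_dir is forwarded to the Hub file-resolution machinery, so together
# with local_files_only=True the checkpoint is searched under this path
# rather than ~/.cache/huggingface (the path below is hypothetical).
depth_anything = DepthAnything.from_pretrained(
    'LiheYoung/depth_anything_vits14',
    cache_dir='./models/hub_cache',
    local_files_only=True,
)
```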
Hi @ZiyangYan, sorry for the late reply. If you have not solved this problem, please refer to our latest instructions for loading the models without an Internet connection.
@ZiyangYan Hello, how did you download the metric-depth pre-trained weight file? I've been having trouble connecting to huggingface.co. Could you send me a copy, if it's convenient?
Hello! Today I followed your latest instructions and modified some code; however, I got the errors below. I made sure my conda and PyTorch GPU environment is normal, and I tried my best to search for answers on the web, but unfortunately I still have not solved the problem. I hope you can provide some advice, thanks a lot.
Errors:

```
xFormers not available
xFormers not available
Total parameters: 24.79M
  0%|          | 0/19 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "run.py", line 87, in <module>
    depth = depth_anything(image)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kevin/deep_learning_collection/Depth-Anything/depth_anything/dpt.py", line 162, in forward
    features = self.pretrained.get_intermediate_layers(x, 4, return_class_token=True)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 308, in get_intermediate_layers
    outputs = self._get_intermediate_layers_not_chunked(x, n)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 272, in _get_intermediate_layers_not_chunked
    x = self.prepare_tokens_with_masks(x)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/vision_transformer.py", line 214, in prepare_tokens_with_masks
    x = self.patch_embed(x)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kevin/deep_learning_collection/Depth-Anything/torchhub/facebookresearch_dinov2_main/dinov2/layers/patch_embed.py", line 76, in forward
    x = self.proj(x)  # B C H W
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/kevin/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
```
My code:
```python
import argparse
import cv2
import numpy as np
import os
import torch
import torch.nn.functional as F
from torchvision.transforms import Compose
from tqdm import tqdm

from depth_anything.dpt import DepthAnything
from depth_anything.dpt import DPT_DINOv2
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--img-path', type=str)
    parser.add_argument('--outdir', type=str, default='./vis_depth')
    parser.add_argument('--encoder', type=str, default='vits', choices=['vits', 'vitb', 'vitl'])
    parser.add_argument('--pred-only', dest='pred_only', action='store_true', help='only display the prediction')
    parser.add_argument('--grayscale', dest='grayscale', action='store_true', help='do not apply colorful palette')
    args = parser.parse_args()

    margin_width = 50
    caption_height = 60

    font = cv2.FONT_HERSHEY_SIMPLEX
    font_scale = 1
    font_thickness = 2

    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

    # depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{}14'.format(args.encoder)).to(
    #     DEVICE).eval()
    model_configs = {
        'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
        'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
        'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]}
    }
    encoder = 'vits'  # or 'vitb', 'vitl'
    depth_anything = DepthAnything(model_configs[encoder])
    depth_anything.load_state_dict(torch.load(f'./checkpoints/depth_anything_{encoder}14.pth'))

    total_params = sum(param.numel() for param in depth_anything.parameters())
    print('Total parameters: {:.2f}M'.format(total_params / 1e6))

    transform = Compose([
        Resize(
            width=518,
            height=518,
            resize_target=False,
            keep_aspect_ratio=True,
            ensure_multiple_of=14,
            resize_method='lower_bound',
            image_interpolation_method=cv2.INTER_CUBIC,
        ),
        NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        PrepareForNet(),
    ])

    if os.path.isfile(args.img_path):
        if args.img_path.endswith('txt'):
            with open(args.img_path, 'r') as f:
                filenames = f.read().splitlines()
        else:
            filenames = [args.img_path]
    else:
        filenames = os.listdir(args.img_path)
        filenames = [os.path.join(args.img_path, filename) for filename in filenames if not filename.startswith('.')]
        filenames.sort()

    os.makedirs(args.outdir, exist_ok=True)

    for filename in tqdm(filenames):
        raw_image = cv2.imread(filename)
        image = cv2.cvtColor(raw_image, cv2.COLOR_BGR2RGB) / 255.0

        h, w = image.shape[:2]

        image = transform({'image': image})['image']
        image = torch.from_numpy(image).unsqueeze(0).to(DEVICE)

        with torch.no_grad():
            depth = depth_anything(image)

        depth = F.interpolate(depth[None], (h, w), mode='bilinear', align_corners=False)[0, 0]
        depth = (depth - depth.min()) / (depth.max() - depth.min()) * 255.0

        depth = depth.cpu().numpy().astype(np.uint8)

        if args.grayscale:
            depth = np.repeat(depth[..., np.newaxis], 3, axis=-1)
        else:
            depth = cv2.applyColorMap(depth, cv2.COLORMAP_INFERNO)

        filename = os.path.basename(filename)

        if args.pred_only:
            cv2.imwrite(os.path.join(args.outdir, filename[:filename.rfind('.')] + '_depth.png'), depth)
        else:
            split_region = np.ones((raw_image.shape[0], margin_width, 3), dtype=np.uint8) * 255
            combined_results = cv2.hconcat([raw_image, split_region, depth])

            caption_space = np.ones((caption_height, combined_results.shape[1], 3), dtype=np.uint8) * 255
            captions = ['Raw image', 'Depth Anything']
            segment_width = w + margin_width

            for i, caption in enumerate(captions):
                # Calculate text size
                text_size = cv2.getTextSize(caption, font, font_scale, font_thickness)[0]

                # Calculate x-coordinate to center the text
                text_x = int((segment_width * i) + (w - text_size[0]) / 2)

                # Add text caption
                cv2.putText(caption_space, caption, (text_x, 40), font, font_scale, (0, 0, 0), font_thickness)

            final_result = cv2.vconcat([caption_space, combined_results])

            cv2.imwrite(os.path.join(args.outdir, filename[:filename.rfind('.')] + '_img_depth.png'), final_result)
```
My environment is below:
```
Package                       Version              Editable project location
----------------------------- -------------------- ------------------------------------------
absl-py 1.4.0
actionlib 1.14.0
aiofiles 23.2.1
altair 5.2.0
altgraph 0.17.4
angles 1.9.13
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.2.0
apex 0.1
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
asttokens 2.0.5
astunparse 1.6.3
async-lru 2.0.4
attrs 23.1.0
autopep8 2.0.2
Babel 2.11.0
backcall 0.2.0
beautifulsoup4 4.12.2
bidict 0.22.1
bleach 4.1.0
bondpy 1.8.6
cachetools 5.3.1
camera-calibration 1.17.0
camera-calibration-parsers 1.12.0
catkin 0.8.10
certifi 2023.7.22
cffi 1.16.0
chardet 5.2.0
charset-normalizer 2.0.4
chromedriver-autoinstaller 0.4.0
click 8.1.7
cmake 3.27.0
colorama 0.4.6
coloredlogs 15.0.1
comm 0.2.1
contourpy 1.1.0
controller-manager 0.20.0
controller-manager-msgs 0.20.0
controlnet-aux 0.0.3
cv-bridge 1.16.2
cycler 0.11.0
Cython 3.0.2
debugpy 1.6.7
decorator 5.1.1
defusedxml 0.7.1
diagnostic-analysis 1.11.0
diagnostic-common-diagnostics 1.11.0
diagnostic-updater 1.11.0
diffusers 0.16.1
dynamic-reconfigure 1.7.3
einops 0.7.0
et-xmlfile 1.1.0
exceptiongroup 1.1.2
executing 0.8.3
expecttest 0.1.4
fastapi 0.109.0
fastjsonschema 2.16.2
ffmpy 0.3.1
filelock 3.12.2
Flask 2.2.3
Flask-Cors 4.0.0
Flask-SocketIO 5.3.6
flaskwebgui 0.3.5
flatbuffers 23.5.26
fonttools 4.41.1
fsspec 2023.6.0
gazebo_plugins 2.9.2
gazebo_ros 2.9.2
gencpp 0.7.0
geneus 3.0.0
genlisp 0.4.18
genmsg 0.6.0
gennodejs 2.0.2
genpy 0.6.15
gitdb 4.0.10
GitPython 3.1.32
google-auth 2.22.0
google-auth-oauthlib 1.0.0
gradio 4.14.0
gradio_client 0.8.0
gradio_imageslider 0.0.18
grpcio 1.56.2
h11 0.14.0
httpcore 1.0.2
httpx 0.26.0
huggingface-hub 0.20.3
humanfriendly 10.0
hypothesis 6.82.0
idna 3.4
image-geometry 1.16.2
imageio 2.33.1
importlib-metadata 6.8.0
importlib-resources 6.0.0
interactive-markers 1.12.0
ipykernel 6.28.0
ipython 8.12.2
ipywidgets 8.1.2
itsdangerous 2.1.2
jedi 0.18.1
Jinja2 3.1.2
joblib 1.3.1
joint-state-publisher 1.15.1
joint-state-publisher-gui 1.15.1
json5 0.9.6
jsonschema 4.19.2
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 8.6.0
jupyter-console 6.6.3
jupyter_core 5.5.0
jupyter-events 0.8.0
jupyter-lsp 2.2.0
jupyter_server 2.10.0
jupyter_server_terminals 0.4.4
jupyterlab 4.0.11
jupyterlab-pygments 0.1.2
jupyterlab_server 2.25.1
jupyterlab-widgets 3.0.10
kiwisolver 1.4.4
lama-cleaner 1.2.5
laser_geometry 1.6.7
lazy_loader 0.3
lit 16.0.6
loguru 0.7.1
lxml 4.9.3
Markdown 3.4.4
markdown-it-py 3.0.0
MarkupSafe 2.1.3
matplotlib 3.7.2
matplotlib-inline 0.1.6
mdurl 0.1.2
message-filters 1.16.0
mistune 2.0.4
mpmath 1.3.0
nbclient 0.8.0
nbconvert 7.10.0
nbformat 5.9.2
nest-asyncio 1.6.0
networkx 3.1
ninja 1.11.1
notebook 7.0.8
notebook_shim 0.2.3
numpy 1.23.1
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
oauthlib 3.2.2
omegaconf 2.3.0
onnx 1.14.0
onnx-simplifier 0.4.10
onnxruntime 1.15.1
opencv-contrib-python 4.8.0.74
opencv-python 4.8.0.74
openpyxl 3.1.2
orjson 3.9.12
outcome 1.3.0.post0
overrides 7.4.0
packaging 23.1
pandas 2.0.3
pandocfilters 1.5.0
parso 0.8.3
pexpect 4.8.0
pickleshare 0.7.5
piexif 1.1.3
Pillow 10.0.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 3.10.0
ply 3.11
progress 1.6
prometheus-client 0.14.1
prompt-toolkit 3.0.43
protobuf 4.24.0
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py-cpuinfo 9.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pycocotools 2.0.7
pycodestyle 2.10.0
pycparser 2.21
pydantic 2.5.3
pydantic_core 2.14.6
pydub 0.25.1
Pygments 2.15.1
pyinstaller 5.9.0
pyinstaller-hooks-contrib 2024.1
pyparsing 3.0.9
PyQt5 5.15.10
PyQt5-Qt5 5.15.2
PyQt5-sip 12.13.0
PySocks 1.7.1
python-dateutil 2.8.2
python-engineio 4.8.2
python-json-logger 2.0.7
python-multipart 0.0.6
python-qt-binding 0.4.4
python-resize-image 1.1.20
python-socketio 5.11.0
pytils 0.4.1
pytz 2023.3
PyWavelets 1.4.1
PyYAML 6.0.1
pyzmq 25.1.2
qt-dotgraph 0.4.2
qt-gui 0.4.2
qt-gui-cpp 0.4.2
qt-gui-py-common 0.4.2
qtconsole 5.5.1
QtPy 2.4.1
referencing 0.33.0
regex 2023.12.25
requests 2.31.0
requests-oauthlib 1.3.1
resource_retriever 1.12.7
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rich 13.5.2
rosbag 1.16.0
rosboost-cfg 1.15.8
rosclean 1.15.8
roscreate 1.15.8
rosgraph 1.16.0
roslaunch 1.16.0
roslib 1.15.8
roslint 0.12.0
roslz4 1.16.0
rosmake 1.15.8
rosmaster 1.16.0
rosmsg 1.16.0
rosnode 1.16.0
rosparam 1.16.0
rospy 1.16.0
rosservice 1.16.0
rostest 1.16.0
rostopic 1.16.0
rosunit 1.15.8
roswtf 1.16.0
rpds-py 0.10.6
rqt_action 0.4.9
rqt_bag 0.5.1
rqt_bag_plugins 0.5.1
rqt-console 0.4.12
rqt_dep 0.4.12
rqt_graph 0.4.14
rqt_gui 0.5.3
rqt_gui_py 0.5.3
rqt-image-view 0.4.17
rqt_launch 0.4.9
rqt-logger-level 0.4.12
rqt-moveit 0.5.11
rqt_msg 0.4.10
rqt_nav_view 0.5.7
rqt_plot 0.4.13
rqt_pose_view 0.5.11
rqt_publisher 0.4.10
rqt_py_common 0.5.3
rqt_py_console 0.4.10
rqt-reconfigure 0.5.5
rqt-robot-dashboard 0.5.8
rqt-robot-monitor 0.5.15
rqt_robot_steering 0.5.12
rqt-runtime-monitor 0.5.10
rqt-rviz 0.7.0
rqt_service_caller 0.4.10
rqt_shell 0.4.11
rqt_srv 0.4.9
rqt-tf-tree 0.6.4
rqt_top 0.4.10
rqt_topic 0.4.13
rqt_web 0.4.10
rsa 4.9
ruff 0.1.14
rviz 1.14.20
safetensors 0.4.2
scikit-image 0.21.0
scikit-learn 1.3.0
scipy 1.10.1
seaborn 0.12.2
selenium 4.8.3
semantic-version 2.10.0
Send2Trash 1.8.2
sensor-msgs 1.13.1
setuptools 67.8.0
shellingham 1.5.4
simple-websocket 1.0.0
sip 6.7.12
six 1.16.0
smach 2.5.2
smach-ros 2.5.2
smclib 1.8.6
smmap 5.0.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.5
stack-data 0.2.0
starlette 0.35.1
sympy 1.12
tabulate 0.9.0
tensorboard 2.14.0
tensorboard-data-server 0.7.1
terminado 0.17.1
tf 1.13.2
tf-conversions 1.13.2
tf2-geometry-msgs 0.7.7
tf2-kdl 0.7.7
tf2-py 0.7.7
tf2-ros 0.7.7
thop 0.1.1.post2209072238
threadpoolctl 3.2.0
tifffile 2023.7.10
timm 0.9.12
tinycss2 1.2.1
tokenizers 0.13.3
tomli 2.0.1
tomlkit 0.12.0
tools 0.1.9
toolz 0.12.1
topic-tools 1.16.0
torch 2.0.1
torchaudio 2.0.2
torchsummary 1.5.1
torchvision 0.15.2
tornado 6.3.3
tqdm 4.65.0
traitlets 5.7.1
transformers 4.27.4
trio 0.24.0
trio-websocket 0.11.1
triton 2.0.0
typer 0.9.0
types-dataclasses 0.6.6
typing_extensions 4.9.0
tzdata 2023.3
ultralytics 8.0.143
urllib3 1.26.16
utils 1.0.1
uvicorn 0.27.0.post1
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
websockets 11.0.3
Werkzeug 2.2.2
wheel 0.38.4
whichcraft 0.6.1
widgetsnbextension 4.0.10
wsproto 1.2.0
xacro 1.14.17
xmltodict 0.13.0
yacs 0.1.8
yolox 0.3.0 /home/kevin/deep_learning_collection/YOLOX
zipp 3.16.2
```
The error: "Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same" is just saying that the input (the image) is on the GPU (CUDA) whereas the model weights are not. To fix it, you just need to move the model to the GPU, which you can do by adding an extra line after you load the model:

```python
depth_anything = DepthAnything(model_configs[encoder])
depth_anything.load_state_dict(torch.load(f'./checkpoints/depth_anything_{encoder}14.pth'))
depth_anything.to(DEVICE)  # <---- This line moves the model to the GPU
```
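A small, hedged addition to the reply above: since the commented-out `from_pretrained` line in your script also chained `.eval()`, you may want to do the same here so dropout is disabled and batch-norm statistics are frozen during inference:

```python
depth_anything.to(DEVICE).eval()  # same device fix, plus inference mode
```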
Thank you so much! I can run them successfully now!
I changed the code as shown on the webpage, but it raised a strange error. How can I fix it?
And if I want to use metric_depth as the pre-trained model, can I also use the `from_pretrained` function to load it?