DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same #45

Closed shenw000 closed 4 months ago

shenw000 commented 4 months ago

When I ran the demo code in your "Use Our Models" section:

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img)  # HxW raw depth map in numpy

I received the following error: RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

The complete error message is below. Any suggestions? Your help is greatly appreciated.

RuntimeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 depth = model.infer_image(raw_img)

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /mnt/sda1/shenw/sandbox/depth_est/depth_anything_v2/Depth-Anything-V2/depth_anything_v2/dpt.py:190, in DepthAnythingV2.infer_image(self, raw_image, input_size)
    186 @torch.no_grad()
    187 def infer_image(self, raw_image, input_size=518):
    188     image, (h, w) = self.image2tensor(raw_image, input_size)
--> 190     depth = self.forward(image)
    192     depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0]
    194     return depth.cpu().numpy()

File /mnt/sda1/shenw/sandbox/depth_est/depth_anything_v2/Depth-Anything-V2/depth_anything_v2/dpt.py:179, in DepthAnythingV2.forward(self, x)
    176 def forward(self, x):
    177     patch_h, patch_w = x.shape[-2] // 14, x.shape[-1] // 14
--> 179     features = self.pretrained.get_intermediate_layers(x, self.intermediate_layer_idx[self.encoder], return_class_token=True)
    181     depth = self.depth_head(features, patch_h, patch_w)
    182     depth = F.relu(depth)

File /mnt/sda1/shenw/sandbox/depth_est/depth_anything_v2/Depth-Anything-V2/depth_anything_v2/dinov2.py:308, in DinoVisionTransformer.get_intermediate_layers(self, x, n, reshape, return_class_token, norm)
    306     outputs = self._get_intermediate_layers_chunked(x, n)
    307 else:
--> 308     outputs = self._get_intermediate_layers_not_chunked(x, n)
    309 if norm:
    310     outputs = [self.norm(out) for out in outputs]

File /mnt/sda1/shenw/sandbox/depth_est/depth_anything_v2/Depth-Anything-V2/depth_anything_v2/dinov2.py:272, in DinoVisionTransformer._get_intermediate_layers_not_chunked(self, x, n)
    271 def _get_intermediate_layers_not_chunked(self, x, n=1):
--> 272     x = self.prepare_tokens_with_masks(x)
    273     # If n is an int, take the n last blocks. If it's a list, take them
    274     output, total_block_len = [], len(self.blocks)

File /mnt/sda1/shenw/sandbox/depth_est/depth_anything_v2/Depth-Anything-V2/depth_anything_v2/dinov2.py:214, in DinoVisionTransformer.prepare_tokens_with_masks(self, x, masks)
    212 def prepare_tokens_with_masks(self, x, masks=None):
    213     B, nc, w, h = x.shape
--> 214     x = self.patch_embed(x)
    215     if masks is not None:
    216         x = torch.where(masks.unsqueeze(-1), self.mask_token.to(x.dtype).unsqueeze(0), x)

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File /mnt/sda1/shenw/sandbox/depth_est/depth_anything_v2/Depth-Anything-V2/depth_anything_v2/dinov2_layers/patch_embed.py:76, in PatchEmbed.forward(self, x)
     73 assert H % patch_H == 0, f"Input image height {H} is not a multiple of patch height {patch_H}"
     74 assert W % patch_W == 0, f"Input image width {W} is not a multiple of patch width: {patch_W}"
---> 76 x = self.proj(x)  # B C H W
     77 H, W = x.size(2), x.size(3)
     78 x = x.flatten(2).transpose(1, 2)  # B HW C

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/nn/modules/module.py:1532, in Module._wrapped_call_impl(self, *args, **kwargs)
   1530     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1531 else:
-> 1532     return self._call_impl(*args, **kwargs)

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/nn/modules/module.py:1541, in Module._call_impl(self, *args, **kwargs)
   1536 # If we don't have any hooks, we want to skip the rest of the logic in
   1537 # this function, and just call forward.
   1538 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1539         or _global_backward_pre_hooks or _global_backward_hooks
   1540         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1541     return forward_call(*args, **kwargs)
   1543 try:
   1544     result = None

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/nn/modules/conv.py:460, in Conv2d.forward(self, input)
    459 def forward(self, input: Tensor) -> Tensor:
--> 460     return self._conv_forward(input, self.weight, self.bias)

File ~/workspace/anaconda3/envs/depth_anything_v2/lib/python3.10/site-packages/torch/nn/modules/conv.py:456, in Conv2d._conv_forward(self, input, weight, bias)
    452 if self.padding_mode != 'zeros':
    453     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    454                     weight, bias, self.stride,
    455                     _pair(0), self.dilation, self.groups)
--> 456 return F.conv2d(input, weight, bias, self.stride,
    457                 self.padding, self.dilation, self.groups)

RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
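
The final line says the convolution received a CUDA input while its weights are still CPU tensors, which suggests the model itself was never moved to the GPU. A minimal sketch for checking this before calling the model, assuming only standard PyTorch; `describe_devices` is a hypothetical helper and the small `Conv2d` is just a stand-in for the patch-embedding layer named in the traceback:

```python
import torch
import torch.nn as nn


def describe_devices(model: nn.Module, x: torch.Tensor) -> dict:
    """Report where the model's weights and the input tensor live."""
    # Hypothetical helper; assumes the model has at least one parameter.
    param_device = next(model.parameters()).device
    return {
        "weights": param_device.type,  # e.g. "cpu" or "cuda"
        "input": x.device.type,
        # Compares device types only, so "cuda:0" vs "cuda:1" is not detected.
        "match": param_device.type == x.device.type,
    }


# Stand-in for the convolution that raised the error.
conv = nn.Conv2d(3, 8, kernel_size=14, stride=14)
x = torch.zeros(1, 3, 28, 28)
print(describe_devices(conv, x))
```

If `match` comes back `False`, either the model or the input needs a `.to(...)` call so that both end up on the same device.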

LiheYoung commented 4 months ago

Hi, please refer to the updated doc: https://github.com/DepthAnything/Depth-Anything-V2#use-our-models

shenw000 commented 4 months ago

Thank you for the update! It works now.

kafkaGen commented 4 months ago

Hi! I hit the same problem when I set DEVICE = 'cpu'. The error shows that the preprocessed input image is on CUDA but the model is not, so the tensor types differ (as I understand it). Thanks for the link to the README, but I could not find what to change to make it run correctly on CPU. Could you please help with this?
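
One way to rule out this kind of mismatch, whichever device is chosen, is to send the preprocessed tensor to wherever the model's weights already are. The sketch below mirrors the body of `infer_image` as it appears in the traceback (`image2tensor`, `forward`, `F.interpolate`). The wrapper name `infer_on_model_device` is hypothetical, and it assumes `image2tensor` returns a tensor plus the original `(h, w)`, as the traceback shows:

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def infer_on_model_device(model, raw_image, input_size=518):
    """Like infer_image, but force the input onto the model's own device."""
    image, (h, w) = model.image2tensor(raw_image, input_size)
    # Assumption: the model has at least one parameter to read the device from.
    device = next(model.parameters()).device
    image = image.to(device)
    depth = model.forward(image)
    # Resize the prediction back to the original image size.
    depth = F.interpolate(depth[:, None], (h, w), mode="bilinear", align_corners=True)[0, 0]
    return depth.cpu().numpy()
```

With `model = model.to('cpu').eval()`, calling `infer_on_model_device(model, raw_img)` should then keep the forward pass on the CPU, assuming nothing else inside the model moves tensors to another device.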

shenw000 commented 4 months ago

Two lines changed from the "old" version of the README:

DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

model = model.to(DEVICE).eval()

It works in my case...
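
Put together, those two lines select a device once and move the model onto it. A minimal sketch of that pattern with a small stand-in module in place of the real network; the stand-in and the name `pick_device` are assumptions for illustration, and `torch.backends.mps` requires a reasonably recent PyTorch:

```python
import torch
import torch.nn as nn


def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU (same order as the README line)."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"


DEVICE = pick_device()

# Stand-in module; in the real script this would be the loaded depth model.
model = nn.Conv2d(3, 8, kernel_size=14, stride=14)
model = model.to(DEVICE).eval()

# The input must live on the same device as the weights.
x = torch.zeros(1, 3, 28, 28, device=DEVICE)
with torch.no_grad():
    y = model(x)
print(DEVICE, tuple(y.shape))
```

The point is that the model and every tensor passed to it share one `DEVICE` value, so the input and weight types cannot diverge.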
