facebookresearch / hiera

Hiera: A fast, powerful, and simple hierarchical vision transformer.
Apache License 2.0

Getting error with Hiera Inference (Image) #11

Closed sudhir2016 closed 1 year ago

sudhir2016 commented 1 year ago

RuntimeError: The size of tensor a (56) must match the size of tensor b (96) at non-singleton dimension 2

What am I doing wrong ?

dbolya commented 1 year ago

Hello, are you running inference with 224x224 images as described in inference.ipynb? Could you post an example of your code so I can try reproducing it?

sudhir2016 commented 1 year ago

Here is my code.

```
! pip install torch hiera-transformer
```

```python
import torch
from PIL import Image
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
from timm.data.constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD
import hiera

model = hiera.hiera_base_224(pretrained=True, checkpoint="mae_in1k_ft_in1k")

input_size = 224
transform_list = [
    transforms.Resize(int((256 / 224) * input_size), interpolation=InterpolationMode.BICUBIC),
    transforms.CenterCrop(input_size),
]
transform_vis = transforms.Compose(transform_list)
transform_norm = transforms.Compose(transform_list + [
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD),
])

img = Image.open("/content/dog.jpg")
img1 = img.resize((224, 224))
img_vis = transform_vis(img1)
img_norm = transform_norm(img1)
output = model(img_norm)
```

dbolya commented 1 year ago

```python
output = model(img_norm)
```

The model expects a batch dimension. At this point, `img_norm` has shape `[3, 224, 224]` with no batch dimension. You need to pass in `img_norm[None, ...]` to add one, which turns it into shape `[1, 3, 224, 224]`. The correct line of code is the following:

```python
output = model(img_norm[None, ...])
```

(as shown in inference.ipynb)
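As a quick self-contained sketch of the fix (using a random tensor as a hypothetical stand-in for `img_norm`, so no model or image is needed), `tensor[None, ...]` and `tensor.unsqueeze(0)` are equivalent ways to prepend the batch dimension:

```python
import torch

# Hypothetical stand-in for img_norm from the thread: a normalized image
# tensor with shape [3, 224, 224] and no batch dimension.
img_norm = torch.randn(3, 224, 224)

# Both forms prepend a batch dimension, giving shape [1, 3, 224, 224].
batched_a = img_norm[None, ...]
batched_b = img_norm.unsqueeze(0)

print(batched_a.shape)  # torch.Size([1, 3, 224, 224])
print(torch.equal(batched_a, batched_b))  # True (same values, same shape)
```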

sudhir2016 commented 1 year ago

Thank you so much, Daniel! It works fine now.