Verg-Avesta / CounTR

CounTR: Transformer-based Generalised Visual Counting
https://verg-avesta.github.io/CounTR_Webpage/
MIT License
92 stars, 9 forks

Input image size doesn't match model #18

Closed by GioFic95 1 year ago

GioFic95 commented 1 year ago

Hi @Verg-Avesta, while running FSC_test_cross(zero-shot).py with images of size (690, 1280), I got the error in the title:

Traceback (most recent call last):
  File "/home/gficarra/Code/countr/FSC_test_cross(zero-shot).py", line 420, in <module>
    main(args)
  File "/home/gficarra/Code/countr/FSC_test_cross(zero-shot).py", line 318, in main
    output, = model(samples[:, :, :, start:start + 384], boxes, 0)
  File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gficarra/Code/countr/models_mae_cross.py", line 198, in forward
    latent = self.forward_encoder(imgs)
  File "/home/gficarra/Code/countr/models_mae_cross.py", line 133, in forward_encoder
    x = self.patch_embed(x)
  File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/gficarra/anaconda3/envs/countr/lib/python3.9/site-packages/timm/models/layers/patch_embed.py", line 32, in forward
    assert H == self.img_size[0] and W == self.img_size[1], \
AssertionError: Input image size (960*384) doesn't match model (384*384).

If I'm not mistaken, the sliding window moves horizontally along the width of the image, while the height is fixed to 384. So, I changed lines 128 and 129 from

new_H = 16*int(H/16)
new_W = 16*int(W/16)

to

new_H = 384
new_W = 16 * int((W / H * 384) / 16)

like in lines 28 and 29 in demo.py.
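
To make the arithmetic concrete, here is a minimal sketch of the two roundings for a (690, 1280) image. It is not code from the repository: the helper names `round_to_patch`, `resize_for_inference` and `window_starts` are hypothetical, and the stride of 128 is an assumption chosen only for illustration.

```python
PATCH = 16     # patch size used in the rounding above
WINDOW = 384   # model input size reported in the assertion


def round_to_patch(h, w, patch=PATCH):
    """Original rounding: each side rounded down to a multiple of the patch size."""
    return patch * int(h / patch), patch * int(w / patch)


def resize_for_inference(h, w, target_h=WINDOW, patch=PATCH):
    """Proposed rounding: fix the height, scale the width to keep the aspect ratio."""
    new_w = patch * int((w / h * target_h) / patch)
    return target_h, new_w


def window_starts(new_w, window=WINDOW, stride=128):
    """Left edges of horizontal windows (stride is an assumed value).

    Widths smaller than the window are not handled here; a single
    start at 0 is returned and padding would be up to the caller.
    """
    if new_w <= window:
        return [0]
    starts = list(range(0, new_w - window + 1, stride))
    if starts[-1] != new_w - window:
        starts.append(new_w - window)  # last window flush with the right edge
    return starts


old_h, old_w = round_to_patch(690, 1280)
new_h, new_w = resize_for_inference(690, 1280)
print("original rounding:", (old_h, old_w))  # height is not 384
print("proposed rounding:", (new_h, new_w))  # height is exactly 384
print("window starts:", window_starts(new_w))
```

With the proposed rounding, every slice `[start:start + 384]` along the width has a height of 384, which is what the patch embedding checks for.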

Maybe you could check whether it makes sense, and whether it would be useful to apply this change. Thank you.

Verg-Avesta commented 1 year ago

Yes, at inference time that is the correct thing to do.

GioFic95 commented 1 year ago

Ok perfect, thanks.