UX-Decoder / DINOv

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

torch.cat error in evaluation #19

Open lianqiann opened 5 months ago

lianqiann commented 5 months ago

I think torch.cat is used incorrectly at the line below, which causes a runtime error:

https://github.com/UX-Decoder/DINOv/blob/53bf20d5cfdbb86fa35141a1cff432d4923599f2/dinov/architectures/dinov.py#L1082

When running the eval script for COCO2017:

python train_net.py --eval_only --resume --eval_get_content_features --num-gpus 8 --config-file /path/to/configs COCO.TEST.BATCH_SIZE_TOTAL=8 MODEL.WEIGHTS=/path/to/weights OUTPUT_DIR=/path/to/outputs

This fails with the following error:

all_rand_shape = torch.cat([t['rand_shape'] for t in new_targets], 0)
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 800 but got size 896 for tensor number 1 in the list.

The tensors that trigger this error have the following shapes:

torch.Size([30, 800, 1216])
torch.Size([2, 896, 800])
torch.Size([20, 800, 1088])
torch.Size([8, 1088, 800])
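The mismatch can be reproduced outside the repository. The following is a minimal sketch, assuming only that PyTorch is installed: it builds two placeholder tensors with the first two shapes listed above (the uint8 dtype and zero contents are assumptions made to keep memory use low, not the real `rand_shape` data) and concatenates them along dimension 0, as the failing line does.

```python
import torch

# Placeholder tensors with two of the reported shapes.
# dtype and contents are assumptions; only the shapes matter here.
a = torch.zeros(30, 800, 1216, dtype=torch.uint8)
b = torch.zeros(2, 896, 800, dtype=torch.uint8)

try:
    # torch.cat along dim 0 requires every other dimension to match.
    torch.cat([a, b], 0)
    cat_failed = False
except RuntimeError as err:
    cat_failed = True
    print(err)
```

Because the height and width differ between the two tensors, concatenating along dimension 0 cannot succeed, whatever the tensor contents are.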

The same size mismatch causes errors in many other places, so fixing this line alone does not make the eval script run. Could anyone look into this? Thank you very much!

lianqiann commented 5 months ago

I think the error arises because the target sizes are not unified for inputs of different sizes. It only happens in non-training mode.

FengLi-ust commented 3 months ago

Yes, in training we resize all images to a fixed size. For inference, using batch size 1 should work; otherwise you need to pad the images to the same size for batched inference.
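The padding approach could look roughly like the sketch below. It is not the repository's fix: `pad_and_cat` is a hypothetical helper, and zero-padding on the bottom and right edges is an assumption. It pads each `[N, H, W]` tensor to the largest height and width in the list with `torch.nn.functional.pad`, then concatenates along dimension 0.

```python
import torch
import torch.nn.functional as F


def pad_and_cat(tensors, pad_value=0):
    """Pad [N_i, H_i, W_i] tensors to a common (H, W), then cat on dim 0.

    Hypothetical helper for illustration; not part of DINOv.
    """
    max_h = max(t.shape[-2] for t in tensors)
    max_w = max(t.shape[-1] for t in tensors)
    padded = [
        # F.pad lists pads for the last dimension first:
        # (left, right, top, bottom). Only right and bottom are padded.
        F.pad(t, (0, max_w - t.shape[-1], 0, max_h - t.shape[-2]), value=pad_value)
        for t in tensors
    ]
    return torch.cat(padded, 0)


# Placeholder tensors with the four shapes reported in this issue
# (uint8 zeros are an assumption to keep memory use low).
shapes = [(30, 800, 1216), (2, 896, 800), (20, 800, 1088), (8, 1088, 800)]
masks = [torch.zeros(s, dtype=torch.uint8) for s in shapes]

merged = pad_and_cat(masks)
print(merged.shape)  # torch.Size([60, 1088, 1216])
```

Any code that later reads these tensors would still need the original per-image sizes to ignore or crop the padded region, which is likely why fixing the single `torch.cat` call is not enough on its own.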