facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0

Why is the patch embedding's default img size not equal to the image size in visualize_attention? #261

Open LWShowTime opened 7 months ago

LWShowTime commented 7 months ago

In visualize_attention.py: https://github.com/facebookresearch/dino/blob/7c446df5b9f45747937fb0d72314eb9f7b66930a/visualize_attention.py#L108 However, in vision_transformer.py: https://github.com/facebookresearch/dino/blob/7c446df5b9f45747937fb0d72314eb9f7b66930a/vision_transformer.py#L116-L122 Will this cause any performance drop?
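For context: DINO's ViT does not require the input to match the patch embedding's default image size, because it resizes the positional embeddings to the actual patch grid at forward time (see `interpolate_pos_encoding` in vision_transformer.py). Below is a minimal, simplified sketch of that idea — the function name, square-grid assumption, and tensor shapes here are illustrative, not the exact repository implementation:

```python
import math
import torch
import torch.nn.functional as F

def interpolate_pos_encoding(pos_embed, h_patches, w_patches):
    """Resize ViT positional embeddings to a new patch grid.

    pos_embed: (1, 1 + N, dim) tensor whose first token is [CLS].
    Simplified sketch of the mechanism behind DINO's
    interpolate_pos_encoding; assumes the original grid was square.
    """
    cls_pos = pos_embed[:, :1]    # (1, 1, dim), [CLS] keeps its embedding
    patch_pos = pos_embed[:, 1:]  # (1, N, dim), per-patch embeddings
    n = patch_pos.shape[1]
    side = int(math.sqrt(n))      # original grid side length
    dim = patch_pos.shape[-1]
    # reshape to a 2D grid and bicubically resize to the new grid
    patch_pos = patch_pos.reshape(1, side, side, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(
        patch_pos, size=(h_patches, w_patches),
        mode="bicubic", align_corners=False,
    )
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(
        1, h_patches * w_patches, dim)
    return torch.cat([cls_pos, patch_pos], dim=1)

# example: embeddings trained for a 14x14 grid (224px / patch 16),
# resized for a 30x20 grid (e.g. a 480x320 input)
pe = torch.randn(1, 1 + 14 * 14, 384)
pe_new = interpolate_pos_encoding(pe, 30, 20)
print(pe_new.shape)  # torch.Size([1, 601, 384])
```

Because the positional embeddings are interpolated to whatever grid the input produces, a fixed input-size assertion would only get in the way, which is presumably why it was dropped.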

LWShowTime commented 7 months ago

I notice that in DINO, your team has deleted this line from the original ViT: `assert H == self.img_size[0], f"Input image height ({H}) doesn't match model ({self.img_size[0]})."`

@piotr-bojanowski @mathildecaron31