faustomorales / vit-keras

Keras implementation of ViT (Vision Transformer)
Apache License 2.0

different image size in fine-tuning #29

Closed. captainst closed this issue 2 years ago.

captainst commented 2 years ago

Hi there,

I saw that the implementation uses a convolution to produce fixed-size hidden vectors from input images of variable size. That's brilliant! However, I am wondering whether the fine-tuning results would degrade when using a different input image size, say 224, rather than the official input size of 384 shown in your example.
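For context on what changes with resolution: in a ViT patch embedding, a convolution whose kernel size and stride both equal the patch size keeps the hidden width fixed. However, the number of patch tokens still depends on the input size (with 16-pixel patches, 224 gives a 14×14 grid and 384 gives a 24×24 grid). So position embeddings learned at one resolution need resizing before they can be used at another. The NumPy sketch below shows that arithmetic and one common approach, corner-aligned bilinear interpolation of the position-embedding grid. The function names `num_patches` and `resize_pos_embedding` are hypothetical, row 0 is assumed to be the class-token embedding, and this is not confirmed to be how vit-keras itself handles the resize.

```python
import numpy as np


def num_patches(image_size: int, patch_size: int = 16) -> int:
    """Tokens produced by a patch-embedding conv with kernel = stride = patch_size."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size
    return grid * grid


def resize_pos_embedding(pos_emb: np.ndarray, new_grid: int) -> np.ndarray:
    """Bilinearly resize (1 + old_grid**2, hidden) position embeddings to new_grid.

    Assumption (hypothetical layout): row 0 is the class-token embedding and is
    copied unchanged; the remaining rows form a square old_grid x old_grid grid.
    """
    cls_tok, grid_tok = pos_emb[:1], pos_emb[1:]
    old_grid = int(round(np.sqrt(grid_tok.shape[0])))
    hidden = grid_tok.shape[1]
    grid = grid_tok.reshape(old_grid, old_grid, hidden)

    # Sample new_grid points across the old grid, aligning the corner positions.
    coords = np.linspace(0, old_grid - 1, new_grid)
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, old_grid - 1)
    frac = coords - lo

    # Interpolate along rows first, then along columns.
    rows = grid[lo] * (1 - frac)[:, None, None] + grid[hi] * frac[:, None, None]
    out = rows[:, lo] * (1 - frac)[None, :, None] + rows[:, hi] * frac[None, :, None]
    return np.concatenate([cls_tok, out.reshape(new_grid * new_grid, hidden)], axis=0)
```

For example, embeddings learned at 384 (shape `(1 + 576, hidden)`) would be resized with `resize_pos_embedding(pos, 14)` to shape `(197, hidden)` for 224 inputs. Interpolation keeps the pretrained spatial structure rather than reinitializing it, but it does not by itself guarantee equal accuracy at the new resolution.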

Many thanks!

SaadTazroute commented 2 years ago

I will wait for a response 😄

sutummala commented 2 years ago

In my experience, I did not see any performance degradation at 224 resolution compared with 384.

faustomorales commented 2 years ago

Closing, as this is a general modeling question better suited to research discussion in the original research repository, and not a problem with the code in this repository.