AhmedHussKhalifa opened 3 weeks ago
Hi, thanks for your interest in this repo and our work.
Yes, using ImageNet mean and std is intended. Note that we also resized images to 224x224 for all these downstream datasets. Using either ImageNet or CIFAR mean and std should not make much difference in training.
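For reference, the preprocessing described above (resize to 224x224, then normalize with the ImageNet per-channel statistics) can be sketched as follows. This is a minimal NumPy stand-in, not the repo's actual code: nearest-neighbour upsampling (224 = 7 × 32) substitutes for the bilinear resize a torchvision pipeline would use, and the `preprocess` name is illustrative.

```python
import numpy as np

# ImageNet per-channel statistics (RGB) used for normalization
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(img):
    """Resize a 32x32x3 float image in [0, 1] to 224x224 and normalize.

    Nearest-neighbour upsampling stands in for the bilinear
    Resize(224) a torchvision transform pipeline would apply.
    """
    up = img.repeat(7, axis=0).repeat(7, axis=1)   # (32, 32, 3) -> (224, 224, 3)
    return (up - IMAGENET_MEAN) / IMAGENET_STD     # broadcast over the channel axis

x = np.random.rand(32, 32, 3)
y = preprocess(x)
print(y.shape)  # (224, 224, 3)
```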
Hi,
Thank you for your response.
Could you please clarify the specific reason for scaling the images to 224x224? Is it primarily to increase the receptive field for the ViT model? Also, is there any literature where the input size was kept the same as the original (32x32) for ViT models, and if so, how did it impact performance?
Hey,
I am attempting to reproduce the random initialization results for the Pets and Flowers datasets using the ViT-tiny model as a baseline. However, my results only reach 44.90%, which is significantly lower than the reported 62.4% on the Flowers dataset.
Hi,
For Pets and Flowers, please load the datasets directly from the PyTorch (torchvision) datasets instead of using the oxford...py script.
Your settings look correct. We reran the experiment you mentioned and were able to reproduce the result using the command you provided.
Hi, and thanks for sharing your code! I have a quick question regarding the preprocessing step in this line. I noticed that you’re applying the same ImageNet preprocessing to CIFAR100. Since you’ve used this consistently across your experiments, I’m considering using your setup as a benchmark for my ViT model. Could you please confirm if this is indeed the intended approach for your experiments?
Thanks in advance!