AhmedHussKhalifa opened 3 weeks ago
Hi, thanks for your interest in this repo and our work.
Yes, using ImageNet mean and std is intended. Note that we also resized images to 224x224 for all these downstream datasets. Using either ImageNet or CIFAR mean and std should not make much difference in training.
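For reference, the preprocessing described above (resize to 224x224, then normalize with the ImageNet per-channel statistics) can be sketched as follows. This is a minimal NumPy stand-in, not the repo's actual code: nearest-neighbour upsampling (224 = 7 × 32) substitutes for the bilinear resize a torchvision pipeline would use, and the `preprocess` name is illustrative.

```python
import numpy as np

# ImageNet per-channel statistics (RGB) used for normalization
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def preprocess(img):
    """Resize a 32x32x3 float image in [0, 1] to 224x224 and normalize.

    Nearest-neighbour upsampling stands in for the bilinear
    Resize(224) a torchvision transform pipeline would apply.
    """
    up = img.repeat(7, axis=0).repeat(7, axis=1)   # (32, 32, 3) -> (224, 224, 3)
    return (up - IMAGENET_MEAN) / IMAGENET_STD     # broadcast over the channel axis

x = np.random.rand(32, 32, 3)
y = preprocess(x)
print(y.shape)  # (224, 224, 3)
```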
Hi,
Thank you for your response.
Could you please clarify the specific reason for scaling the images to 224x224? Is it primarily to increase the receptive field for the ViT model? Also, is there any literature where the input size was kept the same as the original (32x32) for ViT models, and if so, how did it impact performance?
Hey,
I am attempting to reproduce the random initialization results for the Pets and Flowers datasets using the ViT-tiny model as a baseline. However, my results only reach 44.90%, which is significantly lower than the reported 62.4% on the Flowers dataset.
Hi,
For Pets and Flowers, please load the datasets directly from the PyTorch (torchvision) datasets instead of using the oxford...py script.
Your settings look correct. We reran the experiment you mentioned and were able to reproduce the result using the command you provided.
Hi, and thanks for sharing your code! I have a quick question regarding the preprocessing step in this line. I noticed that you’re applying the same ImageNet preprocessing to CIFAR100. Since you’ve used this consistently across your experiments, I’m considering using your setup as a benchmark for my ViT model. Could you please confirm if this is indeed the intended approach for your experiments?
Thanks in advance!