I am truly grateful to the authors for open-sourcing code

dominickrei / Limited-data-vits

[WACV 2024] Code for "Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders"

I am truly grateful to the authors for open-sourcing code #2

Closed newer7 closed 1 week ago

newer7 commented 4 months ago

I am truly grateful to the authors for open-sourcing the code for this novel approach. However, I seem to have run into an issue. For datasets like Flowers102, Chaoyang, PMNIST, ClipArt, Infograph, and Sketch, how should the validation set be partitioned? Is it the traditional 8:2 split, or is the validation set the same as the test set? For Flowers102 in particular, there is a significant gap between the performance I get with ViT-T + SSAT and the results reported in the paper. If you could provide some clarification, I would greatly appreciate it. Once again, thank you for your contributions.

srijandas07 commented 4 months ago

Hi @newer7, all our hyperparams are set based on the CIFAR and IN1K experiments. Also, we train our models for a constant # of epochs. So, we didn't use a val set for small datasets like Flowers102 (following the other smallViT paper's training protocols).
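A minimal sketch of what a fixed-epoch protocol without a validation split could look like. This is not taken from the repository: `run_fixed_epoch_protocol`, `train_one_epoch`, and `evaluate_on_test` are hypothetical names, and reporting the final-epoch model is an assumption that follows from having no val set to select a checkpoint on.

```python
from typing import Callable, Dict, List, Union


def run_fixed_epoch_protocol(
    train_one_epoch: Callable[[int], float],
    evaluate_on_test: Callable[[], float],
    num_epochs: int,
) -> Dict[str, Union[int, float, List[float]]]:
    """Train for a fixed number of epochs, then evaluate once on the test split.

    No validation split is carved out of the training data, so there is no
    early stopping and no checkpoint selection (assumption: the final-epoch
    model is the one that gets evaluated).
    """
    if num_epochs <= 0:
        raise ValueError("num_epochs must be positive")

    train_losses = []
    for epoch in range(num_epochs):
        # The whole training split is used every epoch; nothing is held out.
        train_losses.append(train_one_epoch(epoch))

    return {
        "epochs_run": num_epochs,
        "train_losses": train_losses,
        "test_accuracy": evaluate_on_test(),
    }


# Toy stand-ins so the sketch runs on its own (not real training code).
def fake_train_one_epoch(epoch: int) -> float:
    return 1.0 / (epoch + 1)  # pretend the loss shrinks each epoch


def fake_evaluate_on_test() -> float:
    return 0.5  # placeholder value, not a real result


result = run_fixed_epoch_protocol(
    fake_train_one_epoch, fake_evaluate_on_test, num_epochs=3
)
print(result["epochs_run"], len(result["train_losses"]))
```

In a real run the two callables would wrap the actual training loop and the test-set evaluation; the epoch count would come from the fixed training configuration rather than from validation performance.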

newer7 commented 4 months ago

Thank you very much for your reply.

newer7 commented 3 months ago

Thank you very much. Could you please provide some insights into the implementation details of CVT-SSAT and Swin-SSAT? The paper mentions using "the decoder design from [20] for ViT, and utilize the decoder design from ConvMAE [16] and SimMIM [59] for hierarchical encoders such as CVT and Swin, respectively." Is the only difference in the design of the decoder?

dominickrei commented 3 months ago

Hi @newer7, thank you for your interest. For CVT+SSAT and Swin+SSAT experiments we follow the same encoder-decoder designs proposed in ConvMAE for CVT, and SimMIM for Swin.
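The pairing described in this thread can be written down as a small lookup. This is only an illustrative sketch: `DESIGN_BY_ENCODER_FAMILY` and `design_for` are hypothetical names that do not come from the repository, and the values are plain labels for the cited designs, not actual module classes.

```python
# Encoder family -> design source, as stated in the thread.
# The ViT entry comes from the paper passage quoted in the question above;
# the CVT and Swin entries come from the reply.
DESIGN_BY_ENCODER_FAMILY = {
    "vit": "decoder design from [20]",
    "cvt": "ConvMAE [16]",
    "swin": "SimMIM [59]",
}


def design_for(encoder_name: str) -> str:
    """Return the design label for an encoder name such as 'ViT-T' or 'Swin-T'."""
    key = encoder_name.lower()
    for family, design in DESIGN_BY_ENCODER_FAMILY.items():
        if key.startswith(family):
            return design
    raise KeyError(f"no design listed for encoder {encoder_name!r}")


print(design_for("Swin-T"))
```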