CIFAR-100 has 100 classes, so with 1000 training labels we have 10 shots per class. We didn't modify the test set, i.e. it's the standard 10k images for both CIFAR-10 and CIFAR-100.
We used the code from our previous paper to run these fine-tuning experiments (https://github.com/facebookresearch/ppuda/blob/main/experiments/sgd/train_net.py), so we pass `--n_shots 10` for the CIFAR-100 experiments. The hyperparameters are based on another paper of ours, Pretraining a Neural Network before Knowing Its Architecture (https://arxiv.org/abs/2207.10049); see Table 2 there. For CIFAR-100 the best hyperparameters are generally similar. We used `--beta 3e-5` to add some noise to the predicted parameters before fine-tuning them, but we didn't use orthogonal re-initialization (i.e. we used `--init rand`).
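For concreteness, a minimal sketch of such a run. Only `--n_shots 10`, `--beta 3e-5`, and `--init rand` are confirmed in this thread; any other arguments (dataset, architecture, learning rate, epochs) are left out here because they depend on the script's actual interface and on Table 2 of arXiv:2207.10049.

```bash
# Hypothetical invocation of the ppuda fine-tuning script; only the three
# flags below are confirmed in this thread.
#   --n_shots 10   -> 10 labels per class = 1000 training labels on CIFAR-100
#   --beta 3e-5    -> noise added to the predicted parameters before fine-tuning
#   --init rand    -> random noise init, no orthogonal re-initialization
# Remaining arguments are omitted; check train_net.py for the exact flags.
python experiments/sgd/train_net.py --n_shots 10 --beta 3e-5 --init rand
```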
Let me know if there are further questions.
Can you explain the few-shot setting whose results are reported in Table 7 (transfer learning from ImageNet to few-shot CIFAR-10 and CIFAR-100 with 1000 training labels, with 3 networks: ResNet-50 (R-50), ConvNeXt-B (C-B) and Swin-T (S-T))?
How many shots are in the training set, as well as in the test (query) set?