SamsungSAILMontreal / ghn3

Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023]
https://arxiv.org/abs/2303.04143
MIT License

Need additional detail regarding few-shot learning experiment #2

Closed sorobedio closed 2 months ago

sorobedio commented 2 months ago

Can you explain the few-shot setting whose results are reported in Table 7 ("Transfer learning from ImageNet to few-shot CIFAR-10 and CIFAR-100 with 1000 training labels") with the 3 networks: ResNet-50 (R-50), ConvNeXt-B (C-B) and Swin-T (S-T)?

How many shots are in the training set, and how many in the test (query) set?

bknyaz commented 2 months ago

CIFAR-100 has 100 classes, so with 1000 training labels we have 10 shots per class. We didn't modify the test set, i.e. it's the full 10k test set for both CIFAR-10 and CIFAR-100.

We used the code from our previous paper to run these fine-tuning experiments (https://github.com/facebookresearch/ppuda/blob/main/experiments/sgd/train_net.py), so we pass `--n_shots 10` for the CIFAR-100 experiments. The hyperparameters are based on our other paper, Pretraining a Neural Network before Knowing Its Architecture (https://arxiv.org/abs/2207.10049); see Table 2 there. For CIFAR-100 the best hyperparameters are generally similar. We used `--beta 3e-5` to add some noise to the predicted parameters before fine-tuning them, but we didn't use orthogonal re-initialization (i.e. we use `--init rand`).
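For reference, a minimal sketch of what such a fine-tuning run might look like; only the `--n_shots`, `--beta` and `--init` flags are confirmed in this thread, and the script path assumes a checkout of the ppuda repo, so other options (dataset, architecture, learning rate, etc.) may be needed and could differ from `train_net.py`'s actual interface:

```bash
# Sketch only: --n_shots, --beta and --init are the flags named above;
# run from a clone of https://github.com/facebookresearch/ppuda (path assumed).
python experiments/sgd/train_net.py \
  --n_shots 10 \
  --beta 3e-5 \
  --init rand
# --n_shots 10 : 1000 training labels / 100 CIFAR-100 classes = 10 shots per class
# --beta 3e-5  : adds small noise to the predicted parameters before fine-tuning
# --init rand  : no orthogonal re-initialization of the predicted parameters
```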

Let me know if there are further questions.