SamsungSAILMontreal / ghn3

Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023]
https://arxiv.org/abs/2303.04143
MIT License

Need additional detail regarding few-shot learning experiment #2

Closed sorobedio closed 2 months ago

sorobedio commented 2 months ago

Can you explain the few-shot setting whose results are reported in Table 7 ("Transfer learning from ImageNet to few-shot CIFAR-10 and CIFAR-100 with 1000 training labels") with the 3 networks: ResNet-50 (R-50), ConvNeXt-B (C-B) and Swin-T (S-T)?

How many shots are in the training set, and how many in the test (query) set?

bknyaz commented 2 months ago

CIFAR-100 has 100 classes, so with 1000 training labels we have 10 shots per class. We didn't modify the test set, i.e. it's the full 10k test set for both CIFAR-10 and CIFAR-100.

We used the code from our previous paper to run these fine-tuning experiments (https://github.com/facebookresearch/ppuda/blob/main/experiments/sgd/train_net.py), so we pass `--n_shots 10` for the CIFAR-100 experiments. The hyperparameters are based on our other paper, Pretraining a Neural Network before Knowing Its Architecture (https://arxiv.org/abs/2207.10049); see Table 2 there. For CIFAR-100 the best hyperparameters are generally similar. We used `--beta 3e-5` to add some noise to the predicted parameters before fine-tuning them, but we didn't use orthogonal re-initialization (i.e. we use `--init rand`).
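For reference, a minimal sketch of what such a fine-tuning run might look like; only the `--n_shots`, `--beta` and `--init` flags are confirmed in this thread, and the script path assumes a checkout of the ppuda repo, so other options (dataset, architecture, learning rate, etc.) may be needed and could differ from `train_net.py`'s actual interface:

```bash
# Sketch only: --n_shots, --beta and --init are the flags named above;
# run from a clone of https://github.com/facebookresearch/ppuda (path assumed).
python experiments/sgd/train_net.py \
  --n_shots 10 \
  --beta 3e-5 \
  --init rand
# --n_shots 10 : 1000 training labels / 100 CIFAR-100 classes = 10 shots per class
# --beta 3e-5  : adds small noise to the predicted parameters before fine-tuning
# --init rand  : no orthogonal re-initialization of the predicted parameters
```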

Let me know if there are further questions.