benchopt / benchmark_resnet_classif

Benchopt benchmark for ResNet fitting on a classification task
https://benchopt.github.io/results/benchmark_resnet_classif.html

Tips and Tricks for training classification convolutional neural networks #11

zaccharieramzi opened this issue 2 years ago

zaccharieramzi commented 2 years ago

Data augmentation:

Regularization:

Learning rate:

Modeling (to me these ones are out of our scope):

Other:

zaccharieramzi commented 2 years ago

The highest prio to me is:

@tomMoral @pierreablin wdyt?

zaccharieramzi commented 2 years ago

RE LR scheduling: there is a big difference in how TF and PL implement it. Basically, TF implements it at a per-optimizer-step level, while PL implements it at a per-epoch level (see here), which is similar to what is done here or in timm.

I might just go with the PL way of doing it, since it's the least flexible.
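For reference, a minimal sketch of the per-epoch convention in plain PyTorch (the model, data, and hyperparameters below are placeholders, not what the benchmark actually uses):

```python
import torch

# Toy model and data, just to make the loop runnable.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Divide the LR by 10 every 30 epochs (placeholder milestones).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))

for epoch in range(90):
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    # The scheduler is stepped once per epoch (the PL-style interval),
    # not once per optimizer step as a TF LearningRateSchedule would be.
    scheduler.step()
```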

zaccharieramzi commented 2 years ago

RE LR scheduling / Weight Decay: I am not sure what the canonical way is to update the weight decay given the LR schedule.

In TF, the docs specify that it should be updated, and that in this case it has to be done manually:

Note: when applying a decay to the learning rate, be sure to manually apply the decay to the weight_decay as well. For example:
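The elided example amounts to something like the following sketch, assuming tensorflow_addons' decoupled-weight-decay optimizers (the schedule boundaries and base values are placeholders):

```python
import tensorflow as tf
import tensorflow_addons as tfa

# `step` would be incremented during training.
step = tf.Variable(0, trainable=False)
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    [10000, 15000], [1e-0, 1e-1, 1e-2])
# Both lr and wd are driven by the same schedule, so the weight decay
# is manually kept in sync with the learning rate decay.
lr = lambda: 1e-1 * schedule(step)
wd = lambda: 1e-4 * schedule(step)
optimizer = tfa.optimizers.AdamW(learning_rate=lr, weight_decay=wd)
```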

But in PL or torch, I didn't see any mention of this update, so it might not be used. I am going to verify this.

EDIT

Ok so the problem with WD is actually the following, which I understood by reading the original decoupled weight decay paper as well as the PyTorch docs for Adam and AdamW. There are 2 ways of applying the weight decay:

- coupled (a.k.a. L2 regularization): the decay term is added to the gradient, so it goes through the optimizer's gradient processing (this is what `torch.optim.Adam` does with its `weight_decay` argument);
- decoupled (AdamW-style): the decay is applied directly to the weights at each update, independently of the gradient-based step.
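As an illustration, a minimal sketch of the two update rules (the names and the `rescale` placeholder, which stands for the optimizer's gradient transform, are mine):

```python
def coupled_wd_step(param, grad, lr, wd, rescale):
    # Coupled / L2 regularization: the decay is folded into the gradient,
    # so it also goes through the optimizer's gradient rescaling
    # (e.g. Adam's division by the running second moment).
    return param - lr * rescale(grad + wd * param)

def decoupled_wd_step(param, grad, lr, wd, rescale):
    # Decoupled (AdamW-style): only the true gradient is rescaled;
    # the decay is applied directly to the weights.
    return param - lr * rescale(grad) - lr * wd * param

# With rescale = identity (plain SGD) the two coincide; with an adaptive
# rescaling (Adam) they differ, which is the whole point of AdamW.
identity = lambda g: g
print(coupled_wd_step(1.0, 0.5, lr=0.1, wd=0.01, rescale=identity))
print(decoupled_wd_step(1.0, 0.5, lr=0.1, wd=0.01, rescale=identity))
```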

I will name both types explicitly in the solvers. We can still have coupled weight decay for both PyTorch and TensorFlow, but for TensorFlow the problem is that we need to hack it in a bit of an ugly way... I will make a proposal and we will see.
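For context, one way coupled weight decay is usually emulated in TensorFlow (not necessarily the proposal that will end up in the benchmark) is to add an explicit L2 penalty on the trainable weights inside the training step:

```python
import tensorflow as tf

def loss_with_coupled_wd(model, x, y, wd):
    # Emulate coupled weight decay by adding an L2 penalty to the loss;
    # its gradient is wd * w, i.e. the decay term ends up in the gradient
    # and is then processed by the optimizer like any other gradient.
    ce = tf.keras.losses.sparse_categorical_crossentropy(
        y, model(x), from_logits=True)
    l2 = tf.add_n([tf.nn.l2_loss(w) for w in model.trainable_weights])
    return tf.reduce_mean(ce) + wd * l2
```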