First trial run for ConvNext here.
Lessons learned:

- `learning_rate` is way too big: around step = 15k we can see the moment ConvNext starts to overfit harshly, in a giant drop of `val_accuracy`.
- `epochs` is set far too high. To launch a lot of experiments I'll go with 5-10 epochs from now on. To find a more reasonable learning rate I'll launch a training run with the model's recommended lr and, while the model trains, implement the fastai `lr_finder` for future use (see the sketch after this list).
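For reference, the idea behind fastai's `lr_finder` is an exponential LR range test: sweep the learning rate upward over a few hundred batches, record the loss, and pick a value comfortably below the point where the loss diverges. A minimal sketch of that technique (function name, defaults, and the divergence heuristic are my own, not the linked implementation):

```python
import math

import torch


def lr_range_test(model, optimizer, criterion, loader, device="cpu",
                  start_lr=1e-7, end_lr=10.0, num_steps=100, diverge_factor=4.0):
    """Exponentially sweep the learning rate and record the loss at each step.

    The lr at which the loss starts to explode bounds the usable range;
    a common heuristic is to pick ~1/10 of that value.
    """
    gamma = (end_lr / start_lr) ** (1.0 / num_steps)
    for group in optimizer.param_groups:
        group["lr"] = start_lr

    lrs, losses = [], []
    best_loss = float("inf")
    model.train()
    data_iter = iter(loader)

    for _ in range(num_steps):
        try:
            inputs, targets = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            inputs, targets = next(data_iter)
        inputs, targets = inputs.to(device), targets.to(device)

        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

        lrs.append(optimizer.param_groups[0]["lr"])
        losses.append(loss.item())
        best_loss = min(best_loss, loss.item())

        # Stop once the loss clearly diverges.
        if math.isnan(loss.item()) or loss.item() > diverge_factor * best_loss:
            break

        # Exponentially increase the lr for the next batch.
        for group in optimizer.param_groups:
            group["lr"] *= gamma

    return lrs, losses
```

Plotting `losses` against `lrs` on a log-x axis gives the usual lr-finder curve.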
Learning rate finder implementation here.
Picked `learning_rate = 0.001` for a shorter training run (I actually went for more epochs than assumed above, since the training launched before the decision was made). Results here.
The `learning_rate = 0.0004` it started with seemed to work well for the model, so I've followed up with another run at this particular learning rate here.
Major observation: the scheduler seems to be broken. It's supposed to be a OneCycleLR scheduler, yet it does not complete full cycles. Need to fix this before moving forward.
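For context, the most common way OneCycleLR ends up not cycling is being stepped once per epoch instead of once per batch, or having `total_steps` that doesn't match the actual run length. A sketch of the Lightning wiring that avoids both (this is an assumption about the likely bug, not a confirmed diagnosis, and the optimizer/lr values are illustrative):

```python
import pytorch_lightning as pl
import torch
from torch.optim.lr_scheduler import OneCycleLR


class FineTuner(pl.LightningModule):
    # training_step / validation_step omitted; only the scheduler wiring shown.

    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=4e-4)
        # OneCycleLR must know the total number of optimizer steps so that
        # exactly one cycle spans the whole run; Lightning exposes that count.
        scheduler = OneCycleLR(
            optimizer,
            max_lr=4e-4,
            total_steps=self.trainer.estimated_stepping_batches,
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": {
                "scheduler": scheduler,
                # Step every batch; Lightning's default interval is "epoch",
                # which leaves OneCycleLR stuck near the start of its cycle.
                "interval": "step",
            },
        }
```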
Scheduler fixed, can progress with the fine-tuning.
Now that we've reproduced the authors' ResNet18 results (mostly to confirm that the `lightning` and `neptune` framework works the same way as their `skorch` setup), we want to try more recent models than ResNet. The goal is to fine-tune ConvNext (a pretrained model can be taken from `timm`) on our dataset. We are benchmarking against the authors' val F1 = 0.91.