Fine-tune ViT with LoRA

jedrzejwalega commented 3 days ago

Now that we've reproduced the authors' ResNet18 results (mostly to confirm that our Lightning and Neptune setup behaves the same way as their skorch one), we want to try more recent models than ResNet.

The goal is to fine-tune ViT (pretrained weights can be taken from timm) on our dataset.

We are benchmarking against the authors' val F1 = 0.91.

We would like to use LoRA for this. The choice of ViT is not the most important part of this experiment (we could try it with some other network); what we want to check is whether LoRA is suitable for our modest set of images (2.9k for training).
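
A rough back-of-the-envelope sketch of why LoRA looks attractive for a small dataset: for a weight matrix of shape d_out x d_in, LoRA trains two low-rank factors with rank * (d_in + d_out) parameters instead of the full d_in * d_out. The dimensions below are an assumption (ViT-Base: hidden size 768, 12 blocks); the thread does not state which ViT variant is used, and the ranks and the choice of two adapted projections per block are purely illustrative.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight.

    The update is B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


# Assumed ViT-Base dimensions (not stated in this thread).
HIDDEN = 768
LAYERS = 12
ADAPTED_PER_LAYER = 2  # hypothetical: two attention projections per block


def total_lora_params(rank: int) -> int:
    """Total trainable LoRA parameters across all adapted projections."""
    return LAYERS * ADAPTED_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, rank)


def total_full_params() -> int:
    """Parameters in the same projections if they were fine-tuned in full."""
    return LAYERS * ADAPTED_PER_LAYER * HIDDEN * HIDDEN


if __name__ == "__main__":
    for rank in (4, 8, 16):
        print(f"rank={rank}: {total_lora_params(rank):,} LoRA parameters")
    print(f"full fine-tuning of the same matrices: {total_full_params():,}")
```

Under these assumptions the adapted projections go from roughly 14M trainable weights to a few hundred thousand, which is the kind of reduction that may help with only 2.9k training images. The new classification head comes on top of this count.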

I'm going to learn the peft library to apply LoRA to ViT.

jedrzejwalega commented 3 days ago

Fine-tuning code
Model implementation with LoRA

jedrzejwalega commented 3 days ago

First, a short run with learning_rate = 0.0004, the same value we use for ConvNext.

Doesn't work so well. Discussed with @Peterdes. I'm going to apply a few enhancements to my code:

Reducing LoRA rank to 8 as the original paper recommends for models of this size
Turning on rslora (explained)
In my LoRA implementation I applied it not just to all attention weights, but also to the intermediate layers and the output one. That might be too much, so I'll go with just query and value. In particular, I don't want to freeze the output layer: it gets replaced because we provide a custom number of classes, so it needs to train from scratch.

jedrzejwalega commented 3 days ago

The 10-epoch run seemed too short to draw conclusions from, so I'm letting it run for 50.

Appsilon / image_flow_cytometry_fine_tune

Fine-tune ViT with LoRA #8