fschmid56 / EfficientAT

This repository aims to provide efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.

Finetuning on own data #32

Open cvillela opened 3 weeks ago

cvillela commented 3 weeks ago

Hello again @fschmid56 , thanks for the awesome repo!

I would like to fine-tune DyMNs on my own dataset for audio classification. Is that possible?

If so, what would the best pipeline be: training a shallow classification head on the model's latent representations, or fine-tuning the whole model (or just its last layers) on the new dataset?

When fine-tuning the whole model, is it possible to freeze the early layers and only adjust the weights of the last layers? What would be your approach?

Thank you so much for your effort!

fschmid56 commented 2 weeks ago

Hi @cvillela ! Thanks for your interest.

If the dataset is not too small, I think fine-tuning the whole model would give the best results. You can, e.g., just look at this file and check how the AudioSet pre-trained models are fine-tuned on the FSD50K dataset.
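For illustration, a minimal full-model fine-tuning loop could look like the sketch below. The train_loader, loss, and hyperparameters are placeholders, and you would still need to adapt the classification head to the number of classes in your dataset; the repo's FSD50K fine-tuning script is the authoritative reference.

import torch
import torch.nn as nn

from models.dymn.model import get_model


def finetune_full_model(model, train_loader, num_epochs=10, lr=1e-4, device="cuda"):
    # Hypothetical loop: assumes train_loader yields (mel_batch, label) pairs
    # in the input format the AudioSet pre-trained DyMN expects.
    model = model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    for epoch in range(num_epochs):
        for mels, labels in train_loader:
            mels, labels = mels.to(device), labels.to(device)
            out = model(mels)
            # Some models in this repo return (logits, embeddings); keep the logits.
            logits = out[0] if isinstance(out, tuple) else out
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last-batch loss {loss.item():.4f}")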

If you have severe problems with overfitting, you could try to freeze some layers or only train a single linear layer as head, as you suggested. An even more elegant version, in my opinion, is a layer-wise learning rate decay so that layers closer to the output are fine-tuned with a higher learning rate, while early layers are fine-tuned with a very low learning rate. I implemented this a while ago and could provide the code snippet if you would like to have it.
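As a rough sketch of the freeze-everything-but-the-head option (not code from the repo), one could freeze every parameter except those whose name starts with 'out_c' or 'classifier', the same prefixes the layer-wise decay snippet further down treats as the output group:

import torch

from models.dymn.model import get_model


def head_only_params(model):
    # Freeze the backbone and return only the head parameters for the optimizer.
    head_params = []
    for name, p in model.named_parameters():
        if name.startswith('out_c') or name.startswith('classifier'):
            head_params.append(p)
        else:
            p.requires_grad = False
    return head_params


dymn = get_model()
optimizer = torch.optim.Adam(head_only_params(dymn), lr=1e-3)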

cvillela commented 2 weeks ago

Awesome @fschmid56 ! Thanks for the reply. I will try the different approaches! Unfortunately, my dataset is not very large.

Yes, I would love to take a look at that script; it does seem like an elegant solution!

fschmid56 commented 2 weeks ago

The code snippet below is similar to what I used recently. I would recommend checking it carefully before using it; I modified it a bit for this post and only spent two minutes testing it :)

Let me know which variant works best for you.

import torch

from models.dymn.model import get_model


def separate_params(model):
    # Split the parameters into 7 depth-ordered groups: index 0 is the input
    # conv, indices 1-5 are blocks of three layers each, and the last index is
    # the output conv plus the classifier head.
    pt_params = [[], [], [], [], [], [], []]
    for k, p in model.named_parameters():
        if k.startswith('in_c'):
            pt_params[0].append(p)
        elif k.startswith('out_c') or k.startswith('classifier'):
            pt_params[-1].append(p)
        elif k.startswith('layers.12.') or k.startswith('layers.13.') or k.startswith('layers.14.'):
            pt_params[5].append(p)
        elif k.startswith('layers.9.') or k.startswith('layers.10.') or k.startswith('layers.11.'):
            pt_params[4].append(p)
        elif k.startswith('layers.6.') or k.startswith('layers.7.') or k.startswith('layers.8.'):
            pt_params[3].append(p)
        elif k.startswith('layers.3.') or k.startswith('layers.4.') or k.startswith('layers.5.'):
            pt_params[2].append(p)
        elif k.startswith('layers.0.') or k.startswith('layers.1.') or k.startswith('layers.2.'):
            pt_params[1].append(p)
        else:
            raise ValueError("Check parameter separation for frame-dymn!")
    # Reverse so that the group closest to the output comes first.
    return list(reversed(pt_params))


def get_optimizer(model, lr=0.001, lr_decay=0.5):
    sep_params = separate_params(model)

    # The first group (classifier / output conv) trains with the full learning
    # rate; every group further toward the input is scaled down by lr_decay.
    scale_lrs = [lr * (lr_decay ** i) for i in range(len(sep_params))]
    param_groups = [{"params": sep_params[i], "lr": scale_lrs[i]} for i in range(len(sep_params))]
    return torch.optim.Adam(param_groups, lr=lr)


dymn = get_model()
optimizer = get_optimizer(dymn, lr=0.001, lr_decay=0.5)
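A quick sanity check (not part of the original snippet) is to print each parameter group's learning rate and size; the first group (classifier and output conv) should get the full learning rate, and the input conv group the smallest one:

for i, group in enumerate(optimizer.param_groups):
    n_params = sum(p.numel() for p in group["params"])
    print(f"group {i}: lr={group['lr']:.6f}, n_params={n_params}")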