Finetune CLAP on {audio, text} pairs

LAION-AI / CLAP

Contrastive Language-Audio Pretraining

https://arxiv.org/abs/2211.06687

Creative Commons Zero v1.0 Universal


Finetune CLAP on {audio, text} pairs #141

Open jerpint opened 9 months ago

jerpint commented 9 months ago

Hello!

Suppose I have a dataset of {audio, text} pairs and would like to finetune CLAP on it. Do you have any tips for getting started with such a task? Would continuing training from a checkpoint with a smaller learning rate be a reasonable start? Do you have scripts that support something like this?

Thanks

lukewys commented 7 months ago

Please see https://github.com/LAION-AI/CLAP?tab=readme-ov-file#dataset-format for details on the dataset format we trained on. For fine-tuning, I think you can adapt the training script, but remember to change the learning rate and the weight initialization.
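
A minimal PyTorch sketch of the idea in this thread: initialize from pretrained weights, then continue contrastive training with a small learning rate. It does not use the CLAP training script or its API. `TinyTwoTower`, the feature sizes, the in-memory `checkpoint` dict, the temperature value, and the learning rate of 1e-5 are all hypothetical stand-ins chosen for illustration; a real run would load the released checkpoint and real {audio, text} batches instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTwoTower(nn.Module):
    """Hypothetical stand-in for a pretrained audio/text model (NOT the CLAP API)."""

    def __init__(self, audio_dim=64, text_dim=32, embed_dim=16):
        super().__init__()
        self.audio_encoder = nn.Linear(audio_dim, embed_dim)
        self.text_encoder = nn.Linear(text_dim, embed_dim)
        # Learnable temperature stored on a log scale; the value is illustrative.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, audio, text):
        # L2-normalize so the dot product below is a cosine similarity.
        a = F.normalize(self.audio_encoder(audio), dim=-1)
        t = F.normalize(self.text_encoder(text), dim=-1)
        return a, t, self.logit_scale.exp()


def contrastive_loss(a, t, scale):
    """Symmetric cross-entropy over the audio-text similarity matrix."""
    logits = scale * a @ t.T                  # (batch, batch)
    targets = torch.arange(a.size(0))         # matching pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))


torch.manual_seed(0)

# Stand-in for a checkpoint on disk; in practice this would come from torch.load(...).
pretrained = TinyTwoTower()
checkpoint = {"state_dict": pretrained.state_dict()}

# Weight initialization: start from the pretrained weights, not from scratch.
model = TinyTwoTower()
model.load_state_dict(checkpoint["state_dict"])

# Learning rate: assumed to be much smaller than a from-scratch setting.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One fine-tuning step on a fake batch of 8 {audio, text} feature pairs.
audio = torch.randn(8, 64)
text = torch.randn(8, 32)

model.train()
a, t, scale = model(audio, text)
loss = contrastive_loss(a, t, scale)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

A common variation is to freeze one encoder (set `requires_grad = False` on its parameters) when the new dataset is small, so that only part of the model drifts from the pretrained weights.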