facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Pretraining HuBERT on music data #5534

Open Bloos opened 3 months ago

Bloos commented 3 months ago

❓ Questions and Help

What is your question?

Hi! I'm currently trying to pretrain a HuBERT base model on music data. The dataset consists of about 1000 songs of different genres. I followed the instructions in https://github.com/facebookresearch/fairseq/blob/main/examples/hubert/README.md as closely as possible and took the hubert_base_librispeech default as config, except for some minor changes to the distributed-training settings, roughly as sketched below.
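Concretely, the launch looked something like this (paths are placeholders, and the distributed_world_size override just stands in for the distributed changes I mentioned; everything else follows the README):

```bash
# Launch following examples/hubert/README.md; all paths are placeholders.
# task.labels and model.label_rate assume 100 Hz k-means targets, as in
# the README's first pretraining iteration. The distributed_world_size
# override is illustrative of my distributed changes, not the exact value.
python fairseq_cli/hydra_train.py \
  --config-dir /path/to/fairseq/examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/tsvs \
  task.label_dir=/path/to/labels \
  task.labels='["km"]' \
  model.label_rate=100 \
  distributed_training.distributed_world_size=8
```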
I'm aware that the parameters of the default config might not be appropriate for music data. Nonetheless, I hoped to reach a baseline that I could improve further. Sadly, I got the following behaviour:

[Screenshots: loss, loss_scale, and lr training curves]
The loss went down from over 10 to about 7, but then jumped back up to a higher value, where it stayed. I can tell that it stays there because an earlier, longer run (which I have since deleted) showed the same behaviour. Does anyone have an idea where this problem might come from? I've included the screenshot of the loss scale because it looked suspicious.
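Since the loss scale plot is what looks off, I also wondered whether the fp16 loss scaling itself is part of the problem. If tuning it is worth trying, I'd attempt overrides along these lines (the option names exist in fairseq's common config, but the values below are guesses on my part, not validated settings):

```bash
# Hypothetical fp16 overrides to try if loss scaling is the culprit.
# Option names (common.fp16_init_scale, common.fp16_scale_window,
# common.min_loss_scale) come from fairseq's common config; the values
# are guesses, not recommendations.
python fairseq_cli/hydra_train.py \
  --config-dir /path/to/fairseq/examples/hubert/config/pretrain \
  --config-name hubert_base_librispeech \
  task.data=/path/to/tsvs \
  task.label_dir=/path/to/labels \
  task.labels='["km"]' \
  model.label_rate=100 \
  common.fp16_init_scale=16 \
  common.fp16_scale_window=256 \
  common.min_loss_scale=0.0001
```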

Thanks in advance

What's your environment?