Cyan0731 / MusiConGen


about training weight #5

Open MahlerMozart opened 1 month ago

MahlerMozart commented 1 month ago

Hello, great work! One question about the training weights of MusiConGen provided at the following link: are they the weights of MusicGen-melody (1.5B) or a checkpoint of MusiConGen? Thank you! https://huggingface.co/Cyan0731/MusiConGen_training/tree/main

Cyan0731 commented 1 month ago

Hi, the link is the checkpoint of MusiConGen. You can also modify the MusicGen-melody checkpoint to train with your own dataset by aligning the keys and weight shapes with the provided checkpoint.
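
A minimal sketch of this alignment step with plain torch state dicts (the file names and the wrapping keys `best_state`/`model` here are placeholders and assumptions, not the exact format used by the repo):

```python
import torch

# File names are placeholders; point them at your local copies.
melody_sd = torch.load("musicgen-melody_state_dict.bin", map_location="cpu")      # pretrained MusicGen-melody
target_ckpt = torch.load("MusiConGen_training_checkpoint.th", map_location="cpu") # checkpoint from the HF link above

# Training checkpoints are often wrapped (e.g. under 'best_state'/'model');
# the nesting assumed here may differ from the actual file.
target_sd = target_ckpt
for wrapper in ("best_state", "model"):
    if isinstance(target_sd, dict) and wrapper in target_sd:
        target_sd = target_sd[wrapper]
target_sd = {k: v for k, v in target_sd.items() if torch.is_tensor(v)}

aligned = {}
for key, ref in target_sd.items():
    if key in melody_sd and melody_sd[key].shape == ref.shape:
        # Key and shape match the provided checkpoint: reuse the pretrained weight.
        aligned[key] = melody_sd[key]
    else:
        # Layer added or reshaped for MusiConGen: keep the provided tensor
        # (or re-initialize it yourself before finetuning).
        aligned[key] = ref
        print(f"not taken from MusicGen-melody: {key} {tuple(ref.shape)}")

torch.save(aligned, "musicgen-melody_aligned.th")
```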

MahlerMozart commented 1 month ago

> Hi, the link is the checkpoint of MusiConGen. You can also modify the MusicGen-melody checkpoint to train with your own dataset by aligning the keys and weight shapes with the provided checkpoint.

Thank you for the fast reply! Could you please elaborate more on the details of the checkpoint and training process? Here is what I have found.

  1. I tried to use export_weight.py to convert the training checkpoint, but the output is different from the weights provided here: https://huggingface.co/Cyan0731/MusiConGen/tree/main (see the comparison sketch after this list).
  2. Is the checkpoint a modified version of the MusicGen-melody weights used as the starting point for finetuning, or an intermediate checkpoint from training on the data in 5_genre_songs_list.json?
  3. If the checkpoint is not derived from the MusicGen-melody weights, will the program automatically load the MusicGen-melody weights as the starting point for finetuning if I use "compression_model_checkpoint=//pretrained/facebook/encodec_32khz"?
  4. I tried to train the model following your instructions with the data from 5_genre_songs_list.json. If I continue_from the checkpoint at https://huggingface.co/Cyan0731/MusiConGen_training/tree/main, the CE loss quickly drops to around 3.5, but if I do not continue from the checkpoint, the CE loss never drops below 5.5 even after several hundred epochs.
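
For reference, this is roughly the comparison I mean in point 1, assuming both files are plain torch state dicts (the file names are placeholders):

```python
import torch

exported_sd = torch.load("my_exported_state_dict.bin", map_location="cpu")   # output of export_weight.py
reference_sd = torch.load("MusiConGen_state_dict.bin", map_location="cpu")   # weights published on Hugging Face

# Keys present in only one of the two checkpoints.
print("only in exported:", sorted(set(exported_sd) - set(reference_sd))[:10])
print("only in reference:", sorted(set(reference_sd) - set(exported_sd))[:10])

# Shape and value differences for shared keys.
for key in sorted(set(exported_sd) & set(reference_sd)):
    a, b = exported_sd[key], reference_sd[key]
    if a.shape != b.shape:
        print(f"shape mismatch at {key}: {tuple(a.shape)} vs {tuple(b.shape)}")
    else:
        diff = (a.float() - b.float()).abs().max().item()
        if diff > 1e-6:
            print(f"value difference at {key}: max abs diff {diff:.3e}")
```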

I am very interested in your work and truly appreciate your patience and help on the above questions. Thank you!

Cyan0731 commented 2 weeks ago

Sorry for the late reply.

  1. Can you show the difference in the output weight format?

  2. The training checkpoint is a trained checkpoint from a modified version of the MusicGen-melody model. The inference checkpoint is the checkpoint (a .bin file) exported from the training checkpoint (see the export sketch after this list).

  3. No. Before your own training, you have to modify some layers of the pretrained MusicGen-melody checkpoint so that they correspond to the MusiConGen training weights at https://huggingface.co/Cyan0731/MusiConGen_training/tree/main

  4. Yes, in my experiments the model converges when the CE loss is about 3.2, and the validation loss is about 3.5. However, to determine whether the model is well trained, a listening test is the most accurate evaluation.
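
For point 2, a rough sketch of the training-checkpoint-to-.bin export step (the wrapping keys and file names are assumptions; export_weight.py in the repo is the authoritative version):

```python
import torch

ckpt = torch.load("checkpoint.th", map_location="cpu")  # training checkpoint

# Unwrap the training state; the nesting keys assumed here may not match the actual file.
state = ckpt
for wrapper in ("best_state", "model", "state_dict"):
    if isinstance(state, dict) and wrapper in state:
        state = state[wrapper]

# Keep only tensors (drop optimizer state, schedulers, step counters, etc.).
state = {k: v for k, v in state.items() if torch.is_tensor(v)}

torch.save(state, "state_dict.bin")  # flat state dict for inference
print(f"exported {len(state)} tensors")
```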