facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

mbd training is not working with custom dataset (vs. mbd_musicgen_32khz.th) #445

Closed: eunkoh closed this issue 2 months ago

eunkoh commented 2 months ago

Hi team, I have some questions about the current mbd training code. Several issues came up:

1) encodec_24khz.yaml

I am using the config/solver/diffusion/encodec_24khz.yaml file with my own dataset. I have had decent results with encodec and musicgen training before, but for some reason the samples from mbd training do not sound like music at all. A similar issue was raised in #430.

2) load_diffusion_model in models/loaders.py

I tried the API examples from the mbd.md doc file, but I realized that my saved torch pth model has a different structure than expected. For example, loaders.py tries to read n_bands, model_state, and processor_state from the dictionary, but my pth file has no saved values for those. processor_state might correspond to sample_processor in my pth model file, but that is only my guess.

3) Difference from mbd_musicgen_32khz.th

So I downloaded your original pretrained mbd model from 'https://dl.fbaipublicfiles.com/encodec/Diffusion/mbd_musicgen_32khz.th' and checked what's inside.

Now I can see the correct dict structure, which works with the current loaders.py. That is why I suspect the current mbd training code might not be up to date.
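For anyone who wants to reproduce the comparison, a check along these lines can list which of the keys named in point 2 are absent from a checkpoint. This is only a minimal sketch: the helper names `collect_keys` and `missing_keys` are hypothetical, the toy dictionary (including its `model` and `optimizer` entries) is an assumed stand-in for a real file, and the expected key names are taken from the description above rather than from a confirmed reading of loaders.py.

```python
from typing import Any, Iterable, List, Set


def collect_keys(obj: Any, max_depth: int = 3) -> Set[str]:
    """Collect dictionary keys (as strings) down to max_depth levels of nesting."""
    keys: Set[str] = set()
    if max_depth <= 0 or not isinstance(obj, dict):
        return keys
    for key, value in obj.items():
        keys.add(str(key))
        keys |= collect_keys(value, max_depth - 1)
    return keys


def missing_keys(checkpoint: dict, expected: Iterable[str]) -> List[str]:
    """Return the expected key names that appear nowhere in the checkpoint."""
    found = collect_keys(checkpoint)
    return sorted(name for name in expected if name not in found)


# Key names mentioned in point 2 of this issue (not verified against loaders.py).
EXPECTED = ("n_bands", "model_state", "processor_state")

# Toy stand-in for a checkpoint saved by training; the layout is assumed.
# With a real file, the dict would typically come from something like:
#   checkpoint = torch.load("my_model.pth", map_location="cpu")
toy_checkpoint = {"model": {}, "sample_processor": {}, "optimizer": {}}

print(missing_keys(toy_checkpoint, EXPECTED))
```

Running the same check on the downloaded mbd_musicgen_32khz.th and on a locally trained file would show side by side where the two layouts differ.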

4) diffusion/default.yaml

https://github.com/facebookresearch/audiocraft/blob/main/config/solver/diffusion/default.yaml#L76

In this YAML file, both the processor and the filter have use set to "false". Is this correct? In your pth file, however, I can see the multi_band_processor value shown as both true and false.

These are my findings on the current state of mbd training. Please take a look; any information would be appreciated.

Thank you very much in advance.