Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
mbd training is not working with custom dataset (vs. mbd_musicgen_32khz.th) #445
Hi team, I have some questions about the current MBD training code. Several issues came up:
1) encodec_24khz.yaml
I am using the config/solver/diffusion/encodec_24khz.yaml file with my own dataset. I have had decent results with EnCodec and MusicGen training before, but for some reason the samples from MBD training are not music at all. Someone raised a similar issue in #430.
2) load_diffusion_model in models/loaders.py
I tried to follow the API examples from the MBD.md doc file, but I realized that my torch .pth model is different from what is expected. For example, loaders.py tries to load
- `n_bands`
- `model_state`
- `processor_state`
from the dictionary, but none of those keys are saved in my .pth file. I did find that `processor_state` may correspond to `sample_processor` in my .pth model file, but that is only my guess.
3) Difference from mbd_musicgen_32khz.th
So I downloaded your original MBD pretrained model from https://dl.fbaipublicfiles.com/encodec/Diffusion/mbd_musicgen_32khz.th and checked what is inside.
![preview](https://github.com/facebookresearch/audiocraft/assets/61261760/1a9410db-afb3-4885-866f-5f2fd422f233)
Now I can see the correct dict structure that works with the current loaders.py, which is why I suspect the current MBD training code might not be up to date.
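For reference, here is a minimal sketch of the key remapping I am describing. The expected keys (`n_bands`, `model_state`, `processor_state`) and the `sample_processor` key are as reported above; the `model` key for the raw weights is hypothetical, and the `sample_processor` → `processor_state` correspondence is only my assumption, not a confirmed fix.

```python
# Sketch only: "n_bands", "model_state", "processor_state" are what
# loaders.py expects; "sample_processor" is what my trained .pth has;
# "model" as the raw-weights key is my hypothetical stand-in.

def remap_for_loader(trained_ckpt, n_bands):
    """Best-guess remap of a training checkpoint into the layout that
    the loader appears to expect.

    n_bands must be supplied manually because my trained checkpoint
    does not store it.
    """
    return {
        "n_bands": n_bands,
        "model_state": trained_ckpt["model"],
        "processor_state": trained_ckpt.get("sample_processor", {}),
    }

# Dummy stand-ins for the real state dicts, just to show the shapes.
trained_ckpt = {"model": {"w": 0.0}, "sample_processor": {"scale": 1.0}}
remapped = remap_for_loader(trained_ckpt, n_bands=8)
print(sorted(remapped))  # ['model_state', 'n_bands', 'processor_state']
```

If this guess is right, a remap like this would let a custom-trained checkpoint pass through the current loader without retraining.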
4) diffusion/default.yaml
https://github.com/facebookresearch/audiocraft/blob/main/config/solver/diffusion/default.yaml#L76
In this yaml file, both `processor` and `filter` have use set to "false". Is this correct? In your pth file, however, I can see that the `multi_band_processor` value shows both true and false.

These are my findings regarding the current MBD status. Please take a look; any information would be appreciated.
Thank you very much in advance.