facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

About Magnet's performance #395

Open RevolGMPHL opened 5 months ago

RevolGMPHL commented 5 months ago

Why is the performance so poor? Is there a bug, or is the model itself just this poor in performance?

CyberTimon commented 5 months ago

Do you mean sound quality or speed? If sound quality, yes I was expecting better too.

lonzi commented 5 months ago
CyberTimon commented 5 months ago

Thanks @lonzi for the answer! Will try out these tips tomorrow.

RevolGMPHL commented 5 months ago

Thanks for the answer! Although I tried several parameter settings, I still feel the performance is not as good as the original model.

CyberTimon commented 5 months ago

@lonzi, could you kindly share your sampling parameters, or add a note to the README with recommended parameters?

Thank you so much!

lonzi commented 5 months ago

For Music:
span_arrangement: 'stride1'
use_sampling: true
top-p: 0.9
temperature: 3.0
max_cfg_coef: 10.0
min_cfg_coef: 1.0
decoding_iterations (for 10 secs): [20, 10, 10, 10]
decoding_iterations (for 30 secs): [60, 10, 10, 10]

See our paper for the ablation studies.

For Sound [audio-magnet models]:
span_arrangement: 'stride1'
use_sampling: true
top-p: 0.8
temperature: 3.5
max_cfg_coef: 20.0
min_cfg_coef: 1.0
decoding_iterations: [20, 10, 10, 10]

*These are the parameters from the paper's ablation studies; they are not necessarily tuned for the open-source models.
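
For anyone who wants to try these values from Python, here is a minimal sketch of how they might be passed to a model. It assumes the MAGNeT wrapper exposes `set_generation_params` with keyword names `top_p` and `decoding_steps` (the comment above writes `top-p` and `decoding_iterations`, so that mapping is an assumption), and the preset keys and the `apply_preset` helper are hypothetical names made up for this example.

```python
# Presets transcribed from the values listed above.
# ASSUMPTION: the keyword names top_p and decoding_steps correspond to
# "top-p" and "decoding_iterations" in the comment; verify against the
# set_generation_params signature of your installed audiocraft version.
MAGNET_PRESETS = {
    "music-10s": dict(
        span_arrangement="stride1", use_sampling=True, top_p=0.9,
        temperature=3.0, max_cfg_coef=10.0, min_cfg_coef=1.0,
        decoding_steps=[20, 10, 10, 10],
    ),
    "music-30s": dict(
        span_arrangement="stride1", use_sampling=True, top_p=0.9,
        temperature=3.0, max_cfg_coef=10.0, min_cfg_coef=1.0,
        decoding_steps=[60, 10, 10, 10],
    ),
    "sound": dict(
        span_arrangement="stride1", use_sampling=True, top_p=0.8,
        temperature=3.5, max_cfg_coef=20.0, min_cfg_coef=1.0,
        decoding_steps=[20, 10, 10, 10],
    ),
}


def apply_preset(model, name):
    """Pass the named preset to model.set_generation_params; return the kwargs used."""
    kwargs = dict(MAGNET_PRESETS[name])
    # Copy the list so callers cannot mutate the shared preset by accident.
    kwargs["decoding_steps"] = list(kwargs["decoding_steps"])
    model.set_generation_params(**kwargs)
    return kwargs


# Usage sketch (needs audiocraft installed; the checkpoint name is an assumption):
# from audiocraft.models import MAGNeT
# model = MAGNeT.get_pretrained("facebook/magnet-small-10secs")
# apply_preset(model, "music-10s")
# wav = model.generate(["80s synth pop with a driving bassline"])
```

As noted above, these values come from the paper's ablations, so treat them as a starting point rather than tuned settings for the released checkpoints.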