facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

Quality of the generated music with MAGNeT is worse than on the website, and even worse than MusicGen. #440

Open tachev opened 3 months ago

tachev commented 3 months ago

I love how quickly it creates the music.

What am I doing wrong?

I just ran the provided samples and played with the settings, but had no luck getting even close to the advertised quality. The sound is clean, but the music is very strange. I tried changing the params, but that made it worse.

```python
model.set_generation_params(
    use_sampling=True,
    top_k=0,
    top_p=0.9,
    temperature=3.0,
    max_cfg_coef=10.0,
    min_cfg_coef=1.0,
    decoding_steps=[int(20 * model.lm.cfg.dataset.segment_duration // 10), 10, 10, 10],
    span_arrangement='stride1',
)
```

The medium model is a little better, but still not close to anything I have heard from other models.

yukara-ikemiya commented 3 months ago

I had the same impression of MAGNeT, and found that an author explained here why MAGNeT's output is worse than the audio samples on the demo page: [About Magnet's performance] #395

As the author mentioned, a "rescoring technique using MusicGen" (which they haven't provided in this repository) should improve quality at inference time. What we get now corresponds to the non-rescoring version in Table 3 of the paper.
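For intuition, the rescoring idea can be sketched as blending the non-autoregressive model's token scores with those of an external autoregressive model. This is only a rough NumPy illustration assuming a simple log-probability interpolation with a weight `w`; the function names and the exact weighting scheme here are my own, not audiocraft's API or the paper's exact formulation.

```python
import numpy as np

def rescore(magnet_logprobs: np.ndarray, ar_logprobs: np.ndarray,
            w: float = 0.5) -> np.ndarray:
    """Blend per-token log-probabilities from the non-autoregressive
    model (MAGNeT) with those of an external autoregressive rescorer
    (e.g. MusicGen). `w` is the weight on the MAGNeT scores.
    (Hypothetical helper, not part of audiocraft.)"""
    return w * magnet_logprobs + (1.0 - w) * ar_logprobs

def pick_best_candidate(magnet_scores, ar_scores, w: float = 0.5) -> int:
    """Among several candidate sequences, return the index of the one
    with the highest blended total log-probability."""
    totals = [rescore(m, a, w).sum() for m, a in zip(magnet_scores, ar_scores)]
    return int(np.argmax(totals))
```

The point is that the AR model acts as an external quality check: candidates that MAGNeT likes but the AR model finds implausible get penalized, which is presumably why the demo-page samples (generated with rescoring) sound better than what the repository produces.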

In addition, based on the paper, I think the following:

  1. MAGNeT is faster, but even in the paper its quality is worse than MusicGen's to begin with (Table 1).
  2. The current metrics for music generation models (FAD, CLAP score, etc.) are not perfect. They seem easy to game, and their values don't directly reflect perceived generation quality.