facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

Quality of the generated music with MAGNeT is worse than on the website, and even worse than MusicGen. #440

Open tachev opened 3 months ago

tachev commented 3 months ago

I love how quickly it creates the music.

What am I doing wrong?

I just ran the provided samples and played with the settings, but had no luck getting even close to the advertised quality. The sound is clean, but the music is very strange. I tried changing the params, but that made it worse.

```python
model.set_generation_params(
    use_sampling=True,
    top_k=0,
    top_p=0.9,
    temperature=3.0,
    max_cfg_coef=10.0,
    min_cfg_coef=1.0,
    decoding_steps=[int(20 * model.lm.cfg.dataset.segment_duration // 10), 10, 10, 10],
    span_arrangement='stride1',
)
```

The medium model is a little better, but still not close to anything I have heard from other models.

yukara-ikemiya commented 3 months ago

I had the same impression of MAGNeT, and found that an author explained here why MAGNeT's output is worse than the audio samples on the demo page: [About Magnet's performance] #395

As the author mentioned, a "rescoring technique using MusicGen" (which they haven't provided in this repository) should improve quality at inference time. What we get now corresponds to the non-rescoring version in Table 3 of the paper.
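For intuition, the rescoring idea can be sketched as blending the non-autoregressive model's token scores with those of an external autoregressive model. This is only a rough NumPy illustration assuming a simple log-probability interpolation with a weight `w`; the function names and the exact weighting scheme here are my own, not audiocraft's API or the paper's exact formulation.

```python
import numpy as np

def rescore(magnet_logprobs: np.ndarray, ar_logprobs: np.ndarray,
            w: float = 0.5) -> np.ndarray:
    """Blend per-token log-probabilities from the non-autoregressive
    model (MAGNeT) with those of an external autoregressive rescorer
    (e.g. MusicGen). `w` is the weight on the MAGNeT scores.
    (Hypothetical helper, not part of audiocraft.)"""
    return w * magnet_logprobs + (1.0 - w) * ar_logprobs

def pick_best_candidate(magnet_scores, ar_scores, w: float = 0.5) -> int:
    """Among several candidate sequences, return the index of the one
    with the highest blended total log-probability."""
    totals = [rescore(m, a, w).sum() for m, a in zip(magnet_scores, ar_scores)]
    return int(np.argmax(totals))
```

The point is that the AR model acts as an external quality check: candidates that MAGNeT likes but the AR model finds implausible get penalized, which is presumably why the demo-page samples (generated with rescoring) sound better than what the repository produces.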

In addition, based on the paper, I think the following:

  1. MAGNeT is faster, but even in the paper its quality is worse than MusicGen's to begin with (Table 1).
  2. The current metrics for music generation models (FAD, CLAP score, etc.) are not perfect. They seem easy to game, and their values don't directly reflect perceived generation quality.