Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Hi!
Every neural codec trained in the last year (SoundStream, EnCodec, Descript) seems to use an RVQ codebook size of 1024 entries per codebook. For training MusicGen, however, the 32 kHz EnCodec trained here uses a codebook size of 2048. Why?