Weird Encodec codebook dimensionality?

facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

MIT License

Weird Encodec codebook dimensionality? #294

Open vican9000 opened 9 months ago

vican9000 commented 9 months ago

Hi! Every neural codec trained in the last year (SoundStream, EnCodec, Descript) seems to use an RVQ codebook dimensionality (that is, the number of entries per codebook) of 1024. The 32 kHz EnCodec trained here for MusicGen uses 2048 instead. Why?
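
To make the comparison concrete, here is a minimal sketch of what the two values cost in bits per token, assuming "dimensionality" above means the number of entries in each codebook. The function names are hypothetical, and the stream settings in the example (4 codebooks, 50 frames per second) are illustrative assumptions rather than values read from the audiocraft configs.

```python
import math


def bits_per_code(codebook_size: int) -> float:
    """Bits needed to index one entry in a codebook of the given size."""
    return math.log2(codebook_size)


def bitrate_bps(codebook_size: int, num_codebooks: int, frame_rate_hz: float) -> float:
    """Bitrate of an RVQ token stream: bits per code x codebooks per frame x frames per second."""
    return bits_per_code(codebook_size) * num_codebooks * frame_rate_hz


# Each index costs one extra bit when the codebook doubles from 1024 to 2048 entries.
print(bits_per_code(1024))  # 10.0
print(bits_per_code(2048))  # 11.0

# Illustrative stream settings (assumed, not taken from the repo).
NUM_CODEBOOKS = 4
FRAME_RATE_HZ = 50.0
print(bitrate_bps(1024, NUM_CODEBOOKS, FRAME_RATE_HZ))  # 2000.0
print(bitrate_bps(2048, NUM_CODEBOOKS, FRAME_RATE_HZ))  # 2200.0
```

Under these assumed settings, doubling the codebook adds one bit per token, so the open question is what that extra bit buys for the language model trained on top.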