Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
It's not clear to me whether audio should be pre-chunked offline, or whether the trainer chunks it at runtime during data loading. Any ideas?
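For context on what "pre-chunking" would involve, here is a minimal sketch of splitting a waveform into fixed-length training segments. This is purely illustrative and is not AudioCraft's actual data pipeline; the function name and parameters are hypothetical, and real training loaders often random-crop segments on the fly instead of materializing chunks up front.

```python
# Illustrative sketch only -- NOT AudioCraft's implementation.
# Splits a 1-D list of audio samples into non-overlapping fixed-length
# chunks, discarding any trailing partial segment.
def chunk_waveform(samples, sample_rate, chunk_seconds):
    """Split `samples` into non-overlapping chunks of `chunk_seconds` each."""
    chunk_len = int(sample_rate * chunk_seconds)
    return [
        samples[i : i + chunk_len]
        for i in range(0, len(samples) - chunk_len + 1, chunk_len)
    ]

# Example: 10 s of silent mono audio at a toy 1 kHz rate, in 3 s chunks.
audio = [0.0] * 10_000
chunks = chunk_waveform(audio, sample_rate=1_000, chunk_seconds=3.0)
print(len(chunks), len(chunks[0]))  # → 3 3000
```

If the trainer already does equivalent cropping at load time, pre-chunking like this would mainly trade disk space for simpler I/O; the remaining question is which of the two the framework expects.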