Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Question
How can I use my own text encoder in the MusicGen model?
My Understanding of the MusicGen Model
If my question is unclear because my understanding of the model is wrong, please correct me.
What I want to do
Replace the T5 encoder with one that I trained myself, assuming the text features it produces stay the same for the same music target.
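To make the goal concrete, here is a rough sketch of the kind of drop-in module I have in mind. It assumes that the text conditioner in MusicGen works in two steps: a `tokenize` step that turns strings into tensors, and a `forward` step that returns embeddings plus a padding mask, projected to the dimension the LM expects. I have not verified this against the Audiocraft source, so treat that interface as an assumption. `CustomTextConditioner` and the byte-level embedding table are hypothetical placeholders for my trained encoder, not part of Audiocraft.

```python
from typing import Dict, List, Optional, Tuple

import torch
from torch import nn


class CustomTextConditioner(nn.Module):
    """Hypothetical text conditioner with a tokenize/forward split."""

    VOCAB_SIZE = 257  # 256 byte values shifted by one, index 0 reserved for padding

    def __init__(self, encoder_dim: int, output_dim: int, max_len: int = 32):
        super().__init__()
        self.max_len = max_len
        # Placeholder for the trained encoder: a byte-level embedding table.
        self.encoder = nn.Embedding(self.VOCAB_SIZE, encoder_dim, padding_idx=0)
        # Maps encoder features to the dimension the language model expects.
        self.output_proj = nn.Linear(encoder_dim, output_dim)

    def tokenize(self, texts: List[Optional[str]]) -> Dict[str, torch.Tensor]:
        ids = torch.zeros(len(texts), self.max_len, dtype=torch.long)
        mask = torch.zeros(len(texts), self.max_len, dtype=torch.long)
        for row, text in enumerate(texts):
            if not text:
                continue  # missing or empty description stays fully masked
            tokens = [b + 1 for b in text.encode("utf-8")][: self.max_len]
            ids[row, : len(tokens)] = torch.tensor(tokens)
            mask[row, : len(tokens)] = 1
        return {"input_ids": ids, "attention_mask": mask}

    def forward(self, inputs: Dict[str, torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
        mask = inputs["attention_mask"]
        with torch.no_grad():  # keep the encoder frozen, train only the projection
            embeds = self.encoder(inputs["input_ids"])
        embeds = self.output_proj(embeds)
        # Zero out padded positions so they carry no conditioning signal.
        embeds = embeds * mask.unsqueeze(-1)
        return embeds, mask


conditioner = CustomTextConditioner(encoder_dim=16, output_dim=8)
embeddings, padding_mask = conditioner(conditioner.tokenize(["calm piano", None]))
print(embeddings.shape, padding_mask.sum(dim=1))
```

If the interface assumption holds, the open part of my question is where such a module should be plugged in so that the LM receives its output instead of the T5 features.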