facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License
20.15k stars, 2.01k forks

MusicGen text encoder substitution #449

Closed: TWTom041 closed this issue 2 months ago

TWTom041 commented 2 months ago

Question

How can I plug my own text encoder into the MusicGen model?

My Understanding of the MusicGen Model

If my question is unclear because my understanding of the model is wrong, please correct me.

  1. Pass the text through a T5 encoder and keep its hidden state.
  2. Feed that hidden state into a Transformer language model, which generates EnCodec tokens.
  3. Feed the EnCodec tokens to the EnCodec decoder, which produces the music.
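The three steps above can be sketched as a chain of interchangeable stages. This is a toy model with plain Python lists and dummy functions (`TextToMusicPipeline`, `dummy_encoder`, `dummy_lm`, `dummy_decoder` are all hypothetical, not the audiocraft API); it only illustrates the data flow and where a substitute text encoder would plug in.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy type aliases: the real model works on tensors, not Python lists.
Hidden = List[List[float]]  # [seq_len][dim] text features (stage 1 output)
Codes = List[List[int]]     # [n_codebooks][n_steps] EnCodec token ids (stage 2 output)
Audio = List[float]         # waveform samples (stage 3 output)


@dataclass
class TextToMusicPipeline:
    """Toy model of the three stages; not the audiocraft API."""

    text_encoder: Callable[[str], Hidden]      # stage 1, e.g. a T5 encoder
    language_model: Callable[[Hidden], Codes]  # stage 2, the Transformer LM
    audio_decoder: Callable[[Codes], Audio]    # stage 3, the EnCodec decoder

    def generate(self, prompt: str) -> Audio:
        hidden = self.text_encoder(prompt)
        codes = self.language_model(hidden)
        return self.audio_decoder(codes)


# Dummy stages so the sketch runs end to end.
def dummy_encoder(text: str) -> Hidden:
    # One 4-dim vector per whitespace-separated word, filled with the word length.
    return [[float(len(word))] * 4 for word in text.split()]


def dummy_lm(hidden: Hidden) -> Codes:
    # Two codebooks, one time step per text vector.
    return [[int(vec[0]) % 1024 for vec in hidden] for _ in range(2)]


def dummy_decoder(codes: Codes) -> Audio:
    # One sample per time step: sum the codes across codebooks, then scale.
    return [sum(step) / 1024 for step in zip(*codes)]


pipe = TextToMusicPipeline(dummy_encoder, dummy_lm, dummy_decoder)
audio = pipe.generate("calm lofi piano")  # → [0.0078125, 0.0078125, 0.009765625]
```

In this picture, substituting the encoder means building the pipeline with a different `text_encoder`, which only works if its output matches what the language model was trained to consume.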

What I want to do

Replace the T5 encoder with one I trained myself, assuming it produces the same features as T5 for the same music target.
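The "same features" assumption can be checked in two parts: the replacement must emit vectors of the dimension the model expects, and their values should be close to T5's for the same prompt. A minimal sketch with plain lists follows; `check_feature_shape`, `feature_mse` and `my_encoder` are hypothetical helpers, and `T5_BASE_DIM = 768` assumes the checkpoint conditions on t5-base (verify this against your checkpoint's config). The MSE comparison also assumes both feature sequences come from the same tokenization, which a different tokenizer will not give you.

```python
from typing import Callable, List, Sequence

Features = List[List[float]]  # [seq_len][dim]

# Assumption: the checkpoint conditions on t5-base, whose hidden size is 768.
T5_BASE_DIM = 768


def check_feature_shape(encode: Callable[[str], Features],
                        prompts: Sequence[str],
                        expected_dim: int = T5_BASE_DIM) -> None:
    """Raise ValueError unless `encode` returns a non-empty [seq_len][expected_dim] list."""
    for prompt in prompts:
        feats = encode(prompt)
        if not feats:
            raise ValueError(f"empty feature sequence for {prompt!r}")
        for i, vec in enumerate(feats):
            if len(vec) != expected_dim:
                raise ValueError(
                    f"{prompt!r}: position {i} has dim {len(vec)}, expected {expected_dim}")


def feature_mse(ours: Features, reference: Features) -> float:
    """Mean squared error between two equally shaped feature sequences."""
    if len(ours) != len(reference) or any(len(a) != len(b) for a, b in zip(ours, reference)):
        raise ValueError("feature shapes differ")
    count = sum(len(a) for a in ours)
    if count == 0:
        raise ValueError("empty features")
    total = sum((x - y) ** 2 for a, b in zip(ours, reference) for x, y in zip(a, b))
    return total / count


def my_encoder(text: str) -> Features:
    # Hypothetical placeholder for a trained encoder: one zero vector per word.
    return [[0.0] * T5_BASE_DIM for _ in text.split()]


check_feature_shape(my_encoder, ["upbeat synthwave", "slow jazz ballad"])  # passes
err = feature_mse([[1.0, 2.0]], [[1.0, 4.0]])  # (0**2 + 2**2) / 2 → 2.0
```

Matching shapes is necessary but not sufficient: a low MSE against reference T5 features on real prompts is a better sign that the pretrained Transformer will respond to the new features as it did to T5's.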

TWTom041 commented 2 months ago

For anyone who wants to do the same: I think the answer is to modify the function that loads T5EncoderModel here.