facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License
20.23k stars, 2.03k forks

offset for generate_continuation() #287

Open PhilippeNguyen opened 9 months ago

PhilippeNguyen commented 9 months ago

I'm finding the results a bit strange: too often the model continues the prompt audio for only a few seconds and then switches to a different song. I'm not sure what the cause is, but my guess is the position encoding.

For example, say I generate 60s of audio and like the 30s-35s segment, so I take that segment and feed it to the model to continue generation. The prompt segment gets position encodings as if it were 0s-5s, and the first 1s of newly generated audio gets position encodings as if it were 5s-6s. Maybe because of this the model is not really beholden to the prompt and is more willing to make a large shift in the music.
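To make the concern concrete, here is a minimal, library-independent sketch of the position indices involved. It assumes absolute sinusoidal position encodings and a token frame rate of 50 frames per second; both are assumptions for illustration, not something confirmed here about the model. The helper names `sin_embedding` and `prompt_positions` and the `offset_seconds` parameter are hypothetical and are not part of the audiocraft API.

```python
import math

FRAME_RATE = 50  # assumed number of token frames per second (illustrative)


def sin_embedding(positions, dim, max_period=10000.0):
    """Return one sinusoidal vector per position: cosines first, then sines."""
    half = dim // 2
    rows = []
    for pos in positions:
        # Each channel i uses a different wavelength between 1 and max_period.
        phases = [pos / (max_period ** (i / (half - 1))) for i in range(half)]
        rows.append([math.cos(p) for p in phases] + [math.sin(p) for p in phases])
    return rows


def prompt_positions(prompt_seconds, offset_seconds=0.0):
    """Frame indices for a prompt, optionally shifted by a hypothetical offset."""
    start = int(round(offset_seconds * FRAME_RATE))
    count = int(round(prompt_seconds * FRAME_RATE))
    return list(range(start, start + count))


# Without an offset, a 5s prompt is encoded as the start of a track.
default_pos = prompt_positions(5.0)
# With a 30s offset, the same prompt would be encoded as the 30s-35s region.
shifted_pos = prompt_positions(5.0, offset_seconds=30.0)

default_emb = sin_embedding(default_pos, dim=8)
shifted_emb = sin_embedding(shifted_pos, dim=8)
```

With these assumptions, the same 5s of audio receives entirely different position vectors depending on whether it is treated as starting at 0s or at 30s, which is the difference the question is asking about.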

Does this make sense? If so, is there a way to control the offset for the position encoding when using generate_continuation()?