offset for generate_continuation()

I'm finding the results a bit strange: too often the model continues the prompt audio for only a few seconds and then switches to a different song. I'm not sure of the cause, but my guess is the position encoding.

For example, say I generate 60s of audio and like the 30s-35s segment, so I feed that segment back to the model to continue generation. The prompt segment will get position encodings as if it were 0s-5s, and the first second of newly generated audio will be encoded as if it were 5s-6s. Maybe because of this the model is less beholden to the prompt and more willing to make a large shift in the music.

Does this make sense? If so, is there a way to control the offset for the position encoding when using generate_continuation()?
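
To make the question concrete, here is a minimal sketch of the kind of offset I have in mind. It is not audiocraft's code: the 50 frames-per-second rate, the helper names, and the `offset_s` parameter are all assumptions for illustration, and `offset_s` is not an existing argument of generate_continuation(). It only shows how the same 5s prompt would map to different sinusoidal position embeddings with and without an offset.

```python
import math

# Assumed token frame rate (frames per second); illustrative value only.
FRAME_RATE = 50


def positions_for_segment(start_s, end_s, offset_s=0.0):
    """Frame indices given to a segment cut from start_s..end_s.

    offset_s is a hypothetical knob: 0 reproduces the behaviour described
    above (the prompt is always treated as starting at 0s).
    """
    num_frames = round((end_s - start_s) * FRAME_RATE)
    first = round(offset_s * FRAME_RATE)
    return list(range(first, first + num_frames))


def sin_embedding(position, dim=8, max_period=10000.0):
    """Generic sinusoidal embedding of one position: cosines, then sines."""
    half = dim // 2
    freqs = [max_period ** (-i / max(half - 1, 1)) for i in range(half)]
    cos_part = [math.cos(position * f) for f in freqs]
    sin_part = [math.sin(position * f) for f in freqs]
    return cos_part + sin_part


# The 30s-35s segment as the model sees it today: positions for 0s-5s.
prompt_default = positions_for_segment(30, 35)
# The same segment with a hypothetical 30s offset: positions for 30s-35s.
prompt_shifted = positions_for_segment(30, 35, offset_s=30)

print(prompt_default[0], prompt_default[-1])
print(prompt_shifted[0], prompt_shifted[-1])
```

With an offset like this, the first newly generated frame would come right after the shifted prompt positions instead of at the 5s mark.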

facebookresearch / audiocraft

offset for generate_continuation() #287