enlyth / sd-webui-riffusion

Riffusion extension for AUTOMATIC1111's SD Web UI
MIT License
195 stars 23 forks

Duration #22

Open Tobe2d opened 1 year ago

Tobe2d commented 1 year ago

Is there a way to set the duration? For example, if I want it to be 30 seconds, or even 1 minute.

azumukupoe commented 1 year ago

https://github.com/enlyth/sd-webui-riffusion#prompt-travelling

Tobe2d commented 1 year ago

@azumukupoe I can't see where to set the duration in there. All I see there is how to join multiple files together.

jahu00 commented 1 year ago

Riffusion can produce samples that are about 5 seconds long. This can be extended to about 20 seconds by increasing the width of the image generated by the network to 2048 pixels. However, those longer samples won't be much better than joining multiple shorter samples together, and they tend to lose rhythm every 5 seconds or so. I'm guessing Riffusion was trained on images 512 pixels wide, hence the inability to produce seamless samples longer than 5 seconds. You would need a much bigger model and a powerful GPU to produce longer samples and coherent whole songs with a diffusion network.
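As a rough illustration of the arithmetic above, here is a minimal sketch that estimates clip length from image width and counts how many clips you would have to join to reach a target duration. It assumes duration scales linearly with width, using only the approximate figures from this comment (512 px is about 5 seconds, 2048 px is about 20 seconds). The function names are hypothetical and are not part of the extension.

```python
import math

# Approximate figures taken from the comment above, not from the extension's code.
BASE_WIDTH_PX = 512      # assumed default spectrogram width
BASE_DURATION_S = 5.0    # approximate audio length for that width


def estimated_duration_s(width_px: int) -> float:
    """Estimate clip length, assuming duration scales linearly with width."""
    return width_px / BASE_WIDTH_PX * BASE_DURATION_S


def clips_needed(target_s: float, width_px: int = BASE_WIDTH_PX) -> int:
    """Number of clips of the given width to join to cover target_s seconds."""
    return math.ceil(target_s / estimated_duration_s(width_px))


if __name__ == "__main__":
    for width in (512, 1024, 2048):
        print(f"{width} px is roughly {estimated_duration_s(width):.0f} s")
    print("clips for 30 s:", clips_needed(30))
    print("clips for 60 s:", clips_needed(60))
```

Under these assumptions, a 30-second result would take about six default-width clips joined together, and a 1-minute result about twelve.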

Another approach would be to divide the song production process into steps and train a network to perform each step. For example, one step could produce a MIDI approximation of the song, and another could convert this MIDI approximation into actual audio.