lszhou0126 opened 1 month ago
Just change the 'sample_size' in model_config to a value corresponding to more than 47 s, and you should be able to do it.
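As a rough sketch of what that change means: `sample_size` is a sample count, not a duration, so the new value is the target length in seconds times the model's sample rate. The values below (44100 Hz sample rate, a 2048-sample downsampling factor) are assumptions for illustration, not taken from the repository's actual config.

```python
# Hypothetical sketch: computing a new "sample_size" for model_config.
sample_rate = 44100      # assumed sample rate of the pretrained model
target_seconds = 95      # desired maximum clip length (longer than 47 s)

# The autoencoder downsamples the waveform by a fixed factor, so round the
# raw sample count up to a multiple of it (2048 is an assumed factor).
downsample = 2048
raw = target_seconds * sample_rate
sample_size = ((raw + downsample - 1) // downsample) * downsample
print(sample_size)  # value to put in model_config["sample_size"]
```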
@kenkalang Thank you for your response. While the generated length has indeed increased, the audio quality seems to have deteriorated, and the seconds_start and seconds_total conditioning is not working as expected. Since the pretrained model was trained on 47-second clips, it seems a direct modification like this may not be appropriate.
Yeah, you should also fine-tune the pretrained model on your dataset if you want better quality.
@kenkalang If fine-tuning is performed, does the autoencoder part of the network also require fine-tuning, or only the DiT?
As the autoencoder is fully convolutional, I think it does not need any fine-tuning.
Yeah, just tune the DiT; that's where most of the impact happens.
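The "freeze the autoencoder, tune only the DiT" setup can be sketched roughly as below. The modules here are tiny stand-ins, not the real stable-audio-tools classes (which have different names and signatures), and the MSE objective is a placeholder for the actual diffusion training loss.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a convolutional "autoencoder" encoder and a small
# transformer playing the role of the DiT.
autoencoder = nn.Conv1d(2, 64, kernel_size=9, stride=4, padding=4)
dit = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze the convolutional autoencoder; only the DiT receives gradients.
for p in autoencoder.parameters():
    p.requires_grad = False
autoencoder.eval()

optimizer = torch.optim.AdamW(
    (p for p in dit.parameters() if p.requires_grad), lr=1e-5
)

waveform = torch.randn(1, 2, 4096)   # fake stereo clip standing in for your data
with torch.no_grad():                # encoder is frozen, no gradients needed
    latents = autoencoder(waveform)  # (1, 64, 1024)
latents = latents.transpose(1, 2)    # (batch, time, channels) for the transformer

pred = dit(latents)
loss = nn.functional.mse_loss(pred, latents)  # placeholder, not the diffusion loss
loss.backward()
optimizer.step()
```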
Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.
@zaptrem Are you referring to the "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" paper? Is there any other application of it to music?
See pages 49 and 61 of Movie Gen, where they use it for their audio accompaniment model: https://ai.meta.com/static-resource/movie-gen-research-paper
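The core MultiDiffusion idea can be shown with a toy fusion step: the denoiser only ever sees fixed-length windows, and overlapping window predictions are averaged so the full-length latent stays consistent. `denoise_window` here is a hypothetical placeholder for one reverse-diffusion step of a pretrained model, not a real API.

```python
import numpy as np

def denoise_window(x):
    """Placeholder for the model's denoised estimate of one window."""
    return x * 0.9

def multidiffusion_step(latent, window=64, hop=32):
    """Fuse overlapping fixed-size window predictions over a long latent."""
    fused = np.zeros_like(latent)
    counts = np.zeros(latent.shape[-1])
    for start in range(0, latent.shape[-1] - window + 1, hop):
        sl = slice(start, start + window)
        fused[..., sl] += denoise_window(latent[..., sl])
        counts[sl] += 1
    return fused / counts  # average where windows overlap

latent = np.random.randn(8, 256)  # fake latent longer than one window
out = multidiffusion_step(latent)
```

Because the placeholder denoiser is linear, every overlapping window agrees and the averaged output equals `latent * 0.9`; with a real model the averaging is what reconciles disagreeing window predictions at each diffusion step.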
As the title describes: can continuation methods be used?