lszhou0126 opened 1 month ago
Just change the 'sample_size' in model_config to a value corresponding to more than 47 s, and you should be able to do it.
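As a rough sketch of what that change means: `sample_size` is a sample count, not a duration, so the new value is the target length in seconds times the model's sample rate. The values below (44100 Hz sample rate, a 2048-sample downsampling factor) are assumptions for illustration, not taken from the repository's actual config.

```python
# Hypothetical sketch: computing a new "sample_size" for model_config.
sample_rate = 44100      # assumed sample rate of the pretrained model
target_seconds = 95      # desired maximum clip length (longer than 47 s)

# The autoencoder downsamples the waveform by a fixed factor, so round the
# raw sample count up to a multiple of it (2048 is an assumed factor).
downsample = 2048
raw = target_seconds * sample_rate
sample_size = ((raw + downsample - 1) // downsample) * downsample
print(sample_size)  # value to put in model_config["sample_size"]
```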
@kenkalang Thank you for your response. While the generated length has indeed increased, the audio quality seems to have deteriorated, and the seconds_start and seconds_total conditioning is not working as expected. Since the pretrained model was trained on 47-second clips, it seems a direct modification like this may not be appropriate.
Yeah, you should also fine-tune the pretrained model on your dataset if you want better quality.
@kenkalang If fine-tuning is performed, does the autoencoder part of the network also require fine-tuning, or only the DiT?
As the autoencoder is fully convolutional, I think it does not need any fine-tuning.
Yeah, just tune the DiT; that's where most of the impact happens.
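The "freeze the autoencoder, tune only the DiT" setup can be sketched roughly as below. The modules here are tiny stand-ins, not the real stable-audio-tools classes (which have different names and signatures), and the MSE objective is a placeholder for the actual diffusion training loss.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a convolutional "autoencoder" encoder and a small
# transformer playing the role of the DiT.
autoencoder = nn.Conv1d(2, 64, kernel_size=9, stride=4, padding=4)
dit = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze the convolutional autoencoder; only the DiT receives gradients.
for p in autoencoder.parameters():
    p.requires_grad = False
autoencoder.eval()

optimizer = torch.optim.AdamW(
    (p for p in dit.parameters() if p.requires_grad), lr=1e-5
)

waveform = torch.randn(1, 2, 4096)   # fake stereo clip standing in for your data
with torch.no_grad():                # encoder is frozen, no gradients needed
    latents = autoencoder(waveform)  # (1, 64, 1024)
latents = latents.transpose(1, 2)    # (batch, time, channels) for the transformer

pred = dit(latents)
loss = nn.functional.mse_loss(pred, latents)  # placeholder, not the diffusion loss
loss.backward()
optimizer.step()
```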
Look up the MultiDiffusion paper; it should work fine with this model to generate arbitrary-length music.
@zaptrem Are you referring to the "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" paper? Is there any other application of it to music?
See pages 49 and 61 of Movie Gen, where they use it for their audio accompaniment model: https://ai.meta.com/static-resource/movie-gen-research-paper
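The core MultiDiffusion idea can be shown with a toy fusion step: the denoiser only ever sees fixed-length windows, and overlapping window predictions are averaged so the full-length latent stays consistent. `denoise_window` here is a hypothetical placeholder for one reverse-diffusion step of a pretrained model, not a real API.

```python
import numpy as np

def denoise_window(x):
    """Placeholder for the model's denoised estimate of one window."""
    return x * 0.9

def multidiffusion_step(latent, window=64, hop=32):
    """Fuse overlapping fixed-size window predictions over a long latent."""
    fused = np.zeros_like(latent)
    counts = np.zeros(latent.shape[-1])
    for start in range(0, latent.shape[-1] - window + 1, hop):
        sl = slice(start, start + window)
        fused[..., sl] += denoise_window(latent[..., sl])
        counts[sl] += 1
    return fused / counts  # average where windows overlap

latent = np.random.randn(8, 256)  # fake latent longer than one window
out = multidiffusion_step(latent)
```

Because the placeholder denoiser is linear, every overlapping window agrees and the averaged output equals `latent * 0.9`; with a real model the averaging is what reconciles disagreeing window predictions at each diffusion step.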
As the title describes: can continuation methods be used?