SoshyHayami closed 7 months ago
Thank you very much. I still wish you had included it even if it wasn't satisfying. I was desperately looking for a similar feature, and the only working one I found was Meta's AudioGen, which is notoriously hard to train.
Regarding the VAE, the sample rate is 16 kHz. Should I increase it and retrain? What do you suggest? And if I change num_mel_bins, wouldn't that conflict with BigVGAN? To be honest, I only have one or two V100 (32 GB) GPUs. AudioLDM made me hopeful about how far I can push this, and I hope I can train on this config, since this project seems very interesting.
I have updated audio2audio.py. If you want to use a higher sample rate and change the mel-spectrogram processing, you need to retrain the VAE, and you also need to retrain BigVGAN. That will be time-consuming. If you are doing research, it is better not to change the sample rate, since you need to compare with previous work.
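To illustrate why the mel settings tie the VAE and the vocoder together, here is a minimal sketch. The `MelConfig` class, its field names, the `needs_retraining` helper, and all numeric values (hop length, number of mel bins) are assumptions made for illustration; they are not taken from this repository's configs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical mel-spectrogram settings (names are illustrative only)."""
    sample_rate: int   # audio sample rate in Hz
    hop_length: int    # samples between consecutive mel frames
    num_mel_bins: int  # mel channels per frame

    @property
    def frames_per_second(self) -> float:
        # Number of mel frames produced per second of audio.
        return self.sample_rate / self.hop_length

    @property
    def nyquist_hz(self) -> float:
        # Highest frequency the audio can represent at this sample rate.
        return self.sample_rate / 2


def needs_retraining(trained: MelConfig, wanted: MelConfig) -> bool:
    """A model trained on one mel layout cannot consume a different one as-is."""
    return trained != wanted


# Assumed values, not read from the project's config files.
current = MelConfig(sample_rate=16_000, hop_length=160, num_mel_bins=64)
target = MelConfig(sample_rate=48_000, hop_length=480, num_mel_bins=128)

print(current.frames_per_second, target.frames_per_second)
print(current.nyquist_hz, target.nyquist_hz)
print(needs_retraining(current, target))
```

Under these assumed numbers the frame rate stays the same when the hop length is scaled with the sample rate, but the frequency range and the number of mel bins both change. The mel spectrogram is the shared interface between the VAE and the vocoder, so any change to it means both models see inputs they were not trained on.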
Hi, I've seen that you showed an example personalizing an audio sfx in your demo samples page. Can you tell me how to implement this with the inference code you have provided here?
Also, do you think it is worth training on an entire music dataset for the music generation task, or is this only suited to sound effects and perhaps some light music generation? What steps should we take if we want to train at a higher sample rate (say 32 kHz or 48 kHz, or perhaps even stereo)?
Thanks.