How to generate audio variations and style transfer of audio samples?

Stability-AI / stable-audio-tools

Generative models for conditional audio generation

MIT License


How to generate audio variations and style transfer of audio samples? #71

Open Gitterman69 opened 5 months ago

Gitterman69 commented 5 months ago

Now that the model has been released, we have a code snippet for inference, but how do we generate audio variations and do style transfer on audio samples? It would be amazing if you guys could help me get this running locally. Thanks so much!

smrl commented 4 months ago

You can run the gradio app and check the "init audio" section. Adding init audio there together with the init noise level and no prompt should yield "variations", while adding a prompt alongside the init audio will give you "style transfer". If you're looking for how to implement this in code, you can trace it through the gradio app to see what it does. I'm not finding that it works great; if anyone has had success with this or has other approaches, I'd love to know as well.
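For anyone who wants to skip the UI, here is a minimal sketch of how the two modes described above might be set up in code. The helper `build_init_audio_kwargs` is hypothetical (not part of the library). The keyword names `conditioning`, `init_audio` (passed as a `(sample_rate, audio)` tuple) and `init_noise_level` are assumptions based on my reading of the repo's gradio interface, so please check them against your installed version of `stable-audio-tools` before relying on this.

```python
from typing import Any, Dict, Optional


def build_init_audio_kwargs(
    init_audio: Any,
    init_sample_rate: int,
    init_noise_level: float,
    prompt: Optional[str] = None,
    seconds_total: int = 30,
) -> Dict[str, Any]:
    """Hypothetical helper: collect keyword arguments for an init-audio run.

    prompt=None (or "")  -> "variations" of the init audio
    prompt="some text"   -> "style transfer" steered by the prompt
    """
    if init_noise_level <= 0:
        raise ValueError("init_noise_level must be positive")

    # Conditioning layout follows the README inference snippet (assumed unchanged).
    conditioning = [{
        "prompt": prompt or "",
        "seconds_start": 0,
        "seconds_total": seconds_total,
    }]

    return {
        "conditioning": conditioning,
        # Assumption: the init audio is passed as a (sample_rate, audio) tuple.
        "init_audio": (init_sample_rate, init_audio),
        "init_noise_level": init_noise_level,
    }


# Usage sketch (requires stable-audio-tools and a loaded model; not executed here):
#
#   from stable_audio_tools.inference.generation import generate_diffusion_cond
#   kwargs = build_init_audio_kwargs(audio_tensor, in_sr, 1.0, prompt="lo-fi piano")
#   output = generate_diffusion_cond(
#       model, steps=100, cfg_scale=7,
#       sample_size=sample_size, device=device, **kwargs,
#   )
```

A lower init noise level should keep the result closer to the input audio, and a higher one should give the model more freedom, so it is probably worth sweeping a few values.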

GenjiB commented 4 months ago

@smrl It seems that they don't do inversion for the initial audio. If you did, you should get much better results.