crlandsc / tiny-audio-diffusion

A repository for generating and training short audio samples with unconditional waveform diffusion on accessible consumer hardware (<2GB VRAM GPU)
https://towardsdatascience.com/tiny-audio-diffusion-ddc19e90af9b
MIT License

Conditional training #3

Closed · fmiotello closed this 4 months ago

fmiotello commented 4 months ago

I've seen in the accompanying Jupyter notebook that, during inference, it is possible to generate samples by conditioning on unseen audio data. Is it also possible to condition the training process on custom features, as is common in other diffusion models?

Thank you!

crlandsc commented 4 months ago

Hi @fmiotello, I am not entirely sure what you are asking. Inference can be "conditioned" on input audio because of how the diffusion process works: the model takes whatever input it is given and "moves" it into the region of sounds it knows. So whether the starting point is white noise or a "conditional" input of some other audio (e.g. a drum sound), the sampler will always pull it into that known region.
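
To make that concrete, here is a minimal sketch of the idea (not the notebook's actual code): the prompt audio is blended with noise and handed to the sampler in place of pure noise, so denoising starts near the prompt. It assumes a trained `DiffusionModel` with a `model.sample(noise, num_steps=...)` interface like audio-diffusion-pytorch's; `noise_strength` is an illustrative knob, not a library argument.

```python
import torch

def sample_from_audio(model, audio: torch.Tensor,
                      noise_strength: float = 0.7,
                      num_steps: int = 50) -> torch.Tensor:
    """Blend an audio prompt with noise, then let the sampler pull the
    mixture back into the region of sounds the model was trained on."""
    noise = torch.randn_like(audio)
    # Simple linear blend: higher noise_strength keeps less of the prompt
    noisy_start = (1.0 - noise_strength) * audio + noise_strength * noise
    # Sampling starts from the blended tensor instead of pure white noise
    return model.sample(noisy_start, num_steps=num_steps)

# Usage (assuming `model` and a [batch, channels, length] drum prompt exist):
# generated = sample_from_audio(model, drum_prompt, noise_strength=0.6)
```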

Are you asking whether certain sounds could be used as inputs during the training process, rather than just white noise the entire time? If so, I suppose that is possible, but it may reduce the model's capabilities.

If you are asking whether it could be conditioned on something like text prompts (e.g. Stable Audio, MusicGen, etc.), it certainly could be. I have not built this capability into this codebase, but the tools to add it should exist in audio-diffusion-pytorch, the library this repository is built upon.
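
For reference, the audio-diffusion-pytorch documentation sketches text conditioning roughly as below (argument names are taken from that library's 0.1.x README and may differ between versions; the layer sizes here are just placeholders, not this repo's config). Adapting this repo's training loop would mean passing a conditioning signal alongside each audio batch in a similar way.

```python
import torch
from audio_diffusion_pytorch import DiffusionModel, UNetV0, VDiffusion, VSampler

model = DiffusionModel(
    net_t=UNetV0,
    in_channels=1,                       # mono waveforms
    channels=[8, 32, 64, 128, 256, 512], # U-Net channels per layer
    factors=[1, 4, 4, 4, 2, 2],          # down/upsampling factors per layer
    items=[1, 2, 2, 2, 2, 2],            # repeating items per layer
    attentions=[0, 0, 0, 0, 1, 1],       # self-attention on/off per layer
    attention_heads=8,
    attention_features=64,
    diffusion_t=VDiffusion,
    sampler_t=VSampler,
    # Conditioning options, per the audio-diffusion-pytorch README:
    use_text_conditioning=True,          # frozen T5 text encoder
    use_embedding_cfg=True,              # classifier-free guidance on the embedding
    embedding_max_length=64,
    embedding_features=768,
    cross_attentions=[0, 0, 0, 1, 1, 1], # cross-attention to the text embedding
)

# Training step: pass the text prompt with each audio batch
audio = torch.randn(1, 1, 2**15)  # [batch, channels, length]
loss = model(
    audio,
    text=["kick drum, punchy"],
    embedding_mask_proba=0.1,  # randomly drop conditioning for CFG
)
loss.backward()

# Inference: guide sampling with a prompt
noise = torch.randn(1, 1, 2**15)
sample = model.sample(
    noise,
    text=["kick drum, punchy"],
    embedding_scale=5.0,  # guidance strength
    num_steps=50,
)
```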

fmiotello commented 4 months ago

Yes, I was referring to the possibility of conditioning the training on data other than audio. I'll look into the audio-diffusion-pytorch library then. Thanks for your help!

crlandsc commented 4 months ago

No problem and good luck!