archinetai / audio-diffusion-pytorch

Audio generation using diffusion models, in PyTorch.
MIT License
1.92k stars, 167 forks

Question: sigma_t is not sampled from 0 to 1 in v-diffusion, unlike what your thesis describes. Will this cause any trouble? #50

Closed: emailandxu closed this issue 1 year ago

emailandxu commented 1 year ago

https://github.com/archinetai/audio-diffusion-pytorch/blob/eafa972e27d332ec6f53dd616ac9a0cd466fc42f/audio_diffusion_pytorch/diffusion.py#L85

In v-diffusion, sigma_t does not appear to be sampled from 0 to 1, unlike what your thesis describes. Will this cause any trouble? The thesis says:

"By sampling a random σt ∈ [0,1], we are more likely to pick a value that resembles x0 instead of pure noise ε, meaning that the model will more often see data with smaller amount of noise"

flavioschneider commented 1 year ago

The line you linked is the actual noise, not sigma. Sigma is the noise level. Check the code two lines above the noise variable: sigma is sampled from the sigma distribution, which is uniform in the range [0,1].
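To make the distinction concrete, here is a minimal sketch in plain Python (standard library only, not the repository's code verbatim). It assumes the usual v-diffusion weighting, alpha = cos(sigma * pi / 2) and beta = sin(sigma * pi / 2). The names `noise_sample`, `rng`, and `x0` are hypothetical and only serve the illustration. In PyTorch the two draws would typically be something like `torch.rand` for the noise level and `torch.randn_like` for the noise itself.

```python
import math
import random


def noise_sample(x, rng):
    """Noise one sample as described above (hypothetical helper)."""
    # Noise level: a single scalar per sample, uniform in [0, 1).
    sigma = rng.random()
    # Actual noise: standard Gaussian, one value per element.
    # These values are NOT restricted to [0, 1].
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    # Assumed weighting: the angle lies in [0, pi/2], so alpha^2 + beta^2 = 1.
    angle = sigma * math.pi / 2
    alpha, beta = math.cos(angle), math.sin(angle)
    # Noisy input seen by the network and the velocity target it regresses.
    x_noisy = [alpha * xi + beta * ni for xi, ni in zip(x, noise)]
    v_target = [alpha * ni - beta * xi for xi, ni in zip(x, noise)]
    return sigma, noise, x_noisy, v_target


rng = random.Random(0)  # fixed seed so the run is reproducible
x0 = [0.5, -0.25, 0.1, 0.9]  # toy stand-in for an audio sample
sigma, noise, x_noisy, v_target = noise_sample(x0, rng)
print(f"sigma (noise level, in [0, 1)): {sigma:.3f}")
print("noise (Gaussian, unbounded):", [round(n, 3) for n in noise])
```

So the value in [0,1] is the noise level sigma, while the Gaussian noise it scales can take any real value. A small sigma gives alpha close to 1, which means `x_noisy` stays close to the clean sample.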