In v-diffusion, sigma_t is not sampled from 0 to 1, which differs from what your thesis describes. Will that cause any trouble?
By sampling a random σt ∈ [0,1], we are more likely to pick a value that resembles x0 instead of pure noise ε, meaning that the model will more often see data with smaller amount of noise
That's the actual noise, not sigma. Sigma is the noise level. Check the code two lines above the noise variable: sigma is sampled from the sigma distribution, which is uniform over the range [0,1].
https://github.com/archinetai/audio-diffusion-pytorch/blob/eafa972e27d332ec6f53dd616ac9a0cd466fc42f/audio_diffusion_pytorch/diffusion.py#L85
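To make the distinction between the noise and the noise level concrete, here is a minimal plain-Python sketch. It is not copied from the linked file. It assumes the common v-diffusion parameterization alpha = cos(sigma * pi / 2), beta = sin(sigma * pi / 2), and the names `sample_sigma` and `noisy_sample` are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_sigma(rng):
    # Noise LEVEL: a single scalar drawn uniformly from [0, 1).
    return rng.random()


def noisy_sample(x0, sigma, rng):
    """Mix clean data x0 with Gaussian noise at noise level sigma.

    Assumption: alpha = cos(sigma * pi / 2), beta = sin(sigma * pi / 2).
    """
    angle = sigma * math.pi / 2
    alpha, beta = math.cos(angle), math.sin(angle)
    # Actual NOISE: standard Gaussian, not restricted to [0, 1].
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_noisy = [alpha * x + beta * n for x, n in zip(x0, noise)]
    # Velocity target the network would be trained to predict.
    v_target = [alpha * n - beta * x for x, n in zip(x0, noise)]
    return x_noisy, v_target, noise


rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]
sigma = sample_sigma(rng)
x_noisy, v_target, noise = noisy_sample(x0, sigma, rng)
```

Under these assumptions, sigma = 0 returns the clean data unchanged and sigma = 1 returns (numerically almost) pure noise, so only sigma is bounded by [0,1] while the noise values are unbounded Gaussian samples.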