NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

Why do we have randomness when inferring? #239

Open OriYitzhaki opened 4 years ago

OriYitzhaki commented 4 years ago

Hi, I'm trying to figure out the role of the latent space at inference time. As seen in glow.py, the latent variable z is sampled as `audio = torch.cuda.FloatTensor(...).normal_()` followed by `audio = torch.autograd.Variable(sigma*audio)`, and then the audio variable is updated and grows longer (early output, etc.).

The thing I'm struggling with is that the generated audio depends on both the spectrogram and a random variable. Since z is random, each inference attempt can produce a different result, but clearly we don't want different results. So why not use a constant variable instead of a random one?
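To make the question concrete, here is a minimal sketch of the sampling step described above. The function name `sample_z` and the tensor shape are illustrative, not from the WaveGlow source; the point is that z is drawn as sigma-scaled Gaussian noise, and that fixing the random seed (or reusing one fixed z) makes repeated inference draws identical:

```python
import torch

def sample_z(shape, sigma=0.6, generator=None):
    # z ~ N(0, sigma^2): the random latent that the flow inverts into
    # audio, conditioned on the mel spectrogram. (Illustrative helper,
    # not part of the WaveGlow API.)
    z = torch.randn(shape, generator=generator)
    return sigma * z

# Seeding two generators identically yields identical draws, which is
# effectively what using a constant z would give you:
g1 = torch.Generator().manual_seed(1234)
g2 = torch.Generator().manual_seed(1234)
z_a = sample_z((1, 8, 100), generator=g1)
z_b = sample_z((1, 8, 100), generator=g2)
assert torch.equal(z_a, z_b)
```

In other words, the variation between runs comes entirely from the sampled z; conditioning on the same spectrogram with the same z reproduces the same waveform.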

Thanks in advance :)