daniilrobnikov / vits2

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
https://vits-2.github.io/demo/
MIT License

Should z_q_dur be drawn from a Gaussian distribution? #10

Open trinh-hoang-hiep opened 11 months ago

trinh-hoang-hiep commented 11 months ago

I have a concern about whether the random variable z_q_dur should be assumed to follow a normal distribution. If z_q_dur is Gaussian, then of course z_audio will be Gaussian as well. But with a normalizing flow, z_audio can still be Gaussian even when z_q_dur is non-Gaussian. So maybe the first KL term should be a little more flexible?
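
To illustrate the point about flows, here is a minimal sketch using only the Python standard library. It is not taken from the VITS2 code: the log-normal distribution, the `flow` function (a plain `log` standing in for a learned normalizing flow), and the helper names are all assumptions chosen for illustration. It shows an invertible map turning a clearly non-Gaussian variable into a Gaussian one, and how the density of the non-Gaussian variable follows from the change-of-variables formula.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: close to 0 for a symmetric (e.g. Gaussian) sample."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


def flow(z):
    """Toy invertible map (hypothetical stand-in for a normalizing flow)."""
    return math.log(z)


def log_density_z(z):
    """Change of variables: log p(z) = log N(f(z); 0, 1) + log |df/dz|."""
    u = flow(z)
    log_normal_pdf = -0.5 * (u * u + math.log(2.0 * math.pi))
    log_abs_det = -math.log(z)  # derivative of log(z) is 1/z
    return log_normal_pdf + log_abs_det


rng = random.Random(0)

# Non-Gaussian "pre-flow" samples (log-normal is an assumption for this demo).
z_samples = [rng.lognormvariate(0.0, 1.0) for _ in range(20000)]

# After the invertible map, the samples are standard normal.
u_samples = [flow(z) for z in z_samples]

skew_z = skewness(z_samples)
skew_u = skewness(u_samples)
print(f"skewness before flow: {skew_z:.2f}")
print(f"skewness after flow:  {skew_u:.2f}")
```

The pre-flow samples are strongly skewed while the post-flow samples are approximately symmetric, so a Gaussian assumption on the flow output does not by itself force the flow input to be Gaussian. In such a setting a KL term could be estimated from samples with `log_density_z` rather than from a closed-form Gaussian expression, though whether that suits this model is an open question.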
