daniilrobnikov / vits2

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
https://vits-2.github.io/demo/
MIT License

normalize_flow is implemented differently from the official VITS code #6

Open nzpeng opened 1 year ago

nzpeng commented 1 year ago

Hi, I found that the implementation of `ResidualCouplingLayer.forward` (in normalize_flow.py) differs from the official VITS code, and this change is not described in the VITS2 paper. What principle is your implementation based on, and what are its advantages? It seems more reasonable.
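For readers unfamiliar with the module being discussed, here is a minimal, hypothetical sketch of the affine coupling transform that residual coupling layers are built on. This is not this repository's (or official VITS's) exact code; official VITS uses a mean-only variant, which corresponds to fixing `logs` at zero below:

```python
import math

def affine_coupling_forward(x1, x2, m, logs):
    """Forward pass of a toy affine coupling layer: x1 passes through
    unchanged; x2 is scaled and shifted by (m, logs), which in a real
    flow would be predicted from x1 by a neural network."""
    y2 = [mi + xi * math.exp(li) for xi, mi, li in zip(x2, m, logs)]
    return x1, y2

def affine_coupling_inverse(y1, y2, m, logs):
    """Exact inverse of the forward transform, which is what makes
    the coupling layer usable as a normalizing flow."""
    x2 = [(yi - mi) * math.exp(-li) for yi, mi, li in zip(y2, m, logs)]
    return y1, x2
```

The round trip forward → inverse recovers the input exactly, which is the invertibility property both the official and this repository's flow modules rely on.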

nzpeng commented 1 year ago

Additionally, there are two KL losses in train.py:

```python
loss_kl_dur = kl_loss(z_q_dur, logs_q_dur, m_p_dur, logs_p_dur, z_mask) * hps.train.c_kl_dur
loss_kl_audio = kl_loss_normal(m_p_audio, logs_p_audio, m_q_audio, logs_q_audio, z_mask) * hps.train.c_kl_audio
```

How should these be understood? What principle are they based on?
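For context, KL terms like these are typically built on the closed-form KL divergence between diagonal Gaussians parameterised by mean and log standard deviation. The sketch below is a pure-Python illustration of that formula, not the repository's actual `kl_loss` implementation:

```python
import math

def gaussian_kl(m_q, logs_q, m_p, logs_p):
    """Closed-form KL( N(m_q, e^logs_q) || N(m_p, e^logs_p) )
    for scalar Gaussians; in the model this is computed
    element-wise over latent frames and summed under a mask."""
    return (logs_p - logs_q - 0.5
            + 0.5 * (math.exp(2 * logs_q) + (m_q - m_p) ** 2)
            * math.exp(-2 * logs_p))
```

With identical parameters the divergence is zero, and shifting the mean of a unit-variance pair by 1 gives the familiar value 0.5.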

LEECHOONGHO commented 8 months ago

@nzpeng The flow module with two KL losses is the bidirectional prior/posterior module proposed in NaturalSpeech [1]. In my experience, it seems superior to the original VITS flow module in terms of speaker similarity and training speed.
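A toy illustration of why a bidirectional loss is not redundant: KL divergence is asymmetric, so penalising both directions constrains the prior and posterior differently than either direction alone. This is a conceptual sketch using scalar Gaussians, not NaturalSpeech's actual loss:

```python
import math

def gaussian_kl(m_q, logs_q, m_p, logs_p):
    # Closed-form KL( N(m_q, e^logs_q) || N(m_p, e^logs_p) )
    return (logs_p - logs_q - 0.5
            + 0.5 * (math.exp(2 * logs_q) + (m_q - m_p) ** 2)
            * math.exp(-2 * logs_p))

# Forward and backward KL between the same pair of Gaussians
# generally differ, which is why a bidirectional objective
# adds information over a single-direction KL term.
fwd = gaussian_kl(0.0, 0.0, 1.0, 1.0)  # KL(q || p)
bwd = gaussian_kl(1.0, 1.0, 0.0, 0.0)  # KL(p || q)
```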

[1] NaturalSpeech https://arxiv.org/pdf/2205.04421.pdf