Open nzpeng opened 1 year ago
Additionally, there are two KL losses in train.py:

    loss_kl_dur = kl_loss(z_q_dur, logs_q_dur, m_p_dur, logs_p_dur, z_mask) * hps.train.c_kl_dur
    loss_kl_audio = kl_loss_normal(m_p_audio, logs_p_audio, m_q_audio, logs_q_audio, z_mask) * hps.train.c_kl_audio

How should these be understood? What principle are they based on?
@nzpeng The flow module with two KL losses is the bidirectional prior/posterior module proposed in NaturalSpeech [1]. In my experience, it seems to be superior to the original VITS flow module in terms of speaker similarity and training speed.
[1] NaturalSpeech https://arxiv.org/pdf/2205.04421.pdf
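For anyone reading along, here is a minimal sketch of the two Gaussian KL forms that losses named like `kl_loss` and `kl_loss_normal` are usually built from. It is not copied from this repo's train.py: the function names `kl_sampled`, `kl_normal` and `masked_mean` are hypothetical, and it works on plain Python floats rather than tensors. It assumes `logs_*` means log standard deviation, as in the official VITS code. Which distribution plays q and which plays p in each call depends on the argument order used in the repo, so treat the direction here as an assumption.

```python
import math

def kl_sampled(z_q, logs_q, m_p, logs_p):
    # Single-sample estimate of KL(q || p) at a sample z_q drawn from q.
    # The -0.5 * eps^2 term of log q(z_q) is replaced by its expectation, -0.5.
    kl = logs_p - logs_q - 0.5
    kl += 0.5 * (z_q - m_p) ** 2 * math.exp(-2.0 * logs_p)
    return kl

def kl_normal(m_q, logs_q, m_p, logs_p):
    # Closed-form KL between two 1-D Gaussians:
    # KL(N(m_q, exp(2*logs_q)) || N(m_p, exp(2*logs_p))).
    kl = logs_p - logs_q - 0.5
    kl += 0.5 * (math.exp(2.0 * logs_q) + (m_q - m_p) ** 2) * math.exp(-2.0 * logs_p)
    return kl

def masked_mean(values, mask):
    # Average over valid (unpadded) positions only, like dividing by sum(z_mask).
    return sum(v * m for v, m in zip(values, mask)) / sum(mask)

# Usage: per-position KL values averaged over the unmasked positions.
per_frame = [kl_normal(m, 0.0, 0.0, 0.0) for m in (0.0, 1.0, 2.0)]
loss = masked_mean(per_frame, [1, 1, 0])
```

The sampled form needs an actual latent `z_q` (for example, one that has passed through the flow), while the closed form only needs the means and log standard deviations of both distributions. A bidirectional setup can therefore apply one term in each direction and weight them separately, as the `c_kl_dur` and `c_kl_audio` coefficients in the question suggest.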
Hi, I found that the implementation of ResidualCouplingLayer.forward (normalize_flow.py) differs from the official VITS code, and this part is not described in the VITS2 paper. What principle is your implementation based on, and what are its advantages? It seems more reasonable.