Thanks for the awesome work! Since I can't read Chinese, I translated the README to English, and I understood the training process as follows.
It seems there is a two-stage training process, and training is quite involved, especially stage 2.
First stage: train VITS (SynthesizerTrn) with Whisper PPG features, NSF-HiFiGAN, and an external speaker encoder (d-vector).
Second stage (SynthesizerTrnEx): apply GRL and SNAC to prevent speaker information from leaking into the text encoder, and also apply the NaturalSpeech loss (bidirectional loss between prior and posterior).
Is that right? Also, I can't find any usage of SynthesizerTrnEx in this code base (maybe not yet?). Could you explain a bit more about the training process?
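For context on what I mean by GRL: my understanding is that it is a gradient reversal layer, i.e. an identity function in the forward pass whose gradient is negated in the backward pass, so a speaker classifier trained on the text-encoder output pushes that encoder to *remove* speaker information. A minimal PyTorch sketch of that idea (my own illustration, not code from this repo; the names `GradReverse`/`grad_reverse` are made up):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient Reversal Layer: identity in forward, negated (scaled) gradient in backward."""

    @staticmethod
    def forward(ctx, x, scale=1.0):
        ctx.scale = scale
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign so the upstream encoder is trained
        # adversarially against the speaker classifier.
        return -ctx.scale * grad_output, None

def grad_reverse(x, scale=1.0):
    return GradReverse.apply(x, scale)

# Toy check: forward is identity, backward flips the sign.
x = torch.ones(3, requires_grad=True)
y = grad_reverse(x, scale=1.0)
y.sum().backward()
print(y)       # same values as x
print(x.grad)  # all -1.0: gradient of sum() is +1, reversed to -1
```

In a stage-2 setup I would expect this to sit between the text encoder and a speaker classification head, with the classifier's cross-entropy loss added to the total loss. Is that roughly how GRL is used here?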