question about demo wavs

anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor

MIT License

I found the idea behind this paper fascinating, and the demo wavs for PITS are really great. I'm looking forward to applying it to my own dataset.

However, the paper shows that the MOS gap between VITS and PITS is greater than the gap between VITS and FS2. This is exactly the opposite of what I heard. To my ears, the audio quality gap between VITS and PITS is not as big as the gap between VITS and FS2.

I'm not sure whether this is just my personal preference. Does anyone else feel the same way?

Also, you've mentioned that there are still some mistakes in the preprint. Can you briefly tell us what will change in future versions?

Thanks!

question about demo wavs #16