anonymous-pits / pits

PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor
MIT License

question about demo wavs #16

Open TinaChen95 opened 1 year ago

TinaChen95 commented 1 year ago

I found the idea behind this paper fascinating, and the demo wavs for PITS are really great. I'm looking forward to applying it to my own dataset.

However, the paper reports a larger MOS gap between VITS and PITS than between VITS and FS2. This is the opposite of what I heard: to me, the audio quality gap between VITS and PITS seems smaller than the gap between VITS and FS2.

I'm not sure whether this is just my personal preference. Does anyone else feel the same way?

Also, you've mentioned that there are still some mistakes in the preprint. Can you briefly tell us what will change in future versions?


anonymous-pits commented 1 year ago

MOS is a subjective metric with variance across samples and subjects, so the absolute size of a difference does not carry substantial meaning. The confidence intervals of VITS and FS2 overlap, which indicates that there is no statistically meaningful difference between them. However, the confidence intervals of PITS (A+D) and VITS/FS2 do not overlap, which indicates a statistically meaningful difference.
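To make the overlap argument concrete, here is a minimal sketch of how a MOS confidence interval and the overlap check could be computed. The ratings, system names, and helper functions below are made up for illustration and are not taken from the paper; the interval uses a normal approximation (z = 1.96), which is an assumption about how the CIs were obtained. Note that non-overlap is only a rough, conservative check, not a formal significance test.

```python
import math
import statistics


def mos_ci(ratings, z=1.96):
    """Return (mean, lower, upper) for a list of listener ratings.

    Uses a normal-approximation 95% interval: mean +/- z * s / sqrt(n).
    """
    mean = statistics.mean(ratings)
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, mean - z * sem, mean + z * sem


def intervals_overlap(a, b):
    """True if the (mean, lower, upper) intervals a and b share any point."""
    return a[1] <= b[2] and b[1] <= a[2]


# Hypothetical 1-5 ratings for three systems (illustrative values only).
system_a = [4, 4, 5, 4, 4, 5, 4, 4]
system_b = [4, 4, 4, 5, 4, 4, 4, 4]
system_c = [3, 3, 2, 3, 3, 2, 3, 3]

ci_a = mos_ci(system_a)
ci_b = mos_ci(system_b)
ci_c = mos_ci(system_c)

# Overlapping intervals: the difference in means is not clearly meaningful.
print("A vs B overlap:", intervals_overlap(ci_a, ci_b))
# Disjoint intervals: the difference is likely meaningful.
print("A vs C overlap:", intervals_overlap(ci_a, ci_c))
```

With these made-up ratings, the first two systems have overlapping intervals even though their means differ, while the third is clearly separated from both.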

In addition, the preprint will be updated within a week: typos and hard-to-read parts will be fixed, and a VC appendix will be added.