MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Apache License 2.0
401 stars 64 forks

MS-VITS performs poorly #24

Open JohnHerry opened 10 months ago

JohnHerry commented 10 months ago

According to the paper, MS-VITS is better than MB-VITS in both speed and naturalness. But in my case, MS-VITS gave worse results and was prone to pronunciation problems.