KdaiP / StableTTS

Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
MIT License

Performance comparison #19

Open blackbird-fish opened 2 months ago

blackbird-fish commented 2 months ago

Hello, thanks for your great work. I'd like to know whether the DiT estimator performs better than Matcha's?

KdaiP commented 2 months ago

Hi, thanks for your interest and support!

I haven't conducted objective comparisons between the DiT and Matcha's models. However, in my subjective experience, DiT doesn't perform as well as Matcha's transformer U-Net at the same parameter count as Matcha TTS.

With a larger model size, though, like in HierSpeech++, DiT can achieve better results.