Open Liujingxiu23 opened 8 months ago
Yes, the two independent works came out at almost the same time (within a week of each other).
Although they seem similar, there are indeed some differences.
So, although both works use flow matching for TTS, our focus is not the same. This is not to say that one model is necessarily better than the other. In fact, I highly appreciate how nicely and neatly Matcha-TTS is open-sourced.
If one has to compare, you could train the two repos on exactly the same data. Personally, I haven't done a strict sample-to-sample comparison, but I remember that Matcha-TTS's vector field estimator architecture achieved decent performance in our code as well. The inference speed also depends on the architecture.
If anybody has done experiments on these, results are welcome!
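For anyone who wants to report inference speed in such a comparison, a common metric is the real-time factor (RTF): synthesis time divided by the duration of the generated audio. Below is a minimal sketch, not code from either repo. The `synthesize` callable is a hypothetical wrapper you would write around each model's own inference code; it is assumed to take a text string and return a 1-D sequence of audio samples, and to block until the audio is ready (on a GPU, that means synchronizing before returning).

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical wrapper around a model
    texts: Sequence[str],
    sample_rate: int,
    warmup: int = 1,
) -> float:
    """Return total synthesis time divided by total generated audio duration."""
    # Warm-up calls are not timed, so one-off setup costs do not skew the result.
    for text in texts[:warmup]:
        synthesize(text)

    total_time = 0.0   # seconds spent synthesizing
    total_audio = 0.0  # seconds of audio produced
    for text in texts:
        start = time.perf_counter()
        wav = synthesize(text)
        total_time += time.perf_counter() - start
        total_audio += len(wav) / sample_rate

    if total_audio == 0:
        raise ValueError("no audio was generated")
    return total_time / total_audio


def dummy_synthesize(text: str) -> list:
    """Stand-in for a real model: returns 0.1 s of silence per character at 22050 Hz."""
    return [0.0] * (2205 * len(text))
```

For a fair comparison, the same sentences, hardware, batch size, and number of ODE solver steps should be used for both models, since each of these changes the RTF.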
Thank you for your work and for sharing it! VoiceFlow-TTS and Matcha-TTS (https://github.com/shivammehta25/Matcha-TTS/) seem very similar. What are the main differences between the two methods? And how do they compare on voice quality (for example, prosody) and inference speed? Best