VOICEVOX / voicevox

Editor for VOICEVOX, free-to-use, medium-quality text-to-speech software
https://voicevox.hiroshiba.jp/
Other

Can we support other languages? #2236

Closed ILG2021 closed 1 month ago

ILG2021 commented 1 month ago

It is a nice project. I have never seen a model that can generate both speech and songs. I have created some DiffSinger voicebanks and am familiar with OpenUtau. Recently I wanted to create a TTS voice model, which is how I found this project. I think your project has great potential because it needs a much smaller dataset to create a model than traditional TTS models do. The only drawback is the lack of support for multiple languages. Could you improve this? Could you provide documentation for training an English voicebank for VOICEVOX?

Hiroshiba commented 1 month ago

@ILG2021 Thank you for creating this issue.

Currently, VOICEVOX does not provide mechanisms for creating voicebanks, nor do we have a specific approach for supporting languages other than Japanese. Therefore, this is all the information we can provide, and we will be closing this issue.

The primary reason we haven't supported other languages is the need for phoneme labeling in each language, and we haven't conducted research in this area yet. If you are aware of any automatic phoneme alignment techniques for languages like English, we would be interested to learn more about them, as this could potentially be helpful for those creating voicebanks.

ILG2021 commented 1 month ago

Thanks for your reply. Most automatic alignment for labeling is done with SOFA: https://github.com/qiuqiao/SOFA/discussions/categories/pretrained-model-sharing In my experience, Chinese auto-labeling is the most accurate, at about 90% accuracy. The other languages are less accurate, possibly because of the amount of training data or the complexity of their phoneme systems.

ILG2021 commented 1 month ago

So we cannot make our own voicebanks in VOICEVOX the way we can with DiffSinger?

Hiroshiba commented 1 month ago

@ILG2021

Oh, thank you for the information about SOFA! At 90% accuracy, manual correction still seems nearly essential, but it looks like it could greatly reduce the workload! We'll definitely consider this.

Regarding creating voicebanks in VOICEVOX, currently, we cannot support the creation of voicebanks, and it is not part of our immediate goals. There's a reason for this: our mission is to make speech synthesis software and characters more accessible. Here's more about our mission, values, and vision. Because of this, we prioritize improving quality, hosting events, and finding business partners.

However, I understand there is a demand for creating voicebanks. While I personally may not undertake this, if someone around me is interested, I might encourage them to explore this possibility.

Thank you for your interest!

ILG2021 commented 1 month ago

Hello, thanks for your reply. Do you mean that you haven't open-sourced the training code, so we can only use the voicebanks that you have trained and built into the software? Am I right? I don't mean that you should develop multilingual voicebanks yourselves. I mean, could you open-source the code and documentation so that our community can make its own multilingual voicebanks? As for me, I want to make a TTS voicebank for English and some low-resource languages.

Hiroshiba commented 1 month ago

Indeed, we have not open-sourced the method for creating voicebanks in VOICEVOX. The primary reason is that, as of now, there is no motivation for us to do so. Sorry for not meeting your expectations.

However, I would like to point out that the VOICEVOX engine API is open-source. This means that if you develop an engine that implements this API, you can use VOICEVOX's "multi-engine" system to operate it alongside other engines. If you're interested in TTS for other languages, you might find this issue of interest as well. Please feel free to check it out: VOICEVOX engine issue #542
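For readers wondering what "implementing the engine API" involves, here is a minimal client-side sketch of the two core synthesis calls of the VOICEVOX engine HTTP API: `POST /audio_query` (text and speaker ID in, an AudioQuery JSON object out), then `POST /synthesis` (that AudioQuery as the JSON body, WAV bytes out). It assumes an engine listening on the default local port 50021; the helper function names are hypothetical. An engine intended for multi-engine use would presumably have to serve these endpoints and several others (such as speaker and manifest listings), so treat the engine's own API documentation as authoritative rather than this sketch.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a VOICEVOX-compatible engine is running locally on its default port.
BASE_URL = "http://127.0.0.1:50021"


def build_audio_query_request(text: str, speaker: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """Step 1 (hypothetical helper): POST /audio_query?text=...&speaker=...
    The engine answers with an AudioQuery JSON object (phonemes, pitch, timing, etc.)."""
    params = urllib.parse.urlencode({"text": text, "speaker": speaker})
    # Empty body with explicit POST; the parameters travel in the query string.
    return urllib.request.Request(f"{base_url}/audio_query?{params}", data=b"", method="POST")


def build_synthesis_request(audio_query: dict, speaker: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """Step 2 (hypothetical helper): POST /synthesis?speaker=... with the AudioQuery
    as a JSON body. The engine answers with WAV audio bytes."""
    params = urllib.parse.urlencode({"speaker": speaker})
    body = json.dumps(audio_query).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesis?{params}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, speaker: int, base_url: str = BASE_URL) -> bytes:
    """Run both steps against a live engine and return the WAV bytes."""
    with urllib.request.urlopen(build_audio_query_request(text, speaker, base_url)) as resp:
        audio_query = json.load(resp)
    # The AudioQuery could be edited here (e.g. speed or pitch) before synthesis.
    with urllib.request.urlopen(build_synthesis_request(audio_query, speaker, base_url)) as resp:
        return resp.read()
```

With an engine running, `synthesize("こんにちは", speaker=1)` should return WAV bytes that can be written to a `.wav` file; the speaker ID must be one the engine actually provides. Splitting the flow into two requests mirrors the engine's design, where the intermediate AudioQuery can be adjusted by the editor before audio is generated.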