[Bug] Does not support mixed pronunciation of Chinese and English

coderczp commented 1 month ago

Describe the bug

Hello everyone, and thank you very much for open-sourcing Coqui TTS, a great TTS system that outperforms similar products. I ran into a problem while using it: when I input the text below, Coqui pronounces only the Chinese characters and ignores the English words and letters in it. How can I make Coqui handle mixed Chinese and English text? Looking forward to your reply. Thank you again.

"欢迎来到某某公司，for english，please select 1" ===> The English in the text was ignored, and no error was reported. (The Chinese part means "Welcome to such-and-such company".)

To Reproduce

python3 TTS/server/server.py --model_name tts_models/zh-CN/baker/tacotron2-DDC-GST

Expected behavior

No response

Logs

INFO:werkzeug - - [14/Jul/2024 13:22:12] "GET /favicon.ico HTTP/1.1" 404 -
 > Model input: 欢迎来到某某公司，for english，please select 1。
 > Speaker Idx:
 > Language Idx:
 > Text splitted to sentences.
['欢迎来到某某公司，for english，please select 1。']
xuan1ɨŋ2 lai2daʌ4 mou3mou3 goŋ1sɪ1 ， for   english ， please   select   i1 。
 [!] Character 'g' not found in the vocabulary. Discarding it.
 > Processing time: 2.1070477962493896
 > Real-time factor: 0.44043307207738364
INFO:werkzeug - - [14/Jul/2024 13:35:23] "GET /api/tts?text=欢迎来到某某公司，for%20english，please%20select%201。&speaker_id=&style_wav={"0":%200.1}&language_id= HTTP/1.1" 200 -

Environment

Here is the version information I used=>
 version: ghcr.io/coqui-ai/tts-cpu
 platform: linux
 model: tts_models--zh-CN--baker--tacotron2-DDC-GST

Additional context

No response

Airgods commented 1 month ago

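Yes. I found that using the V2 model solves the mixed Chinese-English problem, but single English letters are not read aloud correctly: the model mistakes them for Chinese pinyin.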

eginhard commented 1 month ago

python3 TTS/server/server.py --model_name tts_models/zh-CN/baker/tacotron2-DDC-GST

You're using a model that only supports Chinese. You'd have to synthesise the English parts with a different model.

Airgods commented 1 month ago

python3 TTS/server/server.py --model_name tts_models/zh-CN/baker/tacotron2-DDC-GST

You're using a model that only supports Chinese. You'd have to synthesise the English parts with a different model.

How should I go about doing that?

Airgods commented 1 month ago

python3 TTS/server/server.py --model_name tts_models/zh-CN/baker/tacotron2-DDC-GST

You're using a model that only supports Chinese. You'd have to synthesise the English parts with a different model.

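How should I go about doing that?

I have read through a lot of material and could not find how to select this mode. With the v2 model set to Chinese, it can speak English, but it fails to recognize single English letters such as a b c d.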
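One possible workaround for the single-letter problem, not confirmed anywhere in this thread, is to spell out isolated Latin letters before the text reaches the model. The sketch below is only an illustration: `spell_out_single_letters` and the `LETTER_NAMES` spellings are hypothetical, and whether a given model pronounces these spellings correctly is an untested assumption.

```python
import re

# Hypothetical spellings chosen for illustration only; tune them by ear
# for whichever model is in use.
LETTER_NAMES = {
    "a": "ay", "b": "bee", "c": "cee", "d": "dee", "e": "ee", "f": "eff",
    "g": "gee", "h": "aitch", "i": "eye", "j": "jay", "k": "kay", "l": "el",
    "m": "em", "n": "en", "o": "oh", "p": "pee", "q": "cue", "r": "ar",
    "s": "ess", "t": "tee", "u": "you", "v": "vee", "w": "double you",
    "x": "ex", "y": "why", "z": "zee",
}

# A Latin letter that has no Latin letter directly before or after it.
_ISOLATED_LETTER = re.compile(r"(?<![A-Za-z])[A-Za-z](?![A-Za-z])")


def spell_out_single_letters(text: str) -> str:
    """Replace every isolated Latin letter with a spelled-out name."""
    return _ISOLATED_LETTER.sub(
        lambda m: LETTER_NAMES[m.group(0).lower()], text
    )


print(spell_out_single_letters("a b c d"))
```

Note that this also rewrites the English words "a" and "I", so it only suits input where single letters are meant to be read as letter names.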

ksyyk commented 1 month ago

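Do the segmentation yourself: work out where the Chinese and the English text are, and call the corresponding model for each part.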
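The segmentation suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not part of Coqui TTS: `split_by_language` and `synthesize_mixed` are hypothetical helpers, only the basic CJK Unified Ideographs block is treated as Chinese, and the `synthesizers` callables stand in for whatever per-language models you load.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: the basic CJK Unified Ideographs block covers the Chinese input.
_CJK = re.compile(r"[\u4e00-\u9fff]")
_LATIN = re.compile(r"[A-Za-z]")


def split_by_language(text: str, default: str = "zh") -> List[Tuple[str, str]]:
    """Split text into ("zh" | "en", chunk) runs.

    Digits, punctuation and spaces have no language of their own, so they
    stay attached to the run in which they appear.
    """
    segments: List[Tuple[str, str]] = []
    current_lang = None
    buffer: List[str] = []
    for ch in text:
        if _CJK.match(ch):
            lang = "zh"
        elif _LATIN.match(ch):
            lang = "en"
        else:
            lang = current_lang  # neutral character: stay in the current run
        if current_lang is not None and lang != current_lang:
            # Language changed: close the run collected so far.
            segments.append((current_lang, "".join(buffer)))
            buffer = []
        current_lang = lang
        buffer.append(ch)
    if buffer:
        segments.append((current_lang or default, "".join(buffer)))
    return segments


def synthesize_mixed(text: str, synthesizers: Dict[str, Callable[[str], object]]) -> list:
    """Send each run to the synthesizer registered for its language."""
    return [synthesizers[lang](chunk) for lang, chunk in split_by_language(text)]


print(split_by_language("欢迎来到某某公司，for english，please select 1"))
```

Each callable in `synthesizers` would wrap one model (for example a Chinese-only model and an English-only model), and the returned audio pieces would then be concatenated in order.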

Airgods commented 1 month ago

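Do the segmentation yourself: work out where the Chinese and the English text are, and call the corresponding model for each part.

OK, thanks. I tried switching to a different model instead; this approach seems like a bit of a hassle.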

coderczp commented 1 month ago

python3 TTS/server/server.py --model_name tts_models/zh-CN/baker/tacotron2-DDC-GST

You're using a model that only supports Chinese. You'd have to synthesise the English parts with a different model.

Okay, thank you for your reply

coderczp commented 1 month ago

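Do the segmentation yourself: work out where the Chinese and the English text are, and call the corresponding model for each part.

OK, thanks. I tried switching to a different model instead; this approach seems like a bit of a hassle.

Which model did you switch to? Did that solve the problem?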

Airgods commented 1 month ago

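Do the segmentation yourself: work out where the Chinese and the English text are, and call the corresponding model for each part.

OK, thanks. I tried switching to a different model instead; this approach seems like a bit of a hassle.

Which model did you switch to? Did that solve the problem?

Yes, it is solved. Alibaba recently open-sourced a model called CosyVoice.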
