coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
34.52k stars, 4.18k forks

TTS/TTS/tts/layers/xtts/tokenizer.py", line 180, in expand_abbreviations_multilingual for regex, replacement in _abbreviations[lang]: KeyError: 'zh-cn' [Bug] #3189

Closed: lucasjinreal closed this issue 9 months ago

lucasjinreal commented 11 months ago

Describe the bug

expand_abbreviations_multilingual() in TTS/TTS/tts/layers/xtts/tokenizer.py (line 180) raises KeyError: 'zh-cn' when it looks up _abbreviations[lang].

Logs

TTS/TTS/tts/layers/xtts/tokenizer.py", line 180, in expand_abbreviations_multilingual
    for regex, replacement in _abbreviations[lang]:
KeyError: 'zh-cn'


douhaohaode commented 11 months ago

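Since zh-cn and zh both refer to Chinese, I'd recommend using only one of them.

If you want to get it running in the meantime, you can manually edit tokenizer.py in the TTS package and change zh to zh-cn on lines 118 and 283.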

lucasjinreal commented 11 months ago

I think the keys of these maps in the tokenizer should be consistent with the language codes.
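A small consistency check, for example run as a unit test, would catch this kind of mismatch before it reaches users. This is a sketch: `SUPPORTED_LANGUAGES` and `missing_language_keys` are hypothetical names, and the table contents are placeholders; only the map names `_abbreviations` and `_symbols_multilingual` come from this thread.

```python
# Hypothetical list of language codes the model accepts.
SUPPORTED_LANGUAGES = ["en", "es", "zh-cn"]


def missing_language_keys(tables, languages):
    """Return {table_name: [codes with no entry]} for each table that lacks a code."""
    report = {}
    for name, table in tables.items():
        missing = [code for code in languages if code not in table]
        if missing:
            report[name] = missing
    return report


# Placeholder maps keyed the way the thread describes (Chinese under "zh").
tables = {
    "_abbreviations": {"en": [], "es": [], "zh": []},
    "_symbols_multilingual": {"en": [], "es": [], "zh": []},
}

print(missing_language_keys(tables, SUPPORTED_LANGUAGES))
# {'_abbreviations': ['zh-cn'], '_symbols_multilingual': ['zh-cn']}
```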

AIFSH commented 11 months ago

Until there is an official fix:

pip uninstall TTS
pip install TTS==0.20.2

This works!

jbang2004 commented 11 months ago

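Nice one, thanks! By the way, do you know how to save a speaker's latents and embeddings and reuse them to generate multiple utterances? Right now I have to compute the features first and then run inference every single time, which is very inefficient.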

lucasjinreal commented 11 months ago

@jbang2004 It's possible, but the maintainers don't seem to have considered this use case at all.

jbang2004 commented 11 months ago

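I spent a morning digging into it. The official docs describe a method that extracts the features directly from the model and then writes the wav with torchaudio, and with that approach you can keep reusing the same features for every conversion. However, the output quality is somewhat worse than with the API, and I don't know why.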
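A minimal sketch of the "compute once, reuse many times" idea, assuming the latents are whatever object your extraction step returns. `load_or_compute_latents` is a hypothetical helper, not part of TTS. The commented XTTS calls assume the `get_conditioning_latents()` and `inference()` methods from the XTTS documentation's manual-inference example; check them against the version you have installed.

```python
import os
import pickle


def load_or_compute_latents(cache_path, compute):
    """Return cached conditioning latents, computing and saving them on first use.

    `compute` is any zero-argument callable that returns the latents.
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    latents = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(latents, f)
    return latents


# Possible use with XTTS (assumption, not run here):
# gpt_cond_latent, speaker_embedding = load_or_compute_latents(
#     "speaker.pkl",
#     lambda: model.get_conditioning_latents(audio_path=["reference.wav"]),
# )
# for text in texts:
#     out = model.inference(text, "zh-cn", gpt_cond_latent, speaker_embedding)
```

Torch tensors can be pickled, but `torch.save`/`torch.load` is the more conventional choice for them. The cache is keyed only by file path, so a different reference clip needs its own cache file. Caching does not change the latents, so it should not affect output quality either way.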

lucasjinreal commented 11 months ago

@jbang2004 Would you mind sharing the code?

Edresson commented 11 months ago

I fixed it in https://github.com/coqui-ai/TTS/pull/3216. "zh-cn" is what we use in the config and docs, so I renamed "zh" to "zh-cn".

genglinxiao commented 11 months ago

I think there are two places where the key used for the Chinese language is inconsistent. The model uses "zh-cn" for Chinese (Simplified), but the key defined for Chinese in _abbreviations and _symbols_multilingual is "zh". These two structures are used in expand_abbreviations_multilingual() and expand_symbols_multilingual() respectively, which causes the KeyErrors.

As a workaround, I remapped the key from "zh-cn" to "zh" inside these two functions by adding the following lines to each of them:

    if lang == "zh-cn":
        lang = "zh"

But I think there ought to be a cleaner solution.

stale[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.