fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

The voice is a bit shaky and a bit hoarse #170

Open freecui opened 4 years ago

freecui commented 4 years ago

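Please listen to this result of mine. Some words or characters sound a bit shaky and a bit hoarse, especially shaky, and I don't know what the cause is. 1350.zip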

mindmapper15 commented 4 years ago

(I translated your question with Google Translate.)

Set the parameter "voc_gen_batched" to False in your hparams.py. Although batched WaveRNN is much faster than the original WaveRNN, it is a trade-off. As the batch size increases (that is, as the number of samples in each batch entry decreases), audio generation gets faster but the quality of the generated audio gets worse.

If you disable batched generation, audio generation will be very slow, but it will ultimately produce the finest results.
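
For reference, the change amounts to one line in hparams.py. The fragment below is only a sketch: voc_gen_batched is the parameter named in this thread, voc_target and voc_overlap are the names that come up later in it, and the numeric values shown are assumptions that should be checked against your own copy of the file.

```python
# hparams.py (fragment): vocoder generation settings, used at inference only
voc_gen_batched = False   # generate the utterance sequentially: slow, best quality
voc_target = 11_000       # assumed value: samples per batch entry when batched
voc_overlap = 550         # assumed value: samples crossfaded between entries
```

With voc_gen_batched set to False, the other two values should have no effect, since the utterance is not split into entries.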

freecui commented 4 years ago

Thank you very much. I had voc_gen_batched = True; I will train again with voc_gen_batched = False.

mindmapper15 commented 4 years ago

You don't need to re-train your vocoder. voc_gen_batched is for inference only.

freecui commented 4 years ago

The audio sounds better when I set voc_gen_batched = False for inference, but the generation time for this utterance increased from 33.42 seconds to 170 seconds. I want to do real-time TTS; can you give me some advice?

mindmapper15 commented 4 years ago

@freecui I implemented my own batched-mode WaveRNN, which generates multiple "unbatched" audio clips at once (unbatched meaning that a single audio clip is not split into multiple segments).

It's still slower than the original batched mode and consumes tons of VRAM, but it is way faster than generating audio clips one by one in unbatched mode.

Maybe you should try that way.

I was focusing more on TTS than on WaveRNN, so I still don't know how to get the finest result with the batched single-audio mode.

As I said, batched WaveRNN inference is a trade-off. If you want the finest result and faster generation, you'd better implement a feature that generates multiple unbatched audio clips at once.
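
The implementation described here is not shown in the thread, so the following is only a guess at the bookkeeping such a feature would need, not mindmapper15's code. It pads several whole utterances to a common length so they can go through the vocoder as one batch, then trims each output back to its own length. The names pad_mel_batch, trim_outputs and hop_length are hypothetical, the vocoder call itself is left out, and the right pad_value depends on how your mels are normalized.

```python
from typing import List, Tuple

Mel = List[List[float]]  # one utterance: a list of frames, each a list of mel bins


def pad_mel_batch(mels: List[Mel], n_mels: int,
                  pad_value: float = 0.0) -> Tuple[List[Mel], List[int]]:
    """Pad every utterance to the longest one; return the batch and true lengths."""
    if not mels:
        raise ValueError("need at least one utterance")
    lengths = [len(m) for m in mels]
    longest = max(lengths)
    pad_frame = [pad_value] * n_mels
    batch = [list(m) + [pad_frame] * (longest - len(m)) for m in mels]
    return batch, lengths


def trim_outputs(waves: List[List[float]], lengths: List[int],
                 hop_length: int) -> List[List[float]]:
    """Cut each generated waveform back to frames * hop_length samples."""
    return [w[: n * hop_length] for w, n in zip(waves, lengths)]
```

Because every utterance is generated for as many steps as the longest one, grouping clips of similar length would waste less compute and memory.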

If you care more about generation time than quality, find hp.voc_target and hp.voc_overlap values that give an acceptable balance of generation time and quality.
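
To get a feel for that search, it helps to see how the two values determine the number of batch entries. The sketch below assumes a common overlap-folding scheme in which each entry is target + 2 * overlap samples long and consecutive entries share overlap samples; count_folds is a hypothetical helper and the sample rate and candidate values are illustrative, so verify the arithmetic against the folding code you actually run.

```python
def count_folds(total_len: int, target: int, overlap: int) -> int:
    """Number of batch entries needed to cover total_len samples.

    Assumes consecutive entries share `overlap` samples, so each new
    entry advances the start position by target + overlap samples.
    """
    if total_len <= overlap:
        return 1
    step = target + overlap
    folds, remaining = divmod(total_len - overlap, step)
    # A partial entry at the end still needs its own (padded) slot.
    return folds + (1 if remaining else 0)


if __name__ == "__main__":
    utterance = 10 * 22_050  # hypothetical: 10 s of audio at 22.05 kHz
    for target, overlap in [(4_000, 200), (11_000, 550), (22_000, 1_100)]:
        n = count_folds(utterance, target, overlap)
        print(f"target={target} overlap={overlap} entries={n}")
```

A smaller target means more entries generated in parallel and more seams to crossfade, which matches the speed-versus-quality trade-off described above.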

zhangzhenyuyu commented 4 years ago

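Hello! I'd like to ask you a few questions. Did you train the Chinese synthesis yourself? Where did the training data come from? Does this model support Chinese? Looking forward to your reply!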

OswaldoBornemann commented 4 years ago

@freecui Would you share your config file? Thanks a lot.

OswaldoBornemann commented 4 years ago

@freecui Would you please share your WaveRNN training loss?

freecui commented 4 years ago

@zhangzhenyuyu The training data is internal data; the model does support Chinese.

freecui commented 4 years ago

@tsungruihon We can use the default parameters.

OswaldoBornemann commented 4 years ago

@freecui Glad to hear that. That's a really amazing result. Would you mind sharing your WeChat so that we can communicate? I also focus on Chinese TTS and ASR. My email is petertsengruihon@gmail.com

OswaldoBornemann commented 4 years ago

@freecui By the way, may I ask how many epochs or steps you have trained for?

justln1113 commented 4 years ago

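@freecui May I ask: to train Chinese speech, do I need to change hparams or any other files? I saw tts_cleaner_names = ['english_cleaners'] in hparams.py, and I'm not sure whether it should be changed for Chinese.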

freecui commented 4 years ago

@justln1113 Yes, that needs to be changed, to basic_cleaners.

justln1113 commented 4 years ago

@freecui OK, thanks for the reply. Is there anything else I should watch out for?

OswaldoBornemann commented 4 years ago

@freecui Hello, may I ask roughly what loss and step count your training has reached? Thanks.

zhangzhenyuyu commented 4 years ago

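@freecui Very sorry to bother you again. I ran into an error using ts_cleaner_names = ['basic_clearners']: the input x I get with it is always empty. I'm wondering whether I should use transliteration_cleaners instead. Looking forward to your reply, thanks!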

SnowInHokkaido commented 4 years ago

For Chinese, basic cleaners only work if your input is pinyin or phoneme characters.
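
One plausible reading of the empty input reported above: a character-level front end that keeps only the symbols in its table will drop every Chinese character. The sketch below illustrates that behaviour. SYMBOLS and text_to_ids are hypothetical stand-ins, not the repo's actual table or function, and whether the real table contains tone digits is an assumption you should check before training on numbered pinyin.

```python
import string

# Hypothetical symbol table: pad symbol, ASCII letters, tone digits 1-5,
# and basic punctuation. The real table may differ.
SYMBOLS = "_" + string.ascii_letters + "12345" + "!'(),-.:;? "
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def text_to_ids(text: str) -> list:
    """Map text to symbol IDs, silently dropping anything not in the table."""
    return [SYMBOL_TO_ID[ch] for ch in text if ch in SYMBOL_TO_ID]


hanzi = "你好世界"
pinyin = "ni3 hao3 shi4 jie4"

hanzi_ids = text_to_ids(hanzi)    # every character is unknown, so nothing survives
pinyin_ids = text_to_ids(pinyin)  # every character is in the table
```

If the tone digits were missing from the table, the pinyin would still produce a sequence, but all tone information would be lost without any error.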

jerryname2022 commented 4 years ago

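@freecui Bro, do you use pinyin to generate Chinese speech? What does the pinyin format look like? I have now trained for 900K steps following your method, but the audio generated from pinyin is still not good.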

SnowInHokkaido commented 4 years ago

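It's just pinyin characters plus tones. If the audio is not good, you can try training for some more steps and then lowering the learning rate.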
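
The "train longer, then lower the learning rate" advice can be expressed as a step-based, piecewise-constant schedule. This is a minimal sketch under stated assumptions: the step boundaries and rates in SCHEDULE are made-up illustrative values, and lr_at is a hypothetical helper rather than anything from the repo.

```python
from bisect import bisect_right

# Hypothetical schedule: (first step at which the rate applies, learning rate)
SCHEDULE = [(0, 1e-4), (500_000, 5e-5), (800_000, 1e-5)]


def lr_at(step: int, schedule=SCHEDULE) -> float:
    """Return the learning rate in effect at the given training step."""
    starts = [start for start, _ in schedule]
    # Index of the last boundary that is <= step (clamped to the first entry).
    idx = max(bisect_right(starts, step) - 1, 0)
    return schedule[idx][1]
```

In a PyTorch training loop, the returned value would typically be written into each entry of optimizer.param_groups under the 'lr' key at the start of a step.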

jerryname2022 commented 4 years ago

I trained on the LJSpeech dataset; I don't know whether that has something to do with it.

jerryname2022 commented 4 years ago

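I want to train a more natural-sounding voice. Right now it sounds very mechanical. Something like the result in the first post sounds fine to me and is what I'm after.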

1zxLi commented 4 years ago

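Hello, how were your results training on the LJSpeech dataset? Roughly what is the loss value? After 500K steps of training my results are still very poor. I hope you can help.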

xuexidi commented 4 years ago

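I ran into the same problem as the poster above. I sampled 5,000 utterances from the VCTK dataset to train WaveRNN (MOL mode) from scratch with batch size = 64, and after 450k steps the results are still terrible. I'd sincerely like to ask: is there anything I should watch out for? Loss curve: 31a6e8c899b5e9e5f479d0fc641843c

Audio generated at 400K steps: 2bccfba6a7efa4e1dc6874c85644243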

zhaoyun630 commented 3 years ago

Did you solve it in the end? I trained on aishell3 and ran into the same problem.