TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0
3.85k stars 814 forks

🇨🇳 Chinese TTS now available 😘 #201

Closed dathudeptrai closed 3 years ago

dathudeptrai commented 4 years ago

Chinese TTS is now available, thanks to @azraelkuan for his support :D. The model was trained on the Baker dataset (https://www.data-baker.com/open_source.htmlt). The pretrained model is licensed under CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/) because the dataset is for non-commercial use only :D

Please check out the Colab below and enjoy :D.

https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing

Note: these are just initial results; more can be done to make the model better.

cc: @candlewill @l4zyf9x @machineko

wyp1996 commented 4 years ago

Hello. Thanks for your great work! I'm new to the TTS area and this notebook could be a good start. However, I gave it a try and found that the current Chinese model doesn't make pauses. Is this already among your planned improvements?

dathudeptrai commented 4 years ago

cc: @azraelkuan (person in charge)

azraelkuan commented 4 years ago

@wyp1996 For now we do not have a frontend model, but we have placed #1, #2, #3, and sil in the training data.

lucasjinreal commented 4 years ago

@dathudeptrai Is this already supported on the master branch?

dathudeptrai commented 4 years ago

@jinfagang everything is on master branch. (updated content :D.)

lucasjinreal commented 4 years ago

@dathudeptrai Any readme on how to train on Biaobei data?

azraelkuan commented 4 years ago

@jinfagang just download the Biaobei data and extract it to baker

tensorflow-tts-preprocess --dataset baker --rootdir ~/Data/baker --outdir dump --config ./preprocess/baker_preprocess.yaml

and train it using baker's yaml.

IreneZhou2018 commented 4 years ago

@azraelkuan As far as I know, the sampling rate of the audio in the Biaobei dataset is 48k, but in baker_preprocess.yaml the sampling rate is set to 24k. I haven't tried the preprocessing. Is that a mistake, or am I misunderstanding the code?

dathudeptrai commented 4 years ago

@IreneZhou2018 the sampling rate in the config is the target sampling rate. If the dataset's sample rate is 48k, we resample it to the target rate (see code here https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L194-L196)
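As a rough illustration of the point above (the config value is the target rate, and audio recorded at another rate is resampled to it), here is a minimal pure-Python sketch. The function name `resample_to_target` and the linear-interpolation approach are assumptions made for illustration only; this is not the code in preprocess.py, and a real pipeline would rely on a band-limited resampler from an audio library.

```python
from typing import List, Sequence


def resample_to_target(
    audio: Sequence[float], source_rate: int, target_rate: int
) -> List[float]:
    """Resample `audio` from source_rate to target_rate by linear interpolation.

    Illustration only: there is no anti-aliasing filter here.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if source_rate == target_rate or len(audio) == 0:
        return list(audio)  # already at the target rate, nothing to do

    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(audio) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples advanced per output sample

    out = []
    last = len(audio) - 1
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(audio[left] * (1.0 - frac) + audio[right] * frac)
    return out


# One second of a 48 kHz ramp stays one second long at the 24 kHz target rate.
one_second_48k = [i / 48000 for i in range(48000)]
one_second_24k = resample_to_target(one_second_48k, 48000, 24000)
print(len(one_second_24k))
```

The duration is preserved while the number of samples halves, which is why a 24k value in the config is compatible with 48k source audio.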

IreneZhou2018 commented 4 years ago

@dathudeptrai ok, thanks for your reply and the work is amazing!

MachineLP commented 4 years ago

TensorFlowTTS Serving: https://github.com/MachineLP/QDServing/tree/master/model_serving https://github.com/MachineLP/QDServing

MachineLP commented 4 years ago

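TensorFlowTTS training data generation: fetch text data, convert the text to pinyin, generate TensorFlowTTS training audio with Alibaba Cloud TTS, and run preprocess/normalize before training: https://github.com/MachineLP/TensorFlowTTS_chinese/tree/master/generate_tts_data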

wyp1996 commented 4 years ago

Hi, do you have a more detailed README for the training data generation? It seems promising and I'd like to give it a try :)

Hongpeng1992 commented 4 years ago

It seems that the fastspeech2 model does not work properly when the sentence is long, for example: 君不见 黄河之水天上来 奔流到海不复回 君不见 高堂明镜悲白发 朝如青丝暮成雪 人生得意须尽欢 莫使金樽空对月

dathudeptrai commented 4 years ago

https://github.com/TensorSpeech/TensorFlowTTS/issues/208#issuecomment-688356211

Hongpeng1992 commented 4 years ago

Thank you. I am still evaluating the model.

MachineLP commented 4 years ago

Chinese TTS: feel free to add me on WeChat (lp9628) to join the WeChat group and discuss training and testing details.

lucasjinreal commented 4 years ago

@dathudeptrai When I try to train, I get this error:

2020-10-16 22:28:20.499294: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: cond/branch_executed/_8
Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 488, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 476, in main
    trainer.fit(
  File "/media/jintian/samsung/source/ai/swarm/exp/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
    self.run()
  File "/media/jintian/samsung/source/ai/swarm/exp/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 101, in run
    self._train_epoch()
  File "/media/jintian/samsung/source/ai/swarm/exp/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 123, in _train_epoch
    self._train_step(batch)
  File "examples/tacotron2/train_tacotron2.py", line 109, in _train_step
    self.one_step_forward(batch)
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
    outputs = execute.execute(
  File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:    Trying to access element 62 in a list with 62 elements.
     [[{{node while_19/body/_1/while/TensorArrayV2Read_1/TensorListGetItem}}]]
     [[tacotron2/encoder/bilstm/forward_lstm/PartitionedCall]] [Op:__inference__one_step_forward_23575]

Function call stack:
_one_step_forward -> _one_step_forward -> _one_step_forward

My command:

python examples/tacotron2/train_tacotron2.py \
  --train-dir ./dump/train/ \
  --dev-dir ./dump/valid/ \
  --outdir ./examples/tacotron2/exp/train.tacotron2.baker.v1/ \
  --config ./examples/tacotron2/conf/tacotron2.baker.v1.yaml \
  --use-norm 1 \
  --mixed_precision 0 \
  --resume ""

leijue222 commented 4 years ago

  1. Punctuation pauses do not seem to be handled.
  2. Arabic numerals cannot be predicted directly.
  3. I hope mixed Chinese and English input can be supported.

jucaowei commented 4 years ago

Hello, the link to the Baker dataset has expired, and the official website says I have no right to access the dataset. I hate to ask, but can you provide another way to get the dataset?

leijue222 commented 4 years ago

@jucaowei The link is here. The data only has a female voice.

jucaowei commented 4 years ago

thank you !!

jucaowei commented 4 years ago

I get a 404 error. Can you access the website? This is what I see: HTTP Status 404 – Not Found. Type: Status Report. Message: /open_source.htmlt. Description: The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.

leijue222 commented 4 years ago

@jucaowei I can access the website normally. Which country are you in now? Maybe a VPN is needed for network problems?

jucaowei commented 4 years ago

I already use a VPN with an HK node and it does not work, but my friend can access the website right now. I really appreciate your reply.

leijue222 commented 4 years ago

@azraelkuan Hi! Thanks for your work. Compared with some other reproduction projects, your reproduced tacotron2 can synthesize very long sentences without stress or omission. With your model I managed to synthesize up to about 90 seconds of audio. Since the Biaobei dataset consists of relatively short sentences, a model trained on it should not, in principle, be able to synthesize such long sentences. Have you done any special treatment of long sentences?

azraelkuan commented 4 years ago

No, this Tacotron was implemented by the author of this project; I just trained it on the Chinese dataset. Maybe it is because I use phonemes as the input, and each phoneme has a #0 token after it.
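To make the last remark concrete, here is a minimal sketch of placing a `#0` token after each phoneme. The helper name `interleave_boundary_token` and the example phoneme strings are hypothetical; the real processor may build its input sequence differently.

```python
from typing import List


def interleave_boundary_token(phonemes: List[str], token: str = "#0") -> List[str]:
    """Return a new sequence with `token` placed after every phoneme."""
    sequence: List[str] = []
    for phoneme in phonemes:
        sequence.append(phoneme)
        sequence.append(token)
    return sequence


# Hypothetical phoneme list, for illustration only.
print(interleave_boundary_token(["n", "i3", "h", "ao3"]))
# ['n', '#0', 'i3', '#0', 'h', '#0', 'ao3', '#0']
```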

c1a1o1 commented 3 years ago

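What should I pay attention to when using a vocoder model to convert mel spectrograms into a speech signal? My sampling rate is 22050 and hop size = 100.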

xiaoyangnihao commented 3 years ago

TTS discussion group, WeChat: WorldSeal. Everyone is welcome to exchange ideas and discuss~

ronggong commented 3 years ago

@azraelkuan It seems there is a mismatch between the hop size of 300 in the preprocessing config and the hop size of 256 in the tacotron2/fastspeech2 and melgan configs:

https://github.com/TensorSpeech/TensorFlowTTS/blob/master/preprocess/baker_preprocess.yaml

hop_size: 300            # Hop size. (fixed value, don't change)

https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/tacotron2/conf/tacotron2.baker.v1.yaml

hop_size: 256            # Hop size.

https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/fastspeech2/conf/fastspeech2.baker.v2.yaml

hop_size: 256            # Hop size.

https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/melgan/conf/melgan.v1.yaml

hop_size: 256            # Hop size.
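
A minimal sketch of how such a mismatch could be caught automatically, assuming the relevant YAML files have already been loaded into plain dicts. The function name `find_hop_size_mismatches` and the dict layout are assumptions for illustration; only the `hop_size` values are taken from the configs quoted above.

```python
from typing import Dict, List, Tuple


def find_hop_size_mismatches(
    reference_name: str, configs: Dict[str, dict]
) -> List[Tuple[str, int, int]]:
    """Compare every config's hop_size with the reference config's hop_size.

    Returns (config_name, its_hop_size, reference_hop_size) for each mismatch.
    """
    reference_hop = configs[reference_name]["hop_size"]
    mismatches = []
    for name, config in configs.items():
        if name != reference_name and config["hop_size"] != reference_hop:
            mismatches.append((name, config["hop_size"], reference_hop))
    return mismatches


# hop_size values as quoted in the comment above.
configs = {
    "baker_preprocess.yaml": {"hop_size": 300},
    "tacotron2.baker.v1.yaml": {"hop_size": 256},
    "fastspeech2.baker.v2.yaml": {"hop_size": 256},
    "melgan.v1.yaml": {"hop_size": 256},
}

for name, hop, expected in find_hop_size_mismatches("baker_preprocess.yaml", configs):
    print(f"{name}: hop_size={hop}, but preprocessing used {expected}")
```

With the values quoted here, all three model configs are reported as differing from the preprocessing config.
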
ronggong commented 3 years ago

@azraelkuan May I know how many GPUs and what batch_size you used to train the tacotron2/fastspeech2 and melgan models in the Colab? https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing

o20021106 commented 3 years ago

Hi

I noticed there are no pauses at punctuation. Is there a way to add pauses for punctuation marks?

ronggong commented 3 years ago

@o20021106 The original baker transcription contains punctuation, but it has been removed from the training data. https://github.com/ronggong/TensorFlowTTS/blob/e3d2469f8e917a13f86eb3165739039bbb05b3ff/tensorflow_tts/processor/baker.py#L644-L647

So it's possible to add them back.

o20021106 commented 3 years ago

@ronggong Yes, thank you. I noticed that the processor removes punctuation and other non-verbal characters because it specifies errors='ignore'. So for anyone who wants to use the pre-trained model and still produce pauses at punctuation, the errors option should be changed from 'ignore' to the default, and those non-verbal characters should be replaced by pauses in get_phoneme_from_char_and_pinyin

ronggong commented 3 years ago

@o20021106 I see what you mean. I guess you can add a lambda mapping function to convert punctuation to pauses: https://pypinyin.readthedocs.io/zh_CN/master/usage.html#handle-no-pinyin

luan78zaoha commented 3 years ago

@o20021106 In fact, #3 in the baker transcription represents the end of an utterance. Generally, Chinese punctuation marks that imply a pause, like ",。;!", are replaced with #3. So adding #3 to the Chinese text can create pause prosody in the audio; you can see my demo.
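The replacement described above can be sketched as a small text preprocessing step. This is a hedged illustration: `PAUSE_PUNCTUATION`, `PAUSE_TOKEN`, and `insert_pauses` are hypothetical names, and the sketch assumes the model accepts a `#3` marker written directly in the input text, before any pinyin conversion.

```python
# Punctuation marks that carry a pause in Chinese, as listed in the comment above.
PAUSE_PUNCTUATION = set(",。;!")
PAUSE_TOKEN = "#3"


def insert_pauses(text: str) -> str:
    """Replace each pause-carrying punctuation mark in `text` with the pause token."""
    return "".join(PAUSE_TOKEN if ch in PAUSE_PUNCTUATION else ch for ch in text)


print(insert_pauses("你好,世界。"))
# 你好#3世界#3
```

Characters that are not in the set pass through unchanged, so other symbols would still need their own handling.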

980202006 commented 3 years ago

Hi, in the Colab, the energy ratio does not work. I changed it to 0.5 or 2, but the wav has the same volume as before.

luan78zaoha commented 3 years ago

@980202006 Because the audio amplitude of the training data doesn't vary significantly, the model cannot capture the influence of energy. So the energy ratio doesn't seem to work.

hertz-pj commented 3 years ago

[image: loss curve] Is this a reasonable loss curve? I think the eval loss looks weird. Can you share your loss curve?

dathudeptrai commented 3 years ago

@PeijiYang your loss is OK, it is just overfitting a bit :)) This may be caused by the small dataset :D.

wangzhengqiang commented 3 years ago

Why generate the training audio with Alibaba Cloud TTS? Doesn't Alibaba charge for that?

KevinTao24 commented 3 years ago

Hi, I got the same problem (the InvalidArgumentError in the Tacotron2 training traceback above). Have you solved it? If so, how? Thank you so much.

hertz-pj commented 3 years ago

@azraelkuan Both the sil and eos tokens are placed at the end as terminators. Isn't that redundant?

neso613 commented 3 years ago

@dathudeptrai @candlewill @l4zyf9x @machineko Fastspeech Chinese TTS is available in PyTorch for custom dataset training. How can that be converted into TFLite? Can I get some references on how to train Chinese Fastspeech TTS?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.