Closed dathudeptrai closed 3 years ago
Hello, and thanks for your great work! I'm new to the TTS area, and this notebook could be a good starting point. However, I gave it a try and found that the Chinese model currently doesn't produce pauses. Is this on your list of planned improvements?
cc: @azraelkuan (person in charge)
@wyp1996 For now we do not have a frontend model, but we do include the prosody tokens #1, #2, #3, and sil in training.
@dathudeptrai Is the support already on the master branch?
@jinfagang everything is on master branch. (updated content :D.)
@dathudeptrai Any readme on how to train on Biaobei data?
@jinfagang Just download the Biaobei data and extract it to baker, then run
tensorflow-tts-preprocess --dataset baker --rootdir ~/Data/baker --outdir dump --config ./preprocess/baker_preprocess.yaml
and train it using Baker's yaml.
@azraelkuan As far as I know, the sampling rate of the audio in the Biaobei dataset is 48 kHz, but in baker_preprocess.yaml the sampling rate is set to 24 kHz. I haven't tried the preprocessing yet. Is that a mistake, or am I misunderstanding the code?
@IreneZhou2018 The sampling rate in the config is the target sampling rate; if the dataset's sample rate is 48 kHz, we re-sample it (see the code here: https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/bin/preprocess.py#L194-L196)
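The resampling step described above can be sketched in plain numpy with linear interpolation. This is only an illustration of the sample-rate ratio (the repo itself uses a proper resampler in its preprocessing code); the `resample_linear` helper and its names are mine:

```python
import numpy as np

def resample_linear(audio, sr_in, sr_out):
    """Naive linear-interpolation resampler: maps a waveform sampled
    at sr_in Hz onto a time grid sampled at sr_out Hz."""
    n_out = int(len(audio) * sr_out / sr_in)
    t_in = np.arange(len(audio)) / sr_in   # original sample times (seconds)
    t_out = np.arange(n_out) / sr_out      # target sample times (seconds)
    return np.interp(t_out, t_in, audio)

# One second of 48 kHz audio becomes half as many samples at the 24 kHz target.
x48 = np.random.randn(48000)
x24 = resample_linear(x48, 48000, 24000)
print(len(x24))  # 24000
```

In practice you would use a band-limited resampler (as the preprocessing script does) to avoid aliasing; the point here is only that the config's `sampling_rate` is the output rate, not a claim about the input files.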
@dathudeptrai ok, thanks for your reply and the work is amazing!
TensorflowTTS training-data generation: pulling text data, converting the text to pinyin, generating TensorflowTTS training audio with Aliyun TTS, and running preprocess/normalize before training: https://github.com/MachineLP/TensorFlowTTS_chinese/tree/master/generate_tts_data
Hi, do you have a more specific README? It seems promising and I'd like to give it a try :)
It seems that the fastspeech2 model does not work properly when the sentence is long, e.g. 君不见 黄河之水天上来 奔流到海不复回 君不见 高堂明镜悲白发 朝如青丝暮成雪 人生得意须尽欢 莫使金樽空对月
https://github.com/TensorSpeech/TensorFlowTTS/issues/208#issuecomment-688356211
Thank you. I am still evaluating the model.
Chinese TTS: you're welcome to add WeChat lp9628 to join the WeChat group and discuss training/testing details.
@dathudeptrai I tried to train and got this error:
2020-10-16 22:28:20.499294: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: cond/branch_executed/_8
Traceback (most recent call last):
File "examples/tacotron2/train_tacotron2.py", line 488, in <module>
main()
File "examples/tacotron2/train_tacotron2.py", line 476, in main
trainer.fit(
File "/media/jintian/samsung/source/ai/swarm/exp/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
self.run()
File "/media/jintian/samsung/source/ai/swarm/exp/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 101, in run
self._train_epoch()
File "/media/jintian/samsung/source/ai/swarm/exp/TensorFlowTTS/tensorflow_tts/trainers/base_trainer.py", line 123, in _train_epoch
self._train_step(batch)
File "examples/tacotron2/train_tacotron2.py", line 109, in _train_step
self.one_step_forward(batch)
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
result = self._call(*args, **kwds)
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
return self._stateless_fn(*args, **kwds)
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1843, in _filtered_call
return self._call_flat(
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 1923, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 545, in call
outputs = execute.execute(
File "/home/jintian/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Trying to access element 62 in a list with 62 elements.
[[{{node while_19/body/_1/while/TensorArrayV2Read_1/TensorListGetItem}}]]
[[tacotron2/encoder/bilstm/forward_lstm/PartitionedCall]] [Op:__inference__one_step_forward_23575]
Function call stack:
_one_step_forward -> _one_step_forward -> _one_step_forward
My command:
python examples/tacotron2/train_tacotron2.py \
--train-dir ./dump/train/ \
--dev-dir ./dump/valid/ \
--outdir ./examples/tacotron2/exp/train.tacotron2.baker.v1/ \
--config ./examples/tacotron2/conf/tacotron2.baker.v1.yaml \
--use-norm 1 \
--mixed_precision 0 \
--resume ""
Chinese TTS is now available; thanks to @azraelkuan for his support :D. The model uses the Baker dataset here (https://www.data-baker.com/open_source.htmlt). The pretrained model is licensed under CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/) since the dataset is non-commercial :D
Please check out the colab below and enjoy :D.
https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing
Note: these are just initial results; more can be done to make the model better.
cc: @candlewill @l4zyf9x @machineko
Hello, the link to the Baker dataset has expired, and the official website says I have no right to access the dataset. I hate to ask, but can you provide another way to get the dataset?
@jucaowei The link is here. The data only has a female voice.
thank you !!
404 error. Can you access the website? I got: HTTP Status 404 – Not Found. Message: /open_source.htmlt. Description: The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.
@jucaowei I can access the website normally. Which country are you in now? Maybe a VPN is needed for network problems?
I already use a VPN with an HK node and it's not working, but my friend can access the website right now. I really appreciate your reply.
@azraelkuan Hi! Thanks for your work. Compared with some other reproduction projects, your reproduced tacotron2 can synthesize very long sentences without stress or omission; I have used it to synthesize up to about 90 seconds. The Biaobei dataset consists of relatively short sentences, so a model trained on it should not reasonably be able to synthesize such long sentences. Have you done any special treatment of long sentences?
No, this Tacotron is implemented by the author of this project; I just used a Chinese dataset to train it. Maybe it is because I use phonemes as the input, and each phoneme has a #0 token after it.
yuze notifications@github.com wrote on Saturday, November 7, 2020 at 2:32 PM:
@azraelkuan https://github.com/azraelkuan Hi! Thanks for your work. Compared with some other reproduction projects, your reproduced tacotron2 can synthesize very long sentences without stress or omission. Have you done any special treatment of long sentences?
-- Kuan Chen (陈宽) Speech Lab, Shanghai Jiao Tong University Tel: +86 17621207116
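A minimal sketch of the "each phoneme has a #0 token after it" idea mentioned above. This is a toy interleaver of my own, not the repo's Baker processor (the real logic lives in `tensorflow_tts/processor/baker.py` and is more involved):

```python
def interleave_pause_tokens(phonemes, token="#0"):
    """Insert a prosody token after every phoneme, giving the model an
    explicit boundary symbol between phonemes, as described above."""
    out = []
    for p in phonemes:
        out.append(p)
        out.append(token)
    return out

# Toy pinyin-style phoneme sequence for "ni3 hao3" (illustrative only).
seq = interleave_pause_tokens(["n", "i3", "h", "ao3"])
print(seq)  # ['n', '#0', 'i3', '#0', 'h', '#0', 'ao3', '#0']
```

The extra boundary tokens may be one reason the model generalizes to long inputs: the encoder sees regular, explicit structure rather than an unbroken phoneme stream.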
What should I pay attention to when using a vocoder model to convert mel spectrograms to a speech signal? My sampling rate is 22050 and hop size = 100.
TTS discussion group, WeChat: WorldSeal. Everyone is welcome to exchange ideas and discuss~
@azraelkuan It seems there is a mismatch between the hop size in preprocessing (300) and the one in the tacotron/fastspeech2 and melgan configs (256):
https://github.com/TensorSpeech/TensorFlowTTS/blob/master/preprocess/baker_preprocess.yaml
hop_size: 300 # Hop size. (fixed value, don't change)
hop_size: 256 # Hop size.
hop_size: 256 # Hop size.
https://github.com/TensorSpeech/TensorFlowTTS/blob/master/examples/melgan/conf/melgan.v1.yaml
hop_size: 256 # Hop size.
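The mismatch matters because hop_size ties the mel-frame count to the waveform length: the vocoder upsamples each mel frame by exactly hop_size samples, so the preprocessor and vocoder must agree. A quick sanity check (the helper name is mine; the frame count assumes a center-padded STFT with one frame per hop):

```python
def expected_frames(num_samples, hop_size):
    """Approximate number of mel frames for a waveform of num_samples,
    assuming a center-padded STFT: one frame per hop, plus one."""
    return num_samples // hop_size + 1

# One second of 24 kHz audio:
print(expected_frames(24000, 300))  # 81 frames from a hop_size-300 preprocessor
print(expected_frames(24000, 256))  # 94 frames a hop_size-256 vocoder would expect
```

If the counts disagree, the vocoder's output length will not match the original audio, so training a 256-hop MelGAN on 300-hop features would be inconsistent.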
@azraelkuan May I know how many GPUs and what batch_size you used to train the tacotron2/fastspeech2 and melgan models in the colab? https://colab.research.google.com/drive/1YpSHRBRPBI7cnTkQn1UcVTWEQVbsUm1S?usp=sharing
Hi, I noticed there are no pauses at punctuation. Is there a way to add pauses for punctuation?
@o20021106 In the original Baker transcription there is punctuation, but it has been removed from the training data: https://github.com/ronggong/TensorFlowTTS/blob/e3d2469f8e917a13f86eb3165739039bbb05b3ff/tensorflow_tts/processor/baker.py#L644-L647
So it is possible to add it back.
@ronggong Yes, thank you. I noticed that the processor removes punctuation and other non-verbal characters when errors='ignore' is specified. So for anyone who wants to use the pre-trained model and still produce pauses at punctuation, errors should be left at its default, and those non-verbal characters should be replaced by pauses in get_phoneme_from_char_and_pinyin.
@o20021106 I see what you mean. I guess you can add a lambda mapping function to convert the punctuation to a pause: https://pypinyin.readthedocs.io/zh_CN/master/usage.html#handle-no-pinyin
@o20021106 In fact, #3 in the Baker transcription represents the end of an utterance. Generally, the punctuation marks that imply a pause in Chinese, like ",。;!", are replaced with #3. So adding #3 to the Chinese text can create pause prosody in the audio; you can see my demo.
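Following that idea, here is a hedged sketch of replacing pause punctuation with the #3 prosody token before phonemization. The function, its name, and the punctuation set are mine, not the repo's API; a real frontend would do this inside the text processor:

```python
# Chinese and ASCII punctuation that typically implies a pause (illustrative set).
PAUSE_PUNCT = set("，。；！？、,.;!?")

def punct_to_pause(text, pause_token="#3"):
    """Replace pause punctuation with the #3 prosody token so the
    pretrained model produces an audible pause at that position."""
    out = []
    for ch in text:
        out.append(pause_token if ch in PAUSE_PUNCT else ch)
    return "".join(out)

print(punct_to_pause("你好，世界。"))  # 你好#3世界#3
```

The resulting string can then be fed to the usual pinyin/phoneme pipeline, which already knows how to handle the #3 token from training.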
Hi, in the colab the energy ratio does not work. I modified it to 0.5 or 2, but the wav has the same volume as before.
@980202006 Because the audio amplitude of the training data doesn't vary significantly, the model cannot capture the influence of energy, so the energy ratio doesn't seem to have an effect.
Is this a correct loss curve? I think the eval loss is weird. Can you share your loss curve?
@PeijiYang Your loss is OK; it is just overfitting a bit :)) maybe caused by the small dataset :D.
TensorflowTTS training-data generation: pulling text data, converting the text to pinyin, generating TensorflowTTS training audio with Aliyun TTS, and running preprocess/normalize before training: https://github.com/MachineLP/TensorFlowTTS_chinese/tree/master/generate_tts_data
Why generate the training audio with Aliyun TTS? Doesn't Aliyun charge for that?
Hi, I got the same problem. Have you solved it? How? Thank you so much.
@azraelkuan The sil and eos tokens are both placed at the end as terminators; isn't that redundant?
@dathudeptrai @candlewill @l4zyf9x @machineko A Fastspeech Chinese TTS is available in PyTorch for custom-dataset training. How can that be converted to TFLite? Can I get some reference on how to train a Chinese Fastspeech TTS?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.