TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

How do I prepare a dataset in another language? #277

Closed. Nistrian closed this issue 4 years ago

Nistrian commented 4 years ago

I have an already normalized dataset in Russian that follows the LJSpeech format. How can I skip the text normalization step? When I follow the instructions and run tensorflow-tts-preprocess --rootdir ./dataset --outdir ./dump --config preprocess/ljspeech_preprocess.yaml --dataset ljspeech, I get this error:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(preprocess())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 371, in preprocess
    cleaner_names=dataset_cleaner[config["dataset"]],
  File "<string>", line 14, in __init__
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/processor/base_processor.py", line 63, in __post_init__
    self.create_items()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/processor/ljspeech.py", line 148, in create_items
    self.items = [self.split_line(self.data_dir, line, "|") for line in f]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/processor/ljspeech.py", line 148, in <listcomp>
    self.items = [self.split_line(self.data_dir, line, "|") for line in f]
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/processor/ljspeech.py", line 153, in split_line
    text_norm = parts[self.positions["text_norm"]]
IndexError: list index out of range

dathudeptrai commented 4 years ago

@Nistrian you should write your own Processor. Please refer to the supported processors here (https://github.com/TensorSpeech/TensorFlowTTS/tree/master/tensorflow_tts/processor).
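
The traceback shows the LJSpeech processor splitting each metadata line on "|" and then indexing a "text_norm" field, so the IndexError suggests the metadata lines have fewer columns than that processor expects. As a rough illustration of the line-parsing part of a custom Processor, here is a minimal standalone sketch. It does not use the TensorFlowTTS API: the column indices in POSITIONS, the "wavs" subdirectory, and the returned tuple layout are all assumptions made for this example, and split_line here is a hypothetical helper rather than the library's method.

```python
import os
from typing import Dict, Tuple

# Assumed column layout (id|text|normalized text). The traceback only
# confirms that a "text_norm" position exists; the indices are a guess.
POSITIONS: Dict[str, int] = {"wave_file": 0, "text": 1, "text_norm": 2}


def split_line(data_dir: str, line: str, sep: str = "|") -> Tuple[str, str, str]:
    """Parse one metadata line into (text, wav_path, utterance_id).

    Hypothetical helper: when the normalized-text column is missing,
    fall back to the raw text column instead of raising IndexError.
    """
    parts = line.strip().split(sep)
    if len(parts) < 2:
        raise ValueError(f"expected at least 'id{sep}text', got: {line!r}")

    utt_id = parts[POSITIONS["wave_file"]]
    if len(parts) > POSITIONS["text_norm"]:
        # A third column is present: use it as the normalized text.
        text = parts[POSITIONS["text_norm"]]
    else:
        # Only two columns: the text is assumed to be normalized already.
        text = parts[POSITIONS["text"]]

    # Assumes audio files live in <data_dir>/wavs/<id>.wav.
    wav_path = os.path.join(data_dir, "wavs", f"{utt_id}.wav")
    return text, wav_path, utt_id
```

Calling split_line("./dataset", "ru_0001|привет мир") would return the raw text unchanged together with the wav path and the utterance ID. This only covers parsing; a real Processor would also need its own symbol set and a text-to-sequence step that does not apply English cleaners.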