Open Arafat4341 opened 4 years ago
You can use kana (仮名) or romaji (ローマ字) as symbols.
@JasonWei512 Thanks for responding! Where should I use the symbols? Have you trained with JSUT data?
You need to modify _characters in tacotron/utils/symbols.py for non-English languages. Then modify build_from_path() in datasets/preprocessor.py according to the JSUT dataset's format.
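As a rough illustration of the _characters change, here is a minimal sketch of a kana-only symbol inventory built from Unicode ranges. The names _pad, _eos, _hiragana, _katakana, _punctuation, symbol_to_id and text_to_ids are assumptions made for this example, as is the choice of punctuation; the actual layout of tacotron/utils/symbols.py may differ, so treat this as a starting point rather than a drop-in replacement.

```python
# Sketch only: a kana-based symbol inventory for Japanese text.
# All names except `_characters` are assumptions for illustration.

_pad = "_"  # assumed padding symbol
_eos = "~"  # assumed end-of-sequence symbol

# Hiragana block: U+3041 (ぁ) through U+3096 (ゖ)
_hiragana = "".join(chr(c) for c in range(0x3041, 0x3097))
# Katakana block: U+30A1 (ァ) through U+30FA (ヺ)
_katakana = "".join(chr(c) for c in range(0x30A1, 0x30FB))
# Long-vowel mark plus a few common punctuation marks and space (assumed set)
_punctuation = "ー、。！？ "

_characters = _hiragana + _katakana + _punctuation

symbols = [_pad, _eos] + list(_characters)
symbol_to_id = {s: i for i, s in enumerate(symbols)}


def text_to_ids(text):
    """Map each known character to its ID, silently dropping unknown ones."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


if __name__ == "__main__":
    print(len(symbols))
    print(text_to_ids("わたしはにほんじんです"))
```

Note that kanji are dropped by text_to_ids in this sketch, so transcripts would need to be converted to kana before they reach it.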
I haven't tried JSUT dataset, since I don't speak Japanese...
There are thousands of kanji, plus hiragana and katakana! How am I supposed to put them all in _characters? By the way, I have pre-processed JSUT data with train.txt and mel and spec files, but it doesn't seem to fit well with this model.
I am not sure about this, but I think you can put only hiragana and katakana in _characters, and replace the kanji in the JSUT dataset's transcription file with hiragana using some tool. (私は日本人です → わたしはにほんじんです, "I am Japanese")
@JasonWei512 Thanks a lot! Your suggestion helped! Can you tell me whether I can save a checkpoint on keyboard interrupt? What should I modify to do that?
@Arafat4341 Converting all kanji into kana (either all katakana or all hiragana) will help boost performance, as @JasonWei512 said. You can use tools like pykakasi for this.
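For reference, a minimal sketch of the kanji-to-kana step with pykakasi might look like the following. It assumes pykakasi 2.x (the kakasi().convert() API, installed with pip install pykakasi) and assumes each transcript line has the form utterance_id:text; the function names kanji_to_hiragana and convert_transcript_line are made up for this example. Automatic readings can be wrong for kanji with context-dependent pronunciations, so it is worth spot-checking the output.

```python
# Sketch only: convert mixed kanji/kana transcripts to hiragana.
# Assumes pykakasi 2.x and an "utterance_id:text" line format.


def kanji_to_hiragana(text):
    """Return `text` with kanji and katakana rendered as hiragana."""
    import pykakasi  # third-party; imported lazily so the module loads without it

    kks = pykakasi.kakasi()
    # convert() returns a list of dicts; the "hira" key holds the hiragana reading
    return "".join(item["hira"] for item in kks.convert(text))


def convert_transcript_line(line, to_kana=kanji_to_hiragana):
    """Split an 'utterance_id:text' line and convert the text part to kana."""
    utt_id, text = line.rstrip("\n").split(":", 1)
    return utt_id, to_kana(text)


if __name__ == "__main__":
    # The utterance ID here is a placeholder, not taken from the thread.
    print(convert_transcript_line("utt_0001:私は日本人です"))
```

Passing the converter in as to_kana makes it easy to swap in a different tool, or a katakana converter, without touching the parsing code.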
Can we train this implementation of tacotron-2 on jsut data?