TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

MFA on LJSpeech #448

CrossEntropy closed this issue 3 years ago.

CrossEntropy commented 3 years ago

Hi @machineko @dathudeptrai @ZDisket, I have two questions.

Q1: I cleaned the LJSpeech text using english_cleaners, but when I run:

python examples/mfa_extraction/run_mfa.py \
--corpus_directory ./ljspeech \
--output_directory ./mfa/parsed \
--jobs 8

the result is very poor:

Setting up corpus information...
Number of speakers in corpus: 1, average number of utterances per speaker: 12828.0
Creating dictionary information...
Done with setup.
There were 3275 segments/files not aligned. Please see ./mfa/parsed/unaligned.txt for more details on why alignment failed for these files.
Done! Everything took 2419.595493555069 seconds

Did I do something wrong?
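To put "very poor" in numbers, here is a minimal sketch that pulls the counts out of the MFA log quoted in the question and computes the fraction of utterances that were not aligned. The regular expressions assume exactly the log wording shown in this issue (other MFA versions may phrase it differently), and unaligned_fraction is a hypothetical helper, not part of MFA or TensorFlowTTS.

```python
import re

# Log text copied from the run reported in this issue.
LOG = (
    "Setting up corpus information... Number of speakers in corpus: 1, "
    "average number of utterances per speaker: 12828.0 Creating dictionary information... "
    "Done with setup. There were 3275 segments/files not aligned."
)


def _grab(pattern: str, log: str) -> str:
    """Return the first capture group of pattern, or fail if the log wording differs."""
    match = re.search(pattern, log)
    if match is None:
        raise ValueError(f"pattern not found in log: {pattern}")
    return match.group(1)


def unaligned_fraction(log: str) -> float:
    """Fraction of utterances MFA reported as unaligned (assumes this log format)."""
    speakers = int(_grab(r"Number of speakers in corpus: (\d+)", log))
    per_speaker = float(_grab(r"average number of utterances per speaker: ([\d.]+)", log))
    unaligned = int(_grab(r"There were (\d+) segments/files not aligned", log))
    total = speakers * per_speaker  # total utterances in the corpus
    return unaligned / total
```

For this log the result is about 0.255, i.e. roughly a quarter of the utterances were left without alignments.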

Q2: When I run the following for LJSpeech, do I need to change --yaml_path?

python examples/mfa_extraction/txt_grid_parser.py \
--yaml_path examples/fastspeech2_libritts/conf/fastspeech2libritts.yaml \
--dataset_path ./ljspeech \
--text_grid_path ./mfa/parsed \
--output_durations_path ./ljspeech/durations \
--sample_rate 16000

Thanks!

machineko commented 3 years ago

Hey @CrossEntropy, MFA is a package that wasn't built by us, but files can end up unaligned for multiple reasons, mostly beam search failures and missing entries in the pronunciation dictionary.
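One of the causes mentioned here, words that the pronunciation dictionary does not cover, can be checked before running the aligner. A minimal sketch, assuming a plain-text dictionary with one "WORD PHONE PHONE ..." entry per line (the CMUdict-style layout; dictionaries with extra columns such as pronunciation probabilities would need a different parser). load_lexicon and find_oov are hypothetical helpers, not part of MFA or TensorFlowTTS, and the tokenizer is deliberately crude.

```python
import re
from typing import Dict, Iterable, List, Set


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'WORD PH1 PH2 ...' lines into {word: phones}; keeps the first pronunciation."""
    lexicon: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            lexicon.setdefault(parts[0].lower(), parts[1:])
    return lexicon


def find_oov(transcripts: Iterable[str], lexicon: Dict[str, List[str]]) -> Set[str]:
    """Collect transcript words (lowercased letters/apostrophes) missing from the lexicon."""
    oov: Set[str] = set()
    for text in transcripts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in lexicon:
                oov.add(word)
    return oov
```

Words reported as out-of-vocabulary either need dictionary entries or a text normalization step before alignment.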

Questions about MFA itself should be asked here => MFA_REPO

About the second question: yes, you should point to a config based on LJSpeech, not LibriTTS. These are two different datasets (single-speaker and multi-speaker, respectively).
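The config matters presumably because the parser turns the phone intervals in each TextGrid (in seconds) into per-phoneme durations counted in mel frames, and that conversion depends on the sample rate and the hop size. Below is a minimal sketch of such a conversion, assuming one frame per hop_size samples; it is not the code in txt_grid_parser.py, intervals_to_frame_durations is a hypothetical helper, and the values 22050 and 256 used in the checks are illustrative. Note that the original LJSpeech recordings are 22,050 Hz, so whatever rate is passed as --sample_rate should match the audio and config you actually train on.

```python
from typing import List, Sequence, Tuple


def intervals_to_frame_durations(
    intervals: Sequence[Tuple[float, float]], sample_rate: int, hop_size: int
) -> List[int]:
    """Convert consecutive (start, end) intervals in seconds to frame counts.

    Each boundary is rounded on the cumulative timeline, so the durations
    always sum to the rounded frame index of the last end time.
    """
    if not intervals:
        return []
    frames_per_second = sample_rate / hop_size
    prev_frame = round(intervals[0][0] * frames_per_second)
    durations: List[int] = []
    for _, end in intervals:
        end_frame = round(end * frames_per_second)
        durations.append(end_frame - prev_frame)
        prev_frame = end_frame
    return durations
```

Rounding the cumulative boundaries, rather than each duration on its own, keeps the total frame count from drifting away from the number of mel frames in the utterance.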

CrossEntropy commented 3 years ago

Thank you for your reply, @machineko. One step further: can I train a Tacotron2 model that uses phonemes to get better alignments?

machineko commented 3 years ago

Yup, you can train Tacotron2 on phonemes very easily: just swap the characters for phonemes in your input files :). But MFA still gives better alignments in most cases :)
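A minimal sketch of "swapping characters for phonemes" in LJSpeech-style metadata.csv lines (ID|raw text|normalized text), using a tiny hypothetical lexicon in CMUdict/ARPAbet style. Which column the TensorFlowTTS processor reads, and which phoneme symbols and word-boundary convention it expects, are assumptions to check against the processor you actually use; to_phonemes, convert_metadata_line and the ID LJ000-0000 are made up for illustration.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon; a real setup would load a full CMUdict-style dictionary.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


def to_phonemes(text: str, lexicon: Dict[str, List[str]]) -> str:
    """Replace each word with its phonemes (space-separated); fail on unknown words."""
    phones: List[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phones.extend(lexicon[word])
    return " ".join(phones)


def convert_metadata_line(line: str, lexicon: Dict[str, List[str]]) -> str:
    """Rewrite the third (normalized text) field of an 'ID|raw|normalized' line as phonemes."""
    utt_id, raw, normalized = line.rstrip("\n").split("|")
    return f"{utt_id}|{raw}|{to_phonemes(normalized, lexicon)}"
```

Failing on unknown words, instead of silently falling back to characters, keeps the input purely phonemic; a grapheme-to-phoneme model is the usual way to cover words the dictionary lacks.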

CrossEntropy commented 3 years ago

Thanks @machineko