MycroftAI / mimic2

Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.
Apache License 2.0

LJSpeech dataset visuals look very bad. Why? What to do? #46

Closed. steve3nto closed this issue 4 years ago

steve3nto commented 4 years ago

I have preprocessed the LJSpeech dataset with preprocess.py and used the supplied analyze.py script to generate visuals for that dataset. The graphs look very different from the example ones in the README.

Character lengths go up to more than 8000 and there seem to be many outliers. Plots are attached at the end of this post, so you can see what I mean.

Did you notice the same? Is it a bug in the LJSpeech dataset? Should I ignore this and go on training? Or should I find a way to preprocess the dataset better and exclude the outliers?

[Attached plots: char_len_vs_avg_secs, char_len_vs_num_samples, char_len_vs_std, phoneme_dist]

el-tocino commented 4 years ago

Maybe bad files? I don't recall seeing files that large in LJ. Anything past 250 characters is probably over 10s in length and would cause issues.
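A quick way to check for such entries is to list every transcript longer than the 250-character rule of thumb above. This is only a minimal sketch: `find_long_entries`, the `(id, text)` row layout, and the sample rows are hypothetical and not part of mimic2.

```python
def find_long_entries(rows, max_chars=250):
    """Return (utterance_id, length) for transcripts longer than max_chars.

    `rows` is assumed to be an iterable of (utterance_id, text) pairs;
    the 250-character default follows the rule of thumb in this thread.
    """
    return [(utt_id, len(text)) for utt_id, text in rows if len(text) > max_chars]


# Hypothetical sample rows, not real LJSpeech data.
sample_rows = [
    ("utt-0001", "A short, ordinary sentence."),
    ("utt-0002", "x" * 8000),  # stands in for a wrongly merged transcript
]

suspicious = find_long_entries(sample_rows)
print(suspicious)
```

If entries like the second one show up, the transcripts were probably merged somewhere in preprocessing rather than being that long in the dataset itself.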

steve3nto commented 4 years ago

Yeah, it was an error caused by " characters in the text. The preprocess function was merging phrases: after an opening " it kept reading into the following lines while looking for the closing one.

I solved it by removing all " characters from the text in the preprocess.py script. Now the dataset looks fine after running analyze.py.
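For anyone hitting the same thing, here is a small sketch of how this kind of merge can happen. It assumes the pipe-delimited metadata is read with a CSV-style parser that treats " as a quote character. That is an assumption for illustration and may not match what preprocess.py actually does. The sample text and variable names are made up.

```python
import csv
import io

# Hypothetical pipe-delimited metadata. The second field of the first line
# starts with a quote that is only closed on the following line.
sample = (
    'a1|"An opening quote\n'
    'a2|that closes on a later line" and continues.\n'
    'a3|A normal sentence.\n'
)

# Default quoting: the parser treats the field as quoted and keeps reading
# across the newline until it finds the closing quote, merging two entries.
default_rows = list(csv.reader(io.StringIO(sample), delimiter="|"))

# Option 1: tell the parser that quote characters have no special meaning.
no_quote_rows = list(
    csv.reader(io.StringIO(sample), delimiter="|", quoting=csv.QUOTE_NONE)
)

# Option 2: strip the quote characters before parsing, as described above.
stripped_rows = list(
    csv.reader(io.StringIO(sample.replace('"', "")), delimiter="|")
)

print(len(default_rows), len(no_quote_rows), len(stripped_rows))
```

Option 1 keeps the " characters in the transcripts, while option 2 drops them. Which one is preferable depends on whether the later text cleaning steps expect quotes to be present.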