Maybe some bad files? I don't recall seeing transcripts that long in LJSpeech. Anything past 250 characters is probably over 10 s of audio and would cause issues.
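Here is a minimal Python sketch of the length filter suggested above. It drops any utterance whose transcript exceeds 250 characters. The function name `filter_by_length`, the `(clip_id, text)` tuple layout, and the sample rows are all hypothetical and are not part of preprocess.py. The 250-character cutoff is only the rough "about 10 s" estimate from this comment.

```python
# Hypothetical cutoff taken from the comment above: ~250 chars is roughly 10 s of speech.
MAX_CHARS = 250


def filter_by_length(rows, max_chars=MAX_CHARS):
    """Split (clip_id, text) pairs into those within the limit and the outliers."""
    kept, dropped = [], []
    for clip_id, text in rows:
        target = kept if len(text) <= max_chars else dropped
        target.append((clip_id, text))
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical rows: one ordinary transcript and one 8000-character outlier.
    rows = [
        ("LJ001-0001", "A short, ordinary transcript."),
        ("LJ001-0002", "x" * 8000),
    ]
    kept, dropped = filter_by_length(rows)
    print(len(kept), len(dropped))  # prints: 1 1
```

If the outliers come from a parsing problem rather than from genuinely long clips, filtering only hides the symptom. Check a few of the dropped rows before discarding them.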
Yeah, it was an error caused by `"` characters in the text. The preprocess function was merging phrases together: when it hit a `"`, it kept reading onto the following lines to find the closing `"`.
I solved it by removing all `"` characters from the text in the preprocess.py script. Now the dataset looks fine after running analyze.py.
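The symptom can be reproduced with Python's standard `csv` module. This assumes the metadata is read as a `|`-delimited CSV with default quoting, which is a guess about how preprocess.py parses it. The three rows are hypothetical. A quotation opens in one clip and closes only two clips later, so the parser folds all three lines into a single transcript. Passing `quoting=csv.QUOTE_NONE` treats quotes as literal text. Stripping `"` beforehand, as described above, also restores one row per line.

```python
import csv
import io

# Hypothetical LJSpeech-style rows (id|transcription|normalized text).
# The quote opened in LJ001-0001 is only closed in LJ001-0003.
METADATA = (
    'LJ001-0001|"Hello there.|Hello there.\n'
    'LJ001-0002|Second line.|Second line.\n'
    'LJ001-0003|Done."|Done.\n'
)


def parse(text, quoting):
    """Parse pipe-delimited metadata with the given csv quoting mode."""
    return list(csv.reader(io.StringIO(text), delimiter="|", quoting=quoting))


# Default quoting: the opening quote swallows the next lines into one field.
merged = parse(METADATA, csv.QUOTE_MINIMAL)
# QUOTE_NONE: quote characters are kept as plain text, one row per line.
separate = parse(METADATA, csv.QUOTE_NONE)
# Stripping every double quote before parsing also avoids the merge.
stripped = parse(METADATA.replace('"', ""), csv.QUOTE_MINIMAL)

print(len(merged), len(separate), len(stripped))  # prints: 1 3 3
```

One advantage of `QUOTE_NONE` over deleting the characters is that the quotation marks stay in the transcript, in case the text frontend makes use of them.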
I have preprocessed the LJSpeech dataset with preprocess.py and used the supplied analyze.py script to generate visuals for that dataset. The graphs look very different from the example ones in the README.
Character lengths go up to more than 8000, and there seem to be many outliers. The plots are attached at the end of this post so you can see what I mean.
Did you notice the same? Is it a bug in the LJSpeech dataset? Should I ignore this and go on training, or should I find a better way to preprocess the dataset and exclude the outliers?