Open egorsmkv opened 3 months ago
2:
It was done faster than I expected! Here are the files flagged for each speaker: ukrainian-tts_filtered.json.zip
Here is how the tool works (source code here): it computes len(text) / duration_sec (characters per second) for each audio file.
Based on the tool, there were:
Should still be enough data to train with, but it might be good for humans to review those files :)
1:
Some audio files appear to have been cut off. For example: accept/64926.ogg in the tetiana dataset (original text is "Уве́чері при ля́мпі ми сиді́ли в кімна́ті вчи́теля і розмовля́ли.").
I'm working on a tool to flag files that might not match the text by assuming speakers maintain a fairly consistent speaking rate. I'll post more here when I get results, but was curious if anyone else had seen this.
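For anyone who wants to try the same idea, here is a minimal sketch of the speaking-rate check described above. It is not the actual tool: the function names, the median comparison, and the 0.5 tolerance are all assumptions made for illustration. Only the len(text) / duration_sec measure comes from this thread.

```python
from statistics import median


def chars_per_second(text, duration_sec):
    """Speaking-rate proxy from the thread: characters per second."""
    return len(text) / duration_sec


def flag_outliers(clips, tolerance=0.5):
    """Return filenames whose rate is far from the speaker's median rate.

    clips: list of (filename, text, duration_sec) for ONE speaker.
    tolerance: allowed relative deviation from the median (assumed value,
    not taken from the original tool).
    """
    # Skip zero-length clips to avoid division by zero.
    rates = {
        name: chars_per_second(text, dur)
        for name, text, dur in clips
        if dur > 0
    }
    if not rates:
        return []
    typical = median(rates.values())
    return [
        name
        for name, rate in rates.items()
        if abs(rate - typical) / typical > tolerance
    ]


# Hypothetical clips: the last one has too much text for its duration,
# which is what a cut-off recording would look like.
clips = [
    ("a.ogg", "x" * 30, 2.0),
    ("b.ogg", "x" * 28, 2.0),
    ("c.ogg", "x" * 32, 2.0),
    ("d.ogg", "x" * 40, 1.0),
]
flagged = flag_outliers(clips)
```

Note that in Python, len() counts Unicode code points, so combining stress marks like the ones in the example sentence add to the character count. That should matter little as long as every transcript for a speaker is marked up the same way.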