Closed by EyalLavi 4 years ago
The root cause initially appeared to be the regex in the tutorial:
benchmarkstt-tools normalization --inputfile qt_kaldi.json --outputfile qt_kaldi_hypothesis.txt --regex '^.*"text":"([^"]+)".*' '\1'
On further investigation, it looks like the regex does work, but the file in the repo has the content duplicated.
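For anyone who wants to check the regex in isolation, here is a minimal sketch using Python's `re` module. The sample JSON line is made up for illustration (the real layout of qt_kaldi.json may differ), and it assumes the pattern is applied to one line at a time, which may not match exactly how benchmarkstt-tools applies it.

```python
import re

# Pattern and replacement copied from the tutorial command.
PATTERN = r'^.*"text":"([^"]+)".*'
REPLACEMENT = r'\1'

# Hypothetical input line; the real qt_kaldi.json entries may look different.
sample = '{"start":0.0,"end":0.4,"text":"hello"}'

# The whole line is matched and replaced by the captured "text" value.
extracted = re.sub(PATTERN, REPLACEMENT, sample)
print(extracted)  # hello
```

Note that the leading `.*` is greedy, so if a single line held several "text" fields, only the last one would be kept.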
The Kaldi hypothesis file has the transcript duplicated. This pushes the WER above 1, because every duplicated word counts as an insertion error against the reference.
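To illustrate why duplication inflates the score, here is a small sketch of a word-level WER based on edit distance. The `wer` helper and the sample sentences are hypothetical and are not benchmarkstt's implementation; it only assumes the usual definition, (substitutions + deletions + insertions) divided by the number of reference words.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences, for illustration only.
reference = "the quick brown fox"
duplicated = reference + " " + reference
noisy_duplicated = "the quick brown dog the quick brown dog"

print(wer(reference, reference))         # 0.0
print(wer(reference, duplicated))        # 1.0
print(wer(reference, noisy_duplicated))  # 1.25
```

A perfect transcript that is duplicated once already scores 1.0, since every extra word is an insertion. Any genuine recognition error on top of that takes the WER above 1.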