Open cassiotbatista opened 1 year ago
The number of words in the stats is also inflated, which means src/stats.sh should handle VoxForge separately from the other FB datasets.
Makes sense, because transcripts in VF are held in a PROMPTS file, while in FB they are in *.txt files.
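
A rough sketch of how the two layouts might be counted separately. It assumes each PROMPTS line has the form `<utterance-id> <words...>`, so the first field is an ID rather than a word; that layout is an assumption and should be checked against the actual files. The function names are hypothetical and are not taken from src/stats.sh.

```shell
# Count transcript words in a VoxForge-style PROMPTS file.
# Assumption: each line is "<utterance-id> <word> <word> ...", so the
# first field is an ID and must not be counted as a word.
count_prompts_words() {
    awk 'NF > 1 { total += NF - 1 } END { print total + 0 }' "$1"
}

# Count words across plain *.txt transcripts (FB-style, one per file).
# tr strips the padding some wc implementations add.
count_txt_words() {
    cat "$@" | wc -w | tr -d ' '
}
```

If the PROMPTS file is counted with a plain `wc -w`, every utterance ID adds one extra "word", which would be one possible source of the inflation.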
train.list has lots of dups. Problem in src/split/split_voxforge.sh?
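
A quick way to confirm and quantify the duplicates, as a minimal sketch. It assumes train.list holds one entry per line; the helper names are hypothetical.

```shell
# Print each entry that appears more than once in the list (one line per
# distinct duplicated entry). Assumes one entry per line.
list_dups() {
    sort "$1" | uniq -d
}

# Number of distinct entries that are duplicated.
count_dups() {
    sort "$1" | uniq -d | wc -l | tr -d ' '
}
```

Usage: `count_dups train.list` gives the count, and `list_dups train.list | head` shows a few offenders to trace back through the split script.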