cassiotbatista closed this issue 2 years ago
An additional note: some recipes use all the data right from monophone training, but I don't believe that's very helpful, especially for datasets like lapsstory, whose utterances are unusually long (>= 30 s).
When training it individually, the beam has to be scaled up to force-align lapsstory because of those long utterances.
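For reference, a sketch of what that scaling might look like in a Kaldi recipe; the data/exp paths and the exact beam values here are illustrative, not tuned (`steps/align_si.sh` does accept `--beam` and `--retry-beam`):

```shell
# Force-align lapsstory with a wider beam to cope with long (>= 30 s) utterances.
# steps/align_si.sh defaults to beam=10, retry_beam=40; the values below are
# just an example of scaling them up, and the paths are hypothetical.
steps/align_si.sh --nj 8 --beam 20 --retry-beam 80 \
  data/lapsstory data/lang exp/mono exp/mono_ali_lapsstory
```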
I don't think that matters for the DNN at the end of the day, but I averaged the subset sizes used by the librispeech and aspire recipes. This is just for logging / reporting purposes.
One thing I didn't really take care of was watching how many of westpoint's utterances end up in the shortest-utterance subset used for monophone training, as I believe that dataset might dominate the subset because it's full of word pieces. Just something to keep in mind.
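A quick sanity check along those lines could be to count utterances per corpus in the shortest subset. This sketch assumes each utterance ID is prefixed with its corpus name (e.g. `westpoint_...`), which is an assumption about this setup, and that the subset lives in a hypothetical `data/train_shortest` directory:

```shell
# Count utterances per corpus in the shortest-utterance subset.
# Assumes utt IDs look like "<corpus>_<rest>"; adjust the split
# if your IDs are named differently. Path is hypothetical.
awk '{split($1, a, "_"); counts[a[1]]++}
     END {for (c in counts) print counts[c], c}' \
  data/train_shortest/utt2spk | sort -rn
```

If westpoint's count dwarfs the others, it would confirm the word-piece utterances are crowding out the other corpora.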