cmusphinx / sphinxtrain

Acoustic model trainer for CMU Sphinx

Split jobs according to time rather than number of utterances #35

Open dhdaines opened 2 years ago

dhdaines commented 2 years ago

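Decoding in particular wastes a lot of time waiting for the parts of the data that happen to contain very long utterances: parts with equal numbers of utterances can differ widely in total audio duration, so the whole run waits on the slowest part.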

The simpler alternative is to shuffle the data, though this has its own problems.
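
To illustrate what splitting by time could look like, here is a minimal sketch, not a description of how sphinxtrain currently divides its jobs. It assumes the duration of each utterance is already known (for example from frame counts) and is passed in as a dict; the function name `split_by_duration` and its parameters are hypothetical. It uses a greedy longest-first assignment: utterances are sorted by descending duration and each one goes to whichever part currently has the smallest total.

```python
import heapq


def split_by_duration(durations, n_parts):
    """Assign utterances to n_parts so that total durations are roughly equal.

    durations: dict mapping utterance ID -> duration in seconds
               (hypothetical input, not a sphinxtrain data structure).
    Returns a list of n_parts lists of utterance IDs.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts = [[] for _ in range(n_parts)]
    # Min-heap of (total seconds assigned so far, part index).
    heap = [(0.0, i) for i in range(n_parts)]
    # Longest utterances first, each going to the currently lightest part.
    by_length = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for utt, dur in by_length:
        total, idx = heapq.heappop(heap)
        parts[idx].append(utt)
        heapq.heappush(heap, (total + dur, idx))
    return parts


# One very long utterance no longer drags a full share of others with it.
durations = {"a": 100, "b": 10, "c": 10, "d": 10, "e": 10, "f": 60}
parts = split_by_duration(durations, 2)
totals = [sum(durations[u] for u in p) for p in parts]
```

Note that this does not keep utterances in their original order within a part, so it would only apply where each job can take an arbitrary list of utterances rather than a contiguous range.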