Split jobs according to time rather than number of utterances

cmusphinx / sphinxtrain

Acoustic model trainer for CMU Sphinx

Open dhdaines opened 2 years ago

dhdaines commented 2 years ago

Decoding in particular wastes a lot of time waiting for the parts of the data that happen to contain very long utterances, because jobs are split by number of utterances rather than by total duration.

A simpler alternative is to shuffle the data, though this has its own problems.
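To make the idea concrete, here is a minimal sketch of a duration-based split. It is not sphinxtrain code: the function name `split_by_duration` and the input format (a list of `(utterance_id, duration_in_seconds)` pairs) are assumptions for illustration, and it presumes durations are already known, for example from frame counts. It keeps the utterances in their original order and assigns each one to a part according to where its midpoint falls on the cumulative timeline.

```python
from typing import List, Sequence, Tuple


def split_by_duration(
    utts: Sequence[Tuple[str, float]], nparts: int
) -> List[List[str]]:
    """Split utterances, in order, into nparts contiguous parts
    of roughly equal total duration (hypothetical helper)."""
    if nparts < 1:
        raise ValueError("nparts must be at least 1")
    total = sum(dur for _, dur in utts)
    parts: List[List[str]] = [[] for _ in range(nparts)]
    elapsed = 0.0
    for utt_id, dur in utts:
        # Place the utterance by the midpoint of its span on the
        # cumulative timeline, so long utterances do not drag
        # many short ones into the same part.
        mid = elapsed + dur / 2.0
        if total > 0:
            idx = min(int(mid * nparts / total), nparts - 1)
        else:
            idx = 0  # all durations are zero: nothing to balance
        parts[idx].append(utt_id)
        elapsed += dur
    return parts


if __name__ == "__main__":
    # One long utterance followed by three short ones (assumed data).
    utts = [("long", 30.0), ("s1", 10.0), ("s2", 10.0), ("s3", 10.0)]
    print(split_by_duration(utts, 2))
    # prints [['long'], ['s1', 's2', 's3']]
```

A split by utterance count would put `long` and `s1` in the first part (40 seconds) and leave 20 seconds for the second, whereas this sketch gives each part 30 seconds. Parts can still be uneven when a single utterance is longer than the per-part share, and a part may end up empty in that case.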