cmusphinx / sphinxtrain

Acoustic model trainer for CMU Sphinx

More diagnostics; do not uppercase #25

Closed lenzo-ka closed 2 years ago

lenzo-ka commented 2 years ago

Uppercasing the phone names by default prevents phonesets like XSAMPA from being used, and there's no need to assume the file names are uppercase only. Also, emit a little more info while building models.

nshmyrev commented 2 years ago

> and there's no need to assume the file names are uppercase only

When sphinxtrain creates tree files named after phones on a filesystem with case-insensitive names (Windows), we run into trouble: phones whose names differ only in case map to the same file.

lenzo-ka commented 2 years ago

Shouldn't that be taken care of by the client? The assumptions are platform-specific. In particular, uppercasing the phone names prevents referencing the phonemes or the words by name. A user who wants a case-insensitive set of names should simply make sure that no two names differ only in case, and then normalize the dictionary, language model, and phone set before training.
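
A minimal sketch of the client-side check described above, assuming the phone set is available as a plain list of names. The function name `find_case_collisions` and the sample phone list are hypothetical and are not part of sphinxtrain.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that become identical once case is ignored."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    # Keep only the groups that hold more than one distinct spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


# Hypothetical X-SAMPA-style phone list: "A" and "a" are different phones.
phones = ["A", "a", "E", "e", "S", "s", "tS", "@", "SIL"]
collisions = find_case_collisions(phones)

for key, variants in collisions.items():
    print(f"{key}: {', '.join(variants)}")
```

An empty result means the names are safe to use on a case-insensitive filesystem as they are; any reported group would have to be renamed before training on such a platform.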

lenzo-ka commented 2 years ago

I'll put it behind an option in the config.

lenzo-ka commented 2 years ago

Cleaning this up and re-submitting from a private branch.