cmusphinx / sphinxtrain

Acoustic model trainer for CMU Sphinx

Make an option in config for not folding case in phonemes #26

Closed by lenzo-ka 2 years ago

lenzo-ka commented 2 years ago

By default, this leaves the behavior unchanged, but it adds a CFG_CASEDSYMBOLS option to sphinx_train.cfg (default "no"; it must be set to "yes" to have any effect). This makes it possible to use phone sets such as X-SAMPA, which contain phonemes that differ only in case (for example u / U or z / Z), on systems with case-sensitive file systems, such as Linux (or macOS when the volume is formatted as case-sensitive).
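To make the effect of the option concrete, here is a minimal Python sketch, not sphinxtrain's code. It models case folding as simple lowercasing, which is an assumption: the actual direction and implementation in sphinxtrain may differ. The helper `normalize_phones` is hypothetical and only mirrors the "no"/"yes" values of CFG_CASEDSYMBOLS.

```python
# Hypothetical sketch of what a CFG_CASEDSYMBOLS-style switch does to phone symbols.
# Assumption: "folding" is modeled as lowercasing; sphinxtrain may fold differently.

def normalize_phones(phones: list[str], cased_symbols: str = "no") -> list[str]:
    """Return phone symbols as a trainer would see them.

    Only the value "yes" preserves case; anything else folds it,
    matching the option's default-off behavior described in the PR.
    """
    if cased_symbols == "yes":
        return list(phones)
    return [p.lower() for p in phones]


xsampa = ["u", "U", "z", "Z", "a"]
folded = normalize_phones(xsampa)            # default "no": u/U and z/Z merge
preserved = normalize_phones(xsampa, "yes")  # case kept: five distinct phones
print(sorted(set(folded)))     # ['a', 'u', 'z']
print(sorted(set(preserved)))  # ['U', 'Z', 'a', 'u', 'z']
```

With folding, five X-SAMPA symbols collapse to three distinct phones, which is why phone sets that rely on case need the option set to "yes".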

Note that case-insensitive file systems can have collisions between casing variants regardless of this change, and nothing warns the user when such a conflict occurs. This change does not remove the need for the user to be aware of the phone set. I would argue that the default should be not to fold case: case-insensitive systems are unaffected either way, and forcing case folding removes generality. However, to accommodate comments on a prior PR and to leave the default behavior unchanged, the new behavior has to be enabled explicitly.
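Since nothing warns the user about such conflicts, a user could check a phone list before training. The sketch below is hypothetical (`find_case_collisions` is not part of sphinxtrain); it groups phones whose `str.casefold()` forms match, i.e. phones that would merge under case folding and that could also collide on a case-insensitive file system.

```python
from collections import defaultdict


def find_case_collisions(phones: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: group phones that differ only in case.

    Any group with more than one member would collapse to a single symbol
    under case folding, and could collide on a case-insensitive file system.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for phone in dict.fromkeys(phones):  # de-duplicate, keep first-seen order
        groups[phone.casefold()].append(phone)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    phone_set = ["a", "u", "U", "z", "Z", "SIL"]
    for key, members in find_case_collisions(phone_set).items():
        print(f"warning: phones {members} differ only in case")
```

For the example phone set, this reports the u/U and z/Z pairs and leaves "a" and "SIL" alone.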

nshmyrev commented 2 years ago

@lenzo-duo please just merge