QUESTION: Clarification on training a new model with phone groups and phonological rules

I mostly responded to this over email, but for posterity, rules like:

rules:
  - following_context: '$'
    preceding_context: ''
    replacement: ''
    segment: 'ə'
  - following_context: '$'
    preceding_context: ''
    replacement: 'ɜ'
    segment: 'ə'
  - following_context: ''
    preceding_context: ''
    replacement: ''
    segment: 'ə'
  - following_context: ''
    preceding_context: ''
    replacement: 'ɜ'
    segment: 'ə'

should capture the variable schwa realization. For the homorganic nasals, I typically don't use a rule if the nasal is always going to assimilate in place; instead I just have ŋ ɡ ɻ a rather than n ɡ ɻ a as the pronunciation. If the realization is variable, then a rule might be better, for example for assimilation across morpheme boundaries in English "information" or "unformed"; see https://github.com/MontrealCorpusTools/mfa-models/blob/main/config/acoustic/rules/english_mfa.yaml#L311-L314.
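
For the variable case, a rule could look roughly like the sketch below. It reuses the key layout of the schwa rules above; the segment and context chosen here (n before ɡ) are an assumption for illustration only, and this is not a copy of the linked english_mfa.yaml lines.

```yaml
rules:
  # Hypothetical rule (illustration only): allow n to surface as ŋ
  # when the next phone is ɡ, e.g. n ɡ ɻ a -> ŋ ɡ ɻ a.
  - following_context: 'ɡ'
    preceding_context: ''
    replacement: 'ŋ'
    segment: 'n'
```

With a rule like this, the pronunciation would stay n ɡ ɻ a and the assimilated ŋ ɡ ɻ a would be available as a variant, rather than being the only form.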

For reference, the phone groups for the 3.0 trained models are here: https://github.com/MontrealCorpusTools/mfa-models/tree/main/config/acoustic/phone_groups and their rules are here: https://github.com/MontrealCorpusTools/mfa-models/tree/main/config/acoustic/rules. Both will be updated as I go through languages and train updated 3.0 models.
