this will support training models that predict initials and finals separately (or even more complex architectures), which cuts down substantially on the number of labels needed and may improve accuracy.
[ ] update tests to predict onset and rime separately (simplest, two-class architecture)
[ ] remove the default model config from pipeline.py and initialize the model/config manually in tests for the number of variables/classes that are getting used in the tests
[ ] add an __init__ that looks at the Model we got passed and uses its n0s attribute to infer the number of possible phonemes in each syllable segment, storing this value on the pipe instance so it can be used later. also store the overall length of the output vector (len(n0s[0]) + len(n0s[1] + ... len(n0s[n]))
[ ] update the first block in initialize to read and add positioned phonemes from the training data
[ ] update Phonemizer.add_label to track which class the label belongs to when adding it
[ ] update predict to return List[Ints2d] and set the shape of guesses based on the number of possible phonemes per segment
[ ] update set_annotations to apply the predictions correctly across multiple classes - form the string by concatenating and then set it
this will support training models that predict initials and finals separately (or even more complex architectures), which cuts down substantially on the number of labels needed and may improve accuracy.
pipeline.py
and initialize the model/config manually in tests for the number of variables/classes that are getting used in the tests__init__
that looks at theModel
we got passed and uses itsn0s
attribute to infer the number of possible phonemes in each syllable segment, storing this value on the pipe instance so it can be used later. also store the overall length of the output vector (len(n0s[0]) + len(n0s[1] + ... len(n0s[n])
)initialize
to read and add positioned phonemes from the training dataPhonemizer.add_label
to track which class the label belongs to when adding itpredict
to returnList[Ints2d]
and set the shape ofguesses
based on the number of possible phonemes per segmentset_annotations
to apply the predictions correctly across multiple classes - form the string by concatenating and then set it