direct-phonology / phoNy

phonology in spaCy!
MIT License

support tagging phonological features #7

Open · thatbudakguy opened this issue 2 years ago

thatbudakguy commented 2 years ago

This would more properly be called the Phonologizer, and it could borrow heavily from spaCy's Morphologizer. For reference, see Wikipedia on "distinctive features".
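To make the Morphologizer analogy concrete, here is a minimal sketch (plain Python, no spaCy) that stores distinctive features per phoneme and serializes them in the `Feat=Value|Feat=Value` style that spaCy uses for `Token.morph`. The feature table, the `feats_string`/`parse_feats` helpers, and the choice of five binary features are all hypothetical illustrations, not part of phoNy or spaCy.

```python
from typing import Dict

# Hypothetical, hand-picked binary distinctive features for a few IPA segments.
DISTINCTIVE_FEATURES: Dict[str, Dict[str, str]] = {
    "p": {"consonantal": "+", "sonorant": "-", "voice": "-", "continuant": "-", "nasal": "-"},
    "b": {"consonantal": "+", "sonorant": "-", "voice": "+", "continuant": "-", "nasal": "-"},
    "m": {"consonantal": "+", "sonorant": "+", "voice": "+", "continuant": "-", "nasal": "+"},
    "s": {"consonantal": "+", "sonorant": "-", "voice": "-", "continuant": "+", "nasal": "-"},
    "a": {"consonantal": "-", "sonorant": "+", "voice": "+", "continuant": "+", "nasal": "-"},
}


def feats_string(phoneme: str) -> str:
    """Serialize a phoneme's features as 'name=value|...', sorted by feature name."""
    feats = DISTINCTIVE_FEATURES[phoneme]
    return "|".join(f"{name}={value}" for name, value in sorted(feats.items()))


def parse_feats(feats: str) -> Dict[str, str]:
    """Inverse of feats_string: turn 'name=value|...' back into a dict."""
    if not feats:
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


print(feats_string("b"))  # consonantal=+|continuant=-|nasal=-|sonorant=-|voice=+
```

Reusing a morph-style string would let a Phonologizer lean on the same label-per-token machinery the Morphologizer uses, with each distinct feature bundle treated as one label.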

thatbudakguy commented 2 years ago

Ultimately this could just be another function of the Phonemizer: when the output of the model is just a vector, it's up to the component to translate that information into phonological data. We could have a new component type that sets phonological properties on tokens, or we could simply make this a method available on the Token itself, so that downstream consumers can request both the phonological features and the phonemes themselves from the same source data.
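One way to picture "the same source data" is a single per-token score vector that can be read two ways. The sketch below assumes, hypothetically, that the model emits one score per distinctive feature in a fixed order (`FEATURE_ORDER`), and that a phoneme inventory is encoded as +1/-1 feature bundles. `to_features` thresholds the vector into +/- values, while `to_phoneme` picks the inventory entry with the highest dot product. None of these names exist in phoNy; they only illustrate the idea.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical fixed order of features in the model's output vector.
FEATURE_ORDER: Tuple[str, ...] = ("consonantal", "sonorant", "voice", "continuant", "nasal")

# Hypothetical inventory: each phoneme as a +1/-1 bundle in FEATURE_ORDER.
INVENTORY: Dict[str, Tuple[int, ...]] = {
    "p": (1, -1, -1, -1, -1),
    "b": (1, -1, 1, -1, -1),
    "m": (1, 1, 1, -1, 1),
    "s": (1, -1, -1, 1, -1),
    "a": (-1, 1, 1, 1, -1),
}


def to_features(vector: Sequence[float]) -> Dict[str, str]:
    """Read the vector as distinctive features: positive score -> '+', otherwise '-'."""
    return {name: "+" if score > 0 else "-" for name, score in zip(FEATURE_ORDER, vector)}


def to_phoneme(vector: Sequence[float]) -> str:
    """Read the same vector as a phoneme: the inventory bundle with the highest dot product."""
    return max(INVENTORY, key=lambda ph: sum(s * v for s, v in zip(vector, INVENTORY[ph])))


scores = [0.9, -0.7, 0.8, -0.6, -0.9]
print(to_features(scores))  # {'consonantal': '+', 'sonorant': '-', 'voice': '+', 'continuant': '-', 'nasal': '-'}
print(to_phoneme(scores))   # b
```

Whether these live on a separate component or behind a Token extension method, the decoding step stays the same; only where the result is stored changes.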

thatbudakguy commented 2 years ago

This becomes synonymous with the existing Phonemizer as part of #24; we should rename it Phonologizer accordingly.

Also, with #22 we should make both components respect overwrite/extend config options (as spaCy built-ins do) so that they can work in concert: the rule-based version runs first, then the statistical version runs and fills in all the gaps (e.g. polyphones).
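A minimal sketch of how the two flags could combine per token, loosely modeled on the overwrite/extend behavior of spaCy's Morphologizer. The `merge_feats` helper and the plain-dict representation are hypothetical; a real component would apply this logic inside its `set_annotations` step. With both flags off, the statistical pass only fills tokens the rule-based pass left empty.

```python
from typing import Dict, List

Feats = Dict[str, str]


def merge_feats(existing: Feats, predicted: Feats,
                overwrite: bool = False, extend: bool = False) -> Feats:
    """Combine features set by an earlier component with newly predicted ones."""
    if existing and not (overwrite or extend):
        # Keep whatever the earlier (e.g. rule-based) component already set.
        return dict(existing)
    if overwrite and extend:
        # Keep existing feature types, but predicted values win on conflicts.
        return {**existing, **predicted}
    if extend:
        # Keep existing values and only add feature types that are missing.
        return {**predicted, **existing}
    # Token was empty, or overwrite-only: replace wholesale with the prediction.
    return dict(predicted)


# Rule-based pass covered token 0 only; token 1 (say, a polyphone) was left empty.
rule_based: List[Feats] = [{"voice": "+"}, {}]
statistical: List[Feats] = [{"voice": "-", "nasal": "-"}, {"voice": "+", "nasal": "+"}]

gaps_filled = [merge_feats(e, p) for e, p in zip(rule_based, statistical)]
print(gaps_filled)  # [{'voice': '+'}, {'voice': '+', 'nasal': '+'}]
```

Passing `extend=True` instead would also add the missing `nasal` feature to token 0 while keeping its rule-based `voice` value.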