dmort27 / panphon

Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.
MIT License

Default ipa_bases file seems a bit inaccurate #22

Closed. giddudink closed this issue 3 years ago

giddudink commented 3 years ago

So I know I'm meant to customise the csv and yaml files, and I am, and this tool is great when used this way.

But, although I'm no linguist, I can't help but notice that a lot of the values in the csv files are wrong according to every other source I can find. For example, "a" is marked as tense rather than lax, and everything is marked "0" for strident except the non-sibilant stridents. As a result, several sounds end up with the same set of features when they shouldn't: dentalised "s" and "z" are identical to the voiceless and voiced interdentals respectively, when they should be distinguished by +/- strident.

Again, I'm no linguist, so I'm hesitant to just commit my personal naive fixes to it, but these things jump out to me as being mistakes.

EDIT: I have since also noticed that "ɞ" is in there twice, with different features each time. One's identical to "o", the other is unique.
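Both problems above (segments that share a feature vector, and a segment listed twice) can be found mechanically. The following is a minimal sketch using only the standard library. It assumes the bases file has an "ipa" column followed by one column per feature; the sample rows and their values are invented for illustration and are not copied from PanPhon, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Toy excerpt in the ASSUMED layout of ipa_bases.csv. Values are invented.
SAMPLE = """ipa,syl,cont,strid,voi
s̪,-,+,0,-
θ,-,+,0,-
z̪,-,+,0,+
ð,-,+,0,+
f,-,+,+,-
ɞ,+,+,0,+
ɞ,+,+,0,-
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per segment row."""
    return list(csv.DictReader(io.StringIO(text)))


def feature_vector(row, key="ipa"):
    """Return the row's feature values as a tuple, excluding the segment column."""
    return tuple(value for name, value in row.items() if name != key)


def duplicate_segments(rows, key="ipa"):
    """Map each segment that appears on more than one row to its feature vectors."""
    seen = defaultdict(list)
    for row in rows:
        seen[row[key]].append(feature_vector(row, key))
    return {seg: vecs for seg, vecs in seen.items() if len(vecs) > 1}


def feature_collisions(rows, key="ipa"):
    """Return groups of distinct segments that share an identical feature vector."""
    groups = defaultdict(set)
    for row in rows:
        groups[feature_vector(row, key)].add(row[key])
    return [sorted(segs) for segs in groups.values() if len(segs) > 1]


rows = load_rows(SAMPLE)
print("listed more than once:", duplicate_segments(rows))
print("indistinguishable groups:", feature_collisions(rows))
```

To run the same checks on a real table, replace `load_rows(SAMPLE)` with rows read from the csv file on disk; the column name passed as `key` may need adjusting.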

giddudink commented 3 years ago

I have since found Mr. Robert Gale's fork of this, which fixes most issues. I've removed features that weren't contrastive from it (removing dorsal and analysing flaps as +son +cons -cont and trills as +cont flaps for example) and it seems to be working well.

I also added five features for tone based on "William S-Y. Wang (1967) Phonological Features of Tone", removing the contour feature because it wasn't contrastive and deciding to represent mid as 0high. With this I defined 21 distinct tones through the diacritics file.

I really appreciate the generate_ipa_all script and the modular approach; it's made it very easy (if you don't count all the phonetics research) to fix stuff up and extend it. My complaints about the original ipa_bases file still stand, however: I really think sibilants should be +strident regardless of which school of phonetics you subscribe to.

dmort27 commented 3 years ago

Hi @giddudink. Thanks for your comments. No system of features is likely to satisfy all needs, or all theoretical persuasions. My initial plan, and ultimate aspiration, was to make feature tables in PanPhon modular, so that users could add additional tables and select among them via an argument to the FeatureTable constructor. I have not had the opportunity to do this yet.

Regarding tone, this is a perpetual problem—no SPE-style feature representation is adequate for tone, which has a suprasegmental nature, but autosegmental representations are much harder to formalize and use computationally than representations where each segment is a bundle of features.

Regarding [strident], yes: all sibilant fricatives and affricates should be [+strident]. If this is not the case in PanPhon's feature table, this should be fixed. However, it is not the case, conversely, that all [+strident] segments should be sibilants since, in some feature systems, non-sibilant fricatives like [f] and [ɸ] are distinguished by this feature. This is the analysis that has been chosen in PanPhon, for reasons of economy.
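The one-way rule described here (every sibilant is [+strident], but not every [+strident] segment is a sibilant) can be written as a small consistency check. This is only a sketch: the `STRIDENT` mapping below holds made-up values for a handful of segments rather than PanPhon's actual table, and `non_strident_sibilants` is a hypothetical helper, not part of the package.

```python
# A few sibilant fricatives; a fuller check would also list sibilant affricates.
SIBILANTS = {"s", "z", "ʃ", "ʒ"}

# HYPOTHETICAL strident values, with "ʃ" deliberately left at "0" to show a failure.
STRIDENT = {
    "s": "+",
    "z": "+",
    "ʃ": "0",
    "ʒ": "+",
    "f": "+",  # non-sibilant but [+strident]: allowed by the rule
    "θ": "-",
}


def non_strident_sibilants(strident, sibilants=SIBILANTS):
    """Return sibilants whose strident value is not "+" (or that are missing)."""
    return sorted(seg for seg in sibilants if strident.get(seg) != "+")


print(non_strident_sibilants(STRIDENT))
```

The check deliberately ignores [f]: only the direction "sibilant implies [+strident]" is enforced, so non-sibilant stridents pass.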

dmort27 commented 3 years ago

Update: I just fixed the [strident] problem—a major bug! Thanks for the heads up.