dmort27 / panphon

Python package and data files for manipulating phonological segments (phones, phonemes) in terms of universal phonological features.
MIT License
212 stars, 46 forks

are phonological features correct in ipa_bases.csv? #17

Closed lxkain closed 4 years ago

lxkain commented 4 years ago

Hello, I am not a linguist or a phonetician... I am looking at ipa_bases.csv and I don't understand what I'm seeing for the subset of phonemes that are in the English language. A colleague pointed out to me the following:

I have been blindly trusting the feature table up until now, and now I'm confused... is this a bug?

lxkain commented 4 years ago

I'm also finding:

ERROR:root:Identical features (ignoring nans)
    syl  son  cons  cont  delrel  lat  nas  strid  voi   sg   cg  ant  cor  distr  lab   hi   lo  back  round  velaric  tense  long  pause
ʊ   1.0  1.0   0.0   1.0     0.0  0.0  0.0    NaN  1.0  0.0  0.0  NaN  0.0    NaN  0.0  1.0  0.0   1.0    1.0      0.0    0.0   0.0    0.0
ə˞  1.0  1.0   0.0   1.0     0.0  0.0  0.0    NaN  1.0  0.0  0.0  0.0  0.0    NaN  0.0  1.0  0.0   1.0    1.0      0.0    0.0   0.0    0.0
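
For readers who want to reproduce this kind of check, here is a minimal sketch. It is a hypothetical reconstruction, not the script that produced the message above: the function names are made up, and the two rows are simply copied from the output. It treats NaN as "unspecified" and reports whether two segments agree on every feature that both of them specify.

```python
import math

FEATURES = ("syl son cons cont delrel lat nas strid voi sg cg ant cor distr "
            "lab hi lo back round velaric tense long pause").split()

NAN = math.nan

# Rows copied from the log output above (NaN = unspecified).
ROWS = {
    "ʊ":  [1, 1, 0, 1, 0, 0, 0, NAN, 1, 0, 0, NAN, 0, NAN, 0, 1, 0, 1, 1, 0, 0, 0, 0],
    "ə˞": [1, 1, 0, 1, 0, 0, 0, NAN, 1, 0, 0, 0,   0, NAN, 0, 1, 0, 1, 1, 0, 0, 0, 0],
}

def differing_features(a, b):
    """Features where both values are specified and unequal."""
    return [name for name, x, y in zip(FEATURES, a, b)
            if not (math.isnan(x) or math.isnan(y)) and x != y]

def specified_in_only_one(a, b):
    """Features that are NaN in exactly one of the two rows."""
    return [name for name, x, y in zip(FEATURES, a, b)
            if math.isnan(x) != math.isnan(y)]

def identical_ignoring_nan(a, b):
    return not differing_features(a, b)

if __name__ == "__main__":
    u, schwar = ROWS["ʊ"], ROWS["ə˞"]
    print(identical_ignoring_nan(u, schwar))
    print(specified_in_only_one(u, schwar))
```

With these two rows, the only column in which the segments are not literally equal is `ant`, which is NaN for one and specified for the other.
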

rcgale commented 4 years ago

Hi David!

I've done some work to reconcile the feature set in ipa_bases.csv with the features in the Hayes text/app. Here's a Google Sheet with the changes highlighted:

https://drive.google.com/file/d/1MsUJSvsiSqaIMjoDUstBmWL2IfxQtGpG/view?usp=sharing

There are a large number of changes, many of which are fairly systematic, so I expect there may be some rationale behind the differences. I'd love to hear your thoughts!

**EDIT:** We also noticed, at least after our changes, that the absence of a 'front' feature eliminates important contrasts, including /æ/ and /ɑ/ vs. /a/. We're experimenting with adding the feature for our current project. I won't update the linked spreadsheet just yet for laziness reasons, but I figured it was another item worth clarifying with you. Thanks!

dmort27 commented 4 years ago

@lxkain, note that there are a wide variety of phonological feature systems, and these differ in various ways. The feature system used here is based on the system from Chomsky & Halle's (1968) Sound Pattern of English, as reflected in Odden (2005). This differs from the feature set in other widely used textbooks, including Hayes (2009). Some features had to be added for sounds (clicks) not represented in these sources.

* /ɑ/ vs. /a/ should have unequal back features

This should probably be fixed. The issue is that /a/ is frequently used to represent a low central vowel, which, like /ɑ/, would be [+back]. The IPA value is front, however, which should be [-back].

* Delayed release: Every consonant that is not a stop should be +DELREL, and all stops should be -DELREL or 0

This is a strange claim. [delrel] is used to distinguish plosives from affricates. It is irrelevant elsewhere and should thus be unspecified (0).

* Velaric: The Hayes book uses “dorsal” which I think would include velaric. Even if velaric is something more specific, /k/, /g/, and /ŋ/ are all English consonants on the current IPA chart under the Velar placement column.

Velaric != Velar. Velar is a place of articulation. Velaric is the airstream mechanism used to produce clicks. Every click should be [+velaric] and every non-click should be [-velaric].

* Strident: I would expect there to be each of +/-/0 in English. I forget where this would be a discriminating feature, but /s/ is +STRID, while /t/ would be -STRID, and /f/ would be 0 (which the panphon chart says is 1?)

[strident] distinguishes noisy fricatives from less noisy ones. It is the feature that distinguishes [f] from [ɸ] and one of the features that distinguishes [s] from [θ].

* Spread glottis: Consonants should be all -SG except /h/ which is +SG, though that’s probably redundant to some place features for discrimination

This is simply wrong. All sounds that are produced with abducted vocal folds are [+sg], in every feature system. This means that /h/ is [+sg] but also means that aspirates are all [+sg] and breathy voiced sounds are all [+sg].
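
To make the points above concrete, here is a small illustrative sketch of how such feature values pick out natural classes. The table is a toy built only from the statements in this comment, not the contents of ipa_bases.csv, and the helper name `natural_class` is hypothetical. Values are +1 for [+F], -1 for [-F], and 0 for unspecified.

```python
# Toy feature values taken only from the claims in this comment.
# This is NOT the contents of ipa_bases.csv.
TOY = {
    "f":  {"strid": 1,  "velaric": -1},   # noisy fricative
    "ɸ":  {"strid": -1, "velaric": -1},   # less noisy counterpart of [f]
    "s":  {"strid": 1,  "velaric": -1},
    "θ":  {"strid": -1, "velaric": -1},
    "h":  {"sg": 1,     "velaric": -1},   # abducted vocal folds
    "k":  {"velaric": -1},                # velar place, but not a click
    "ǃ":  {"velaric": 1},                 # click: velaric airstream
}

def natural_class(table, spec):
    """Return the segments whose listed values match every feature in spec.

    A feature missing from a toy entry means "not stated here", so such a
    segment never matches a spec that mentions that feature.
    """
    return [seg for seg, fts in table.items()
            if all(fts.get(name) == value for name, value in spec.items())]

if __name__ == "__main__":
    print(natural_class(TOY, {"strid": 1}))     # strident fricatives
    print(natural_class(TOY, {"velaric": 1}))   # clicks only
    print(natural_class(TOY, {"sg": 1}))        # spread glottis
```

Note that /k/ falls in the [-velaric] class here even though it is velar in place of articulation, which is the distinction drawn above.
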

dmort27 commented 4 years ago

@rcgale,

Thank you for your work. Indeed, the feature system represented in PanPhon's ipa_bases.csv is different from the Hayes system. This is because it is based on the SPE and Odden systems, and is thus more conservative than Hayes's more phonetically realistic system. For example, as you note, the Hayes system introduces the feature [front], which is absent in the older systems. This has the consequence of generating too many places of articulation for consonants (and predicting a natural class that doesn't exist), but provides a more intuitive account of vowel place.

Some of the differences have to do with when a feature is unspecified. In the Hayes system, leaving a feature unspecified is a phonetic claim; in the Odden system, it is a phonological claim, and these lead to different results. The ultimate problem is that no phonological feature system is universally accepted. The internals of PanPhon are designed to take this into account.

From the beginning, the intent was to ship pluggable feature systems so that the user could select which system of features they wanted to use. My next major goal for PanPhon is to add the feature system used in PHOIBLE (https://phoible.org/), which has near-universal coverage. If you would be interested in putting together an ipa_bases_hayes.csv and rules for the diacritics and modifiers (which would also have to be changed), we could also add this. Then PanPhon would support three feature systems instead of one!
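
As a rough illustration of what "pluggable" could look like, here is a minimal sketch. It is not PanPhon's actual loading code: the registry `SYSTEMS`, the function `load_system`, and the inline CSV snippets are all hypothetical, and the cell values are placeholders rather than claims about either feature system. It assumes one CSV per system, with an `ipa` column followed by feature columns holding `+`, `-`, or `0`.

```python
import csv
import io

VALUE = {"+": 1, "-": -1, "0": 0}

# Hypothetical stand-ins for per-system files such as ipa_bases.csv and
# ipa_bases_hayes.csv. The cell values are placeholders, not real data.
SYSTEMS = {
    "spe":   "ipa,syl,voi\na,+,+\nt,-,-\n",
    "hayes": "ipa,syl,voi,front\na,+,+,-\nt,-,-,0\n",
}

def load_system(name):
    """Parse the named system into {segment: {feature: -1 | 0 | 1}}."""
    reader = csv.DictReader(io.StringIO(SYSTEMS[name]))
    table = {}
    for row in reader:
        seg = row.pop("ipa")
        table[seg] = {feat: VALUE[val] for feat, val in row.items()}
    return table

if __name__ == "__main__":
    for name in SYSTEMS:
        table = load_system(name)
        print(name, sorted(table["a"]))   # feature names available per system
```

The point of the design is that the set of feature names comes from the file header, so a system with an extra feature such as [front] needs no code changes, only a different table (plus its own diacritic and modifier rules).
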

rcgale commented 4 years ago

Thank you for the thoughtful reply!

> This is because it is based on the SPE and Odden systems, and is thus more conservative than Hayes's more phonetically realistic system

Yes, I think this explains the core of the discrepancies. My phonological background was basically around Hayes, but I just now did some reading in Odden (2005) on the [strident] feature, which goes to some length to explain why they arrived at the decision only to contrast the two pairs of phonemes that you have defined in your data. It makes sense for the purpose of contrasting and discriminating phonemes, but our current purpose is more phonetic/acoustic, so as you mention, Hayes is a better fit.

> If you would be interested in putting together an ipa_bases_hayes.csv and rules for the diacritics and modifiers (which would also have to be changed), we could also add this.

Yes, I could provide this! It wouldn't be much more work than I've already done for our project, but I'll need to clean up a few details, since my approach so far was an attempt at merging ipa_bases.csv with the Hayes data, and it sounds more appropriate to take a clean-slate approach.

As for your response to @lxkain (who was quoting my first impressions before I really dug in), the discrepancies between the theories, along with a few misunderstandings on my part, settle most of the questions we had. /ɑ/ vs. /a/ is, as you said, probably a mistake. I haven't caught up on how Odden describes /ʊ/ vs. /ə˞/, but it looks like the only contrast there is a [+ant] vs undefined, so that might be worth a double-check as well. If you didn't notice, the spreadsheet I linked has a second tab ("Before") that highlights all the ipa_bases.csv values that disagree with Hayes; it could be useful as a sanity check on the existing data if you have any doubts left. I think we're good to go for our purposes though!
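
For anyone who wants to run a similar sanity check without the spreadsheet, a comparison of two feature tables can be sketched as below. This is only an illustration: the function name `disagreements` and both toy tables are hypothetical, with values borrowed from the /f/ [strident] question earlier in the thread rather than from the real data files.

```python
def disagreements(a, b):
    """List (segment, feature, a_value, b_value) for shared cells that differ.

    Segments or features present in only one table are skipped, since
    they cannot be compared cell by cell.
    """
    out = []
    for seg in sorted(a.keys() & b.keys()):
        for feat in sorted(a[seg].keys() & b[seg].keys()):
            if a[seg][feat] != b[seg][feat]:
                out.append((seg, feat, a[seg][feat], b[seg][feat]))
    return out

# Toy tables: +1 = [+F], -1 = [-F], 0 = unspecified. Placeholder values only.
table_a = {"f": {"strid": 1, "voi": -1}, "s": {"strid": 1, "voi": -1}}
table_b = {"f": {"strid": 0, "voi": -1}, "s": {"strid": 1, "voi": -1}}

if __name__ == "__main__":
    for seg, feat, x, y in disagreements(table_a, table_b):
        print(f"{seg} [{feat}]: {x} vs {y}")
```
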

Anyway, thanks again for the detail in your responses, and for PanPhon, which was, as you'd hoped, quite easy to custom-fit to our more Hayes-oriented purpose.

PS - I first heard of PHOIBLE a few days ago from a colleague, and if we end up doing more work like this I might look into importing some data from there. I'll be sure to keep you posted if I import some of their data, but it's good to know you're thinking about that too!

lxkain commented 4 years ago

@dmort27, thank you so much for being in a conversation around this. Really appreciate your library. Best!