kchall closed this issue 10 years ago.
This must be a relic of when it was first generated, so I'll start adding the ability to load the original IPHOD, using the CMU-to-IPA translation.
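For readers unfamiliar with the CMU-to-IPA step, here is a minimal Python sketch of what such a translation might look like. The table `ARPABET_TO_IPA` and the function `cmu_to_ipa` are hypothetical names. The table is an assumed mapping, not the project's actual one, and rendering AH as [ə] regardless of stress is also an assumption. The point it illustrates is that mapping each ARPABET symbol to exactly one list element keeps diphthongs like AY and affricates like JH as single segments.

```python
# Hypothetical ARPABET (CMU) -> IPA table; the project's real table may differ.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ə",  # assumed: AH is schwa regardless of stress
    "AO": "ɔ", "AW": "aʊ", "AY": "aɪ", "EH": "ɛ", "ER": "ɝ", "EY": "eɪ",
    "IH": "ɪ", "IY": "i", "OW": "oʊ", "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}


def cmu_to_ipa(pron: str) -> list[str]:
    """Convert a space-separated CMU pronunciation into a list of IPA segments.

    Each ARPABET symbol becomes exactly one list element, so AY -> "aɪ" and
    JH -> "dʒ" stay single segments instead of being split into characters.
    """
    segments = []
    for symbol in pron.split():
        base = symbol.rstrip("012")  # drop the stress digit on vowels
        segments.append(ARPABET_TO_IPA[base])
    return segments


print(cmu_to_ipa("B AH0 K AY2 Z"))  # → ['b', 'ə', 'k', 'aɪ', 'z']
```

Because the output is a list of segments rather than a flat string, no later step has to guess where [aɪ] begins and ends.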
A related problem we'll run into shortly, once our IPHOD has digraphs, is that the Hayes feature system doesn't include any diphthongs. Is that because they're not in Hayes' book? Does the system have any way to deal with diphthongs?
Thanks, Michael. I don't have a copy of Hayes, actually. Does anyone? If not, I can ask Gunnar or Doug.
That's correct: the Hayes feature system assumes an inventory with only monophthongs. My usual workaround to this has been to add a couple of features, e.g. "diphthong" and "front_diphthong". I'll be glad to add these to our feature file (for review by everyone else) if you'd like, or we could pursue a different option.
Sure, we can add those features. How, then, do you treat the other features within a diphthong, Blake? For example, is [aɪ] [-high], [+high], or both? Does Hayes discuss this in the textbook?
After discussing this with Kathleen, I've just created a new Hayes feature file that includes the 5 English diphthongs and two new features (same as in my previous comment) that allow them to be distinguished from their monophthong sisters.
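As a rough illustration of how two extra features can separate diphthongs from their monophthong sisters, here is a Python sketch. The `FEATURES` table and the `differing_features` helper are hypothetical, only three Hayes features are shown, and the values (including marking [ɪ]-final diphthongs as `front_diphthong`) are assumptions, not the contents of ipa2hayes_diphthongs.txt. Giving each diphthong the nucleus values of its monophthong sister is also just one choice; it sidesteps rather than answers the [±high] question raised above.

```python
# Hypothetical excerpt of a Hayes-style feature table (assumed values, not the
# real ipa2hayes_diphthongs.txt). Each diphthong copies its nucleus's values
# and adds the two new features.
FEATURES = {
    "a":  {"syllabic": "+", "high": "-", "low": "+",
           "diphthong": "-", "front_diphthong": "-"},
    "aɪ": {"syllabic": "+", "high": "-", "low": "+",
           "diphthong": "+", "front_diphthong": "+"},
    "aʊ": {"syllabic": "+", "high": "-", "low": "+",
           "diphthong": "+", "front_diphthong": "-"},
}


def differing_features(seg1: str, seg2: str) -> list[str]:
    """Return, in sorted order, the features on which two segments differ."""
    f1, f2 = FEATURES[seg1], FEATURES[seg2]
    return sorted(name for name in f1 if f1[name] != f2[name])


print(differing_features("a", "aɪ"))   # → ['diphthong', 'front_diphthong']
print(differing_features("a", "aʊ"))   # → ['diphthong']
print(differing_features("aɪ", "aʊ"))  # → ['front_diphthong']
```

Without the two added features, all three segments would have identical feature bundles and could not be told apart.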
It looks like I can't attach .txt files to these comments, so I've put the file ('ipa2hayes_diphthongs.txt') in Dropbox/Measuring_Phonological_Relations/Computational. Could someone who knows how the feature files work set it up so that our corpora and download source use this new feature file? @mmcauliffe @jsmackie
Of course, if anyone has an alternative solution to the Hayes diphthong problem, that'd be good too...
OK, new feature files for both SPE and Hayes that include the English diphthongs are uploaded and can be downloaded through the GUI. IPHOD has also been updated to include diphthongs (and other digraphs that weren't single characters, like JH), and it can also be downloaded through the GUI.
Eep. Just realized that we currently don't recognize digraphs in the IPHOD corpus. E.g., "Buckeyes" is transcribed as [b, ə, k, a, ɪ, z] and not [b, ə, k, aɪ, z]. The symbols I know of that are digraphs here are:
[eɪ] [oʊ] [ɔɪ] [aɪ] [aʊ]
[dʒ]
[tʃ] is OK, because it's represented as a single character.
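One way to recover the digraphs listed above is greedy longest-match segmentation over the joined transcription. The Python sketch below is illustrative only: `DIGRAPHS` and `tokenize` are hypothetical names, and the symbol list is just the one given in this comment ([tʃ] is left out because, per the comment, it is already a single character).

```python
# Multi-character segments that must stay together (hypothetical list taken
# from the symbols named in this comment).
DIGRAPHS = ["eɪ", "oʊ", "ɔɪ", "aɪ", "aʊ", "dʒ"]


def tokenize(transcription: str, multigraphs: list[str] = DIGRAPHS) -> list[str]:
    """Split a transcription string into segments, preferring the longest match."""
    # Try longer symbols first so that a future trigraph would beat a digraph.
    symbols = sorted(multigraphs, key=len, reverse=True)
    segments = []
    i = 0
    while i < len(transcription):
        for sym in symbols:
            if transcription.startswith(sym, i):
                segments.append(sym)
                i += len(sym)
                break
        else:
            # No multi-character symbol starts here: take one character.
            segments.append(transcription[i])
            i += 1
    return segments


# Re-segment the flat list from the "Buckeyes" example.
print(tokenize("".join(["b", "ə", "k", "a", "ɪ", "z"])))  # → ['b', 'ə', 'k', 'aɪ', 'z']
```

One caveat: greedy matching always merges [a] followed by [ɪ], so it would also merge them in a word where they belong to separate syllables. The source pronunciation data is the safer place to decide segment boundaries.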