PhonologicalCorpusTools / CorpusTools

Phonological CorpusTools
http://phonologicalcorpustools.github.io/CorpusTools/
GNU General Public License v3.0

Digraphs in the IPHOD corpus #114

kchall closed 10 years ago

kchall commented 10 years ago

Eep. Just realized that we currently don't recognize digraphs in the IPHOD corpus. E.g., "Buckeyes" is transcribed as [b, ə, k, a, ɪ, z] rather than [b, ə, k, aɪ, z]. The digraphs I know of in the corpus are:

[eɪ] [oʊ] [ɔɪ] [aɪ] [aʊ]

[dʒ]

[tʃ] is ok, because it's represented using a single character.
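To make the problem concrete, here is a minimal sketch of regrouping a character-by-character transcription into segments with a greedy longest-match pass. It is not the CorpusTools implementation; the function name `segment` and the `DIGRAPHS` set are hypothetical and only cover the symbols listed above.

```python
# Digraphs listed in this issue; any other multi-character segment
# would have to be added here. (Hypothetical names, not CorpusTools code.)
DIGRAPHS = {"eɪ", "oʊ", "ɔɪ", "aɪ", "aʊ", "dʒ"}


def segment(transcription, digraphs=DIGRAPHS):
    """Split a string of IPA characters into segments, preferring digraphs."""
    longest = max((len(d) for d in digraphs), default=1)
    segments = []
    i = 0
    while i < len(transcription):
        # Try the longest candidate first, then fall back to one character.
        for size in range(longest, 0, -1):
            chunk = transcription[i:i + size]
            if size == 1 or chunk in digraphs:
                segments.append(chunk)
                i += len(chunk)
                break
    return segments


print(segment("bəkaɪz"))  # ['b', 'ə', 'k', 'aɪ', 'z']
```

One caveat with this approach: it merges every matching character pair, so a genuine two-vowel sequence that happens to look like a diphthong would also be merged. Regenerating the corpus from a source that already marks segment boundaries avoids that ambiguity.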

mmcauliffe commented 10 years ago

This must be a relic of when it was first generated, so I'll start adding the ability to load the original IPHOD, using the CMU-to-IPA translation code.
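As a rough illustration of why translating from the CMU symbols sidesteps the digraph problem, the sketch below maps each CMU (ARPABET) symbol to one IPA segment, so a diphthong such as AY never gets split. The table is deliberately partial, the function name is hypothetical, and the input format (space-separated symbols with optional stress digits) is an assumption rather than the actual IPHOD file layout.

```python
import re

# Partial, illustrative table: only the symbols needed for this issue.
# AH is mapped to schwa to match the transcription quoted above; a full
# table would need to decide how stress affects that choice.
ARPABET_TO_IPA = {
    "EY": "eɪ", "OW": "oʊ", "OY": "ɔɪ", "AY": "aɪ", "AW": "aʊ",
    "JH": "dʒ", "CH": "tʃ",
    "B": "b", "K": "k", "Z": "z", "AH": "ə",
}


def arpabet_to_ipa(pronunciation):
    """Convert space-separated ARPABET symbols to a list of IPA segments."""
    segments = []
    for symbol in pronunciation.split():
        base = re.sub(r"\d+$", "", symbol)  # drop a trailing stress digit
        segments.append(ARPABET_TO_IPA[base])  # KeyError if unmapped
    return segments


print(arpabet_to_ipa("B AH1 K AY2 Z"))  # ['b', 'ə', 'k', 'aɪ', 'z']
```

Because each source symbol becomes exactly one list element, the result stays correctly segmented no matter how many characters the IPA form uses.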

A related problem that we will run into shortly once our IPHOD has digraphs is that the Hayes feature system doesn't have any diphthongs in it. Is that because they're not in Hayes' book? Does the system have any way to deal with diphthongs?

kchall commented 10 years ago

Thanks, Michael. I don't have a copy of Hayes, actually. Does anyone? If not, I can ask Gunnar / Doug.

bhallen commented 10 years ago

That's correct: the Hayes feature system assumes an inventory with only monophthongs. My usual workaround to this has been to add a couple of features, e.g. "diphthong" and "front_diphthong". I'll be glad to add these to our feature file (for review by everyone else) if you'd like, or we could pursue a different option.

kchall commented 10 years ago

Sure, we can add those features. Then, how do you treat other features within the diphthong, Blake? E.g., is [aɪ] [-high] or [+high] or both? Does Hayes discuss it in the textbook?

bhallen commented 10 years ago

After discussing this with Kathleen, I've just created a new Hayes feature file that includes the 5 English diphthongs and two new features (same as in my previous comment) that allow them to be distinguished from their monophthong sisters.
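For anyone unfamiliar with how two extra features can separate diphthongs from monophthongs, here is a small sketch. The feature values are illustrative assumptions only and are not copied from ipa2hayes_diphthongs.txt; in particular, giving each diphthong the height values of its first element is just one possible answer to the [high] question above. The dictionary layout and function name are hypothetical.

```python
# Illustrative values only; NOT taken from the actual feature file.
FEATURES = {
    "a":  {"high": "-", "low": "+", "diphthong": "-", "front_diphthong": "-"},
    "aɪ": {"high": "-", "low": "+", "diphthong": "+", "front_diphthong": "+"},
    "aʊ": {"high": "-", "low": "+", "diphthong": "+", "front_diphthong": "-"},
}


def distinguishing_features(seg1, seg2, features=FEATURES):
    """Return the names of the features whose values differ between two segments."""
    f1, f2 = features[seg1], features[seg2]
    return sorted(name for name in f1 if f1[name] != f2[name])


print(distinguishing_features("a", "aɪ"))   # ['diphthong', 'front_diphthong']
print(distinguishing_features("aɪ", "aʊ"))  # ['front_diphthong']
```

Under these assumed values, "diphthong" separates each diphthong from its monophthong counterpart, and "front_diphthong" separates diphthongs that share a first element.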

It looks like I can't attach .txt files to these comments, so I've put the file ('ipa2hayes_diphthongs.txt') in Dropbox/Measuring_Phonological_Relations/Computational. Could someone who knows how the feature files work set it up so that our corpora and download source use this new feature file? @mmcauliffe @jsmackie

Of course, if anyone has an alternative solution to the Hayes diphthong problem, that'd be good too...

mmcauliffe commented 10 years ago

Ok, new feature files for both SPE and Hayes that include the English diphthongs are uploaded and can be downloaded through the GUI. IPHOD has also been updated to have diphthongs (and other digraphs that weren't single characters, like JH), and it can be downloaded through the GUI as well.