digling / burmish

LingPy plugin for handling a specific dataset
GNU General Public License v2.0

Dataset by Huang1992 #81

Closed: LinguList closed this issue 7 years ago

LinguList commented 7 years ago

This dataset is mostly processed, but for further expansion we need orthography profiles, etc. I was just looking into this and realized the following:

Basically, we can handle this, as we have two separators: "+" for strong association and "_" for word-level association. But how do we mark the elements in their structure? I imagine that weak elements will have a different phonology, so they may affect QPA and should probably be treated separately. I also remember reading this in Wannemacher's phonological introduction. So we need to reflect this already in the orthography profiles, perhaps by using capital letters for stressed syllables and lowercase letters for unstressed and weak syllables (where we can determine this!).
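The two-separator scheme and the proposed case convention could be prototyped roughly as follows. This is a minimal sketch, assuming that every "+"-separated unit is one syllable and that a syllable containing any uppercase letter counts as stressed. `Syllable`, `parse_form`, and `strong_syllables` are hypothetical names, and `kə+LAN_MIN` is a made-up form, not Huang1992 data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Syllable:
    form: str   # syllable as written in the (hypothetical) profiled form
    word: int   # index of the '_'-separated word it belongs to
    weak: bool  # True if written entirely in lowercase (assumed weak/unstressed)


def parse_form(form: str) -> list[Syllable]:
    """Split on '_' (word-level association), then on '+' (strong association).

    Assumption: each '+'-separated unit is one syllable, and a syllable with at
    least one uppercase letter is stressed; all-lowercase syllables are weak.
    """
    syllables = []
    for index, word in enumerate(form.split("_")):
        for unit in word.split("+"):
            if unit:  # skip empty pieces from doubled or trailing separators
                weak = not any(ch.isupper() for ch in unit)
                syllables.append(Syllable(unit, index, weak))
    return syllables


def strong_syllables(form: str) -> list[str]:
    """Return only the stressed syllables, e.g. to keep weak ones out of QPA."""
    return [s.form for s in parse_form(form) if not s.weak]


print(strong_syllables("kə+LAN_MIN"))  # ['LAN', 'MIN']
```

Keeping the weak flag on each syllable, rather than dropping weak syllables at parse time, leaves the decision of how to treat them in QPA open.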

LinguList commented 7 years ago

For Written Burmese and Rangoon, here is my initial interpretation of ALL (!) the TBL data:

Huang1992-WB.xlsx

What is still missing is a phonological interpretation in terms of "CLPA", a potential orthography, and a decision on how to interpret finals with two codas (glide and stop). Switching to a nucleus-glide-coda analysis might be more realistic, as such codas also occur in other languages. I know that some of this work has already been done, but I suggest that you, @nh36, build on it to add the conversions here, since this list is more complete than the earlier ones, which covered only the data in the sample. The items in brackets at the end, labeled "e" under type, are exceptions; they should probably be hand-coded and perhaps marked as such, as they seem to indicate facultative words or the like, though I am not sure. To handle sounds like o₂, I have a new CLPA convention that allows you to specify one speculative IPA sound followed by a graphemic representation, separating the two by a slash (without spaces!). The laryngeal h₂, for example, can now be written as p ə/h₂ t e: r in the word for "father", giving both a classical and a quasi-phonetic interpretation.
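One way to consume the IPA-plus-grapheme convention is to split each segment token on "/", as in the example `p ə/h₂ t e: r`. This is a sketch under an assumed reading (IPA value left of the "/", graphemic label right of it; a plain token serves as both); `parse_segments`, `ipa_view`, and `graphemic_view` are hypothetical helpers, not part of CLPA or LingPy.

```python
def parse_segments(segments):
    """Split a space-separated segment string into (ipa, grapheme) pairs.

    Assumed reading of the convention: in a token 'x/y', x is the speculative
    IPA value and y the graphemic (classical) label; a token without '/' is
    used for both.
    """
    pairs = []
    for token in segments.split():
        ipa, sep, grapheme = token.partition("/")
        pairs.append((ipa, grapheme if sep else ipa))
    return pairs


def ipa_view(segments):
    """Quasi-phonetic reading: keep the IPA side of every token."""
    return " ".join(ipa for ipa, _ in parse_segments(segments))


def graphemic_view(segments):
    """Classical reading: keep the graphemic side of every token."""
    return " ".join(g for _, g in parse_segments(segments))


word = "p ə/h₂ t e: r"       # "father", the example from the comment
print(ipa_view(word))        # p ə t e: r
print(graphemic_view(word))  # p h₂ t e: r
```

Because only the first "/" in a token is treated as the separator, a graphemic label may itself contain further slashes without breaking the split.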

Note that I have prepared a complete orthography-profile dump for the data (without CLPA readings, but otherwise with the structures as I interpret them). However, I suggest we first discuss its production and specifics using the Burmese data and then proceed language by language, as we need to discuss whether Atsi and Lashi have these weak syllables (we need to mark the whole syllable as weak to improve our chances with QPA).

nh36 commented 7 years ago

Huang1992-WB.xlsx

Here is my best effort at this orthography profile.

nh36 commented 7 years ago

I guess you have now decided that you would rather do individual orthography profiles for each language in Huang. But you can understand that I am not eager to redo all my work, at least for Written Burmese, which is tricky. Can you please use the attached file to generate a new orthography profile that I can correct by hand?

orth Huang1992-just WB.xlsx
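Regenerating a profile without losing hand-made corrections could work roughly like this: segment each form greedily with the graphemes already in the corrected profile, reuse their IPA values, and flag unseen graphemes with "?" for manual review. This is a plain-Python sketch, not LingPy's own profile tooling; `graphemes`, `tokenize`, and `draft_profile` are hypothetical names, grapheme clusters are only approximated (base character plus following combining marks), and the forms and mappings in the usage line are illustrative, not taken from the attachment.

```python
import csv
import io
import unicodedata
from collections import Counter


def graphemes(form):
    """Approximate grapheme clusters: each combining mark (Unicode category M*)
    is attached to the preceding base character; clusters are returned in NFC."""
    clusters = []
    for ch in unicodedata.normalize("NFD", form):
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return [unicodedata.normalize("NFC", c) for c in clusters]


def tokenize(form, known):
    """Greedy longest-match segmentation: multi-cluster graphemes already in the
    hand-corrected profile win; anything else falls back to a single cluster."""
    clusters = graphemes(form)
    maxlen = max([1] + [len(graphemes(g)) for g in known])
    tokens, i = [], 0
    while i < len(clusters):
        for n in range(min(maxlen, len(clusters) - i), 0, -1):
            candidate = unicodedata.normalize("NFC", "".join(clusters[i:i + n]))
            if n == 1 or candidate in known:
                tokens.append(candidate)
                i += n
                break
    return tokens


def draft_profile(forms, known=None):
    """Return a tab-separated draft profile (Grapheme, IPA, Frequency).
    IPA values are reused from `known` (grapheme -> IPA); unseen graphemes get
    '?' so they can be spotted and corrected by hand."""
    known = {unicodedata.normalize("NFC", g): ipa for g, ipa in (known or {}).items()}
    counts = Counter()
    for form in forms:
        for token in tokenize(form, known):
            if not token.isspace():
                counts[token] += 1
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Grapheme", "IPA", "Frequency"])
    for grapheme, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([grapheme, known.get(grapheme, "?"), n])
    return out.getvalue()


known = {"ky": "kʲ", "a": "a"}  # illustrative stand-in for the corrected WB profile
print(draft_profile(["kya+ka", "kyā"], known), end="")
```

Sorting by descending frequency puts the most common graphemes first, so the rows most in need of checking appear at the top of the file.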

LinguList commented 7 years ago

Excellent. We'll group the data: I suppose we make, say, Xiandao/Achang plus whatever grouping you prefer (maybe two groups), and together with WB this should do the trick. But please be patient until I have found time to parse the Huang data (it's tricky, as STEDT's digitization is sometimes inconsistent, e.g., using different symbols for missing data).
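Before grouping, the inconsistent missing-data symbols could be normalized to a single value. The sketch below assumes rows of `(language, concept, form)`; the marker set is a hypothetical placeholder (the real symbols must be collected from the STEDT digitization), `normalize_missing`, `clean_rows`, and `group_rows` are hypothetical helpers, and the forms in the usage lines are dummies.

```python
from collections import defaultdict

# Hypothetical placeholder set: the symbols actually used for missing data in
# the STEDT digitization of Huang1992 have to be collected by inspecting the data.
MISSING_MARKERS = frozenset({"", "-", "--", "?", "NA", "n/a"})


def normalize_missing(value, markers=MISSING_MARKERS):
    """Map every 'no data' placeholder (and None) to None; strip real forms."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped in markers else stripped


def clean_rows(rows):
    """Drop (language, concept, form) rows whose form is a missing-data marker."""
    cleaned = []
    for language, concept, form in rows:
        form = normalize_missing(form)
        if form is not None:
            cleaned.append((language, concept, form))
    return cleaned


def group_rows(rows, groups):
    """Bucket rows by a language -> group mapping; unmapped languages form their own group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[groups.get(row[0], row[0])].append(row)
    return dict(buckets)


rows = [("Xiandao", "HAND", "f1"), ("Achang", "HAND", " - "), ("Achang", "EYE", "f2"), ("WB", "HAND", "?")]
groups = {"Xiandao": "Xiandao/Achang", "Achang": "Xiandao/Achang"}
print(group_rows(clean_rows(rows), groups))
```

Normalizing to `None` first means every later step only has to handle one representation of "no data", whatever symbol the source used.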

LinguList commented 7 years ago

I'll close this issue and open a new one that deals only with orthography profiles.