Dataset by Burling1967 - Githubissues

LinguList commented 8 years ago

also adding the meanings

LinguList commented 8 years ago

[x] digitize data
[ ] concept mapping
[ ] phoneme inventories
[ ] description of data
[ ] list of languages in data
[ ] orthography profile

nh36 commented 7 years ago

Here is the digitized Data. Please create a folder in the Data for Burling.

Proto-Burmish Burling Miyake.xls.zip

LinguList commented 7 years ago

I just started reviewing the ortho-profile. You can see the diff of what I corrected:

https://github.com/digling/burmish/commit/405d836db9d1ba2de12135501e1ff64e37f70e88

Basic notes:

If you use a capital letter to indicate a tone, it needs to be marked as a custom symbol, like L/ instead of simply L, so that we can tell it is a custom symbol.
We annotate a nucleus with an offglide explicitly as such: "ei" is written as "e i" in the segments, with either "n nn" or "n N" in the structure. Where the tone is indicated on the main vowel, this needs to be taken into account: èi is written as "è/e i L/", and the structure is "n nn t" or "n N t".
Nasal vowels are decomposed, using the "∼" following the nasal vowel (and preceding the glottal stop, if there is one), and marking it as a "c" in the structure. We do this to allow this nasal element to be aligned with the rest, so it needs to be indicated if Rangoon "an" points to nasalization. If nasalization only happens in Rangoon, we need to add a second line, where the same is done without nasalization, for those cases where the sequence is ambiguous.
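
The first note (tone letters must be written as custom symbols such as L/) can be checked mechanically. The following is only a minimal sketch, not the script used in this repository; the function name `bare_tone_symbols` is hypothetical, and it assumes that the segments cell is space-separated and that an unmarked tone letter is a single ASCII capital.

```python
import re

# Assumption: an unmarked tone letter is exactly one ASCII capital ("L", "M", "H", "F").
# A correctly marked custom symbol carries a trailing slash ("L/") and is not matched.
BARE_TONE = re.compile(r"^[A-Z]$")


def bare_tone_symbols(segments):
    """Return the tokens of a space-separated segments cell that look like
    tone letters written without the custom-symbol slash."""
    return [token for token in segments.split() if BARE_TONE.match(token)]


print(bare_tone_symbols("è/e i L"))   # the bare "L" is reported
print(bare_tone_symbols("è/e i L/"))  # nothing is reported
```

This only covers the segments column; capital letters in the structure column (such as "N") are a different convention and are not touched here.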

I suggest you study the changes (see the link above, which contrasts them), and @nh36 then goes through the whole profile again, correcting the rest (I got through half of the profile). Once this is done, I'll review again.

nh36 commented 7 years ago

Please take a new look at Burling. I think I have done what was needed.

LinguList commented 7 years ago

My script found the following incompatible lines (first line is "segments", second line is "structure", number is line number in the profile, incompatibility means: segments is longer than structure):

77   ə̀/ə L/ 
     n 

109      ē/e M/ 
     n 

129      o ʔ 
     n 

132      o u ʔ 
     n 

150      j ù/u L/ 
     i n 

164      ì/i k L/ 
     n c 

169      ín/ĩ ∼ H/ 
     n t 

174      á/a H/ + j ɛ̄/ɛ ʔ M/ 
     n + i n c t 

195      j ì/ĩ ~ L/ 
     i n t 

209      ɔ̀/ɔ n L/ 
     n 

212      à/a m L/ 
     n c 

213      ī/i n M/ 
     n c 

215      j ɔ́/ɔ H/ 
     i n 

216      ó/o k H/ 
     n c 

263      ʔ + l 
     c i 

267      ʔ + m 
     c i 

268      ʔ + s 
     c i 

312      k + ʃ 
     c i 

314      ʔ + n j 
     c i m 

323      sʰ â/a F/ 
     i n 

376      ì/i n L/ 
     n nn c t 

381      ú/u k H/ 
     n c 

387      ú/ũ ~ H/ 
     n t 

390      ù/u m L/ 
     n c 

401      j à/a L/ 
     n t 

434      ù/u ʔ L/ 
     c 

435      à/a n L/ 
     c 

462      j ɔ̀/ɔ ʔ L/ 
     i n c 

479      ù/u ~ L/ 
     c t 

495      j á/a ŋ H/ 
     c 

507      j ô/o F/ 
     c
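
A check of this kind can be sketched in a few lines. This is not the actual script, only an illustration under assumptions: the function name `check_profile` and the row layout (line number, segments, structure) are hypothetical, both cells are taken to be space-separated, and any difference in length is reported, in either direction.

```python
def check_profile(rows):
    """Compare the number of space-separated tokens in the segments cell
    with the number in the structure cell.

    rows: iterable of (line_number, segments, structure) tuples.
    Returns a list of (line_number, difference) for rows whose counts differ;
    a positive difference means segments has more tokens than structure.
    """
    problems = []
    for number, segments, structure in rows:
        difference = len(segments.split()) - len(structure.split())
        if difference != 0:
            problems.append((number, difference))
    return problems


rows = [
    (77, "ə̀/ə L/", "n"),
    (150, "j ù/u L/", "i n"),
    (376, "ì/i n L/", "n nn c t"),
    (0, "è/e i L/", "n nn t"),  # hypothetical consistent row, not from the profile
]

for number, difference in check_profile(rows):
    print(number, difference)
```

A "+" morpheme boundary counts as one token on each side, so it does not affect the comparison.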

LinguList commented 7 years ago

I just made an HTML interface where you can check for yourself: just paste the whole text from the spreadsheet (don't select extra columns, only the content; no "select all" in the spreadsheet!):

http://digling.org/profile/check.html

The output should be self-explanatory: it gives you the line number and the size of the length difference, if there is one.

LinguList commented 7 years ago

Just tested it again on another profile, and it seems quite useful so far. I'll try to turn this slowly into a bigger thing, where people may even test the conversion by pasting two files, say one file with the profile and one file with the list of words they want to cut, but we'll need to see how well I can implement the segments code in JavaScript.
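
To make the idea of "testing the conversion" concrete, here is a rough sketch of cutting a word with a profile by greedy longest match. It is an assumption about how such a conversion could work, not the segments code itself: the function name `segment`, the dictionary layout of the profile, and the "<x>" marker for unmatched characters are all hypothetical.

```python
import unicodedata


def nfd(text):
    """Normalize to NFD so composed and decomposed accents compare equal."""
    return unicodedata.normalize("NFD", text)


def segment(word, profile):
    """Cut a word by greedy longest match against the profile keys.

    profile: dict mapping a grapheme sequence to its replacement segments.
    Characters that match no key are wrapped in "<...>" (hypothetical marker).
    """
    table = {nfd(key): value for key, value in profile.items()}
    word = nfd(word)
    longest = max((len(key) for key in table), default=0)
    output, position = [], 0
    while position < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(word) - position), 0, -1):
            chunk = word[position:position + size]
            if chunk in table:
                output.append(table[chunk])
                position += size
                break
        else:
            output.append("<" + word[position] + ">")
            position += 1
    return " ".join(output)


# Hypothetical two-entry profile, using the "èi" example from the notes above.
profile = {"èi": "è/e i L/", "k": "k"}
print(segment("kèi", profile))
```

Unmatched characters are kept visible rather than dropped, so gaps in a profile show up directly in the output.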

nh36 commented 7 years ago

I have played with your new page, but it isn't clear to me how to get it to work. I press OK and nothing happens. But anyhow, methinks I have solved all of the problems in the Burling dataset. Please instruct.

LinguList commented 7 years ago

You need to paste the text into the text field. For this, you can copy either from GitHub (when opening the file to edit) or from the spreadsheet, but copying from GitHub will be best. Then it should work; at least it does for me. But I'll look into it later.

Best,

Mattis

LinguList commented 7 years ago

only three lines still problematic:

110      ē/e M/
     n

216      j ɔ́/ɔ H/
     i n

402      j à/a L/
     n t

nh36 commented 7 years ago

I copied from within GitHub (while editing the data), pasted where it said 'paste text here', and pressed OK; then nothing happened.

LinguList commented 7 years ago

I can't replicate the error, but we'll look into this.

nh36 commented 7 years ago

My efforts to fix 110 have backfired, but otherwise it should be fine now. Please take a look.

LinguList commented 7 years ago

yes, I'll take care of the last one. We'll leave this open to follow the next steps (concept linking, testing the conversion, as it may still contain problems which I can't capture with the current script, etc.). More later from my side.

digling / burmish

Dataset by Burling1967 #61