glottobank / tukano

Repository for computer-guided reconstruction with Jena wordlist standard for Tukano language data
GNU General Public License v2.0

Cognate Set #61 (people) had mis-coded alignment #8

Closed LinguList closed 9 years ago

LinguList commented 9 years ago

I just cleaned up the cases (too much whitespace, which confused the representation), but I'm not sure how good my adjustment is.

Please close this issue if my modifications to the alignment are OK.

thiagochacon commented 9 years ago

Actually, I just realized something while looking at this data:

In my reconstruction, PT tj and tʔj are complex articulated single phonemes. In the alignments, they are split into different cells: t j

LinguList commented 9 years ago

Yep, I also realized that. For the 40 alignments which I just did yesterday, I corrected it manually like this:

Go to the "alignments" field where you find the proto-form. Edit the text accordingly, that is: whitespace is the separator, and everything not separated by whitespace is one segment. Then right-click or double-click in the COGID field, and you can adjust the alignment anew with the corrected tokenization.

This is a bit tedious, yet I figured that if I start doing this automatically, it may take us even longer, since we would need to discuss and cover all cases of complex segments, and I would need to set up a segmentation script only for this case.
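For illustration only, here is a minimal Python sketch of what such a segmentation script might look like. It is not part of the tool or of this repository: the function names `tokenize` and `merge_complex`, the greedy longest-match strategy, and the example string are all assumptions. Only the whitespace rule and the complex segments tj and tʔj come from this thread.

```python
def tokenize(alignment: str) -> list[str]:
    """Split an alignment string into segments.

    Whitespace is the only separator, so everything not separated
    by whitespace counts as one segment.
    """
    return alignment.split()


def merge_complex(segments: list[str], complex_segments: set[str]) -> list[str]:
    """Greedily join adjacent segments that spell a known complex segment.

    `complex_segments` is a hypothetical, hand-maintained inventory
    (for example {"tj", "tʔj"}); deciding what belongs in it is the
    part that would need discussion.
    """
    # A complex segment of n characters spans at most n input segments.
    longest = max((len(c) for c in complex_segments), default=1)
    merged = []
    i = 0
    while i < len(segments):
        # Try the widest window first so "tʔj" wins over "tj".
        for width in range(min(longest, len(segments) - i), 1, -1):
            candidate = "".join(segments[i:i + width])
            if candidate in complex_segments:
                merged.append(candidate)
                i += width
                break
        else:
            # No complex segment starts here; keep the segment as is.
            merged.append(segments[i])
            i += 1
    return merged


# Made-up example string, not taken from the Tukano data.
inventory = {"tj", "tʔj"}
print(merge_complex(tokenize("t ʔ j a"), inventory))
```

Even with a script like this, gap symbols placed between the parts of a complex segment would still have to be fixed by hand, which is one reason the manual route may be quicker for a small number of alignments.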

That works for you?

thiagochacon commented 9 years ago

That is the way I was doing the alignments before. I liked it very much, as we pay even more attention to the data.

Date: Wed, 12 Nov 2014 10:39:53 -0800 From: notifications@github.com To: tukano@noreply.github.com CC: thiago_chacon@hotmail.com Subject: Re: [tukano] Cognate Set #61 (people) had mis-coded alignment (#8)

LinguList commented 9 years ago

All right, cool that we have an agreement here. I tried to do as much automatically as I could, but I forgot about the proto-language and was not really sure how to handle it yesterday. But you are actually right about the "paying attention" part: when I corrected some forms yesterday, I would first think the words weren't cognate, but afterwards found that the correspondences are regular. So paying attention really pays off...

I'll close this one for now. Let's reopen the issue in case new things related to this one show up.