glottobank / tukano

Repository for computer-guided reconstruction of Tukano language data using the Jena wordlist standard
GNU General Public License v2.0

something wrong with Siona and Sekoya #13

Closed: thiagochacon closed this issue 9 years ago

thiagochacon commented 9 years ago

I have just noticed that the Siona language (SIO) is missing from the DB. Worse: it looks like the SIO data was merged with the SEK data, or replaced by it. I can't tell which. Can we upload an older version and fix it?

(BTW, I checked the alignments between 142 and 100. They all look great so far.)

LinguList commented 9 years ago

Oops...

The error is simple, but a real greenhorn mistake from the computational perspective:

For every cognate set, I added the proto-form.

Whenever I did that, I forgot to also include the first language in that line, which is SIO.
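A minimal sketch of how this kind of mistake can happen, assuming (hypothetically) that a cognate set is stored as an ordered list of `(doculect, form)` rows with SIO first. The function names, the `"Proto-Tukano"` label, the placeholder doculect `DOC3`, and the forms are all invented for illustration; this is not the repository's actual code.

```python
from typing import List, Tuple

# Hypothetical layout: a cognate set as an ordered list of (doculect, form) rows.
Row = Tuple[str, str]

PROTO = "Proto-Tukano"  # hypothetical label for the proto-form entry


def add_proto_buggy(rows: List[Row], proto_form: str) -> List[Row]:
    """Reproduces the mistake: rows[1:] silently drops the first row."""
    return [(PROTO, proto_form)] + rows[1:]


def add_proto_fixed(rows: List[Row], proto_form: str) -> List[Row]:
    """Keeps every original row and adds the proto-form as a separate entry."""
    return [(PROTO, proto_form)] + list(rows)


# Placeholder data: "DOC3" and the forms are invented for illustration.
cognate_set = [("SIO", "form1"), ("SEK", "form2"), ("DOC3", "form3")]
print([d for d, _ in add_proto_buggy(cognate_set, "*proto")])
# ['Proto-Tukano', 'SEK', 'DOC3']
print([d for d, _ in add_proto_fixed(cognate_set, "*proto")])
# ['Proto-Tukano', 'SIO', 'SEK', 'DOC3']
```

Note that the buggy version drops whichever row happens to come first, not specifically SIO.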

This is good and bad: the good part is that I can easily restore the SIO data and that it wasn't really merged. The bad part is that this will disturb the current state a little, and we'll still have to re-align some of the data.

This reminds me that I should find a way to check for consistency when running these computational operations on the data, since otherwise one can easily get confused, and in the end the data will be in a messy state due to small (or not so small) errors like this one.

I'll open a separate issue for that and start thinking about how to check the new computational version against the original consistently.

I'll also let you know when I have restored the data by closing this issue.

LinguList commented 9 years ago

The error goes even further: since I put Proto-Tukano into each cognate set in place of one of the doculects, SEK is also "infected". So my first patch (already online) won't do. Instead, I have to check for each doculect whether it is missing.
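A per-cognate-set check of this kind could be sketched as below, again under the hypothetical layout of a dict from cognate-set ID to `(doculect, form)` rows. It compares rows rather than bare doculect names, so a doculect with two synonyms in one set is still flagged if only one of them was lost. In the example, set 2 has no SIO reflex, so SEK is the row that gets displaced, which is consistent with SEK being "infected" too. All names and data are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, str]            # (doculect, form)
Wordlist = Dict[int, List[Row]]  # cognate-set ID -> rows (hypothetical layout)


def missing_rows(original: Wordlist, updated: Wordlist) -> Dict[int, List[Row]]:
    """For each cognate set, list the original rows absent from the update.

    Counter subtraction keeps only positive counts, so duplicated rows are
    compared by multiplicity rather than mere presence.
    """
    report: Dict[int, List[Row]] = {}
    for cogid, rows in original.items():
        lost = Counter(rows) - Counter(updated.get(cogid, []))
        if lost:
            report[cogid] = sorted(lost.elements())
    return report


def restore(original: Wordlist, updated: Wordlist) -> Wordlist:
    """Return a copy of `updated` with every lost original row added back."""
    fixed = {cogid: list(rows) for cogid, rows in updated.items()}
    for cogid, lost in missing_rows(original, updated).items():
        fixed.setdefault(cogid, []).extend(lost)
    return fixed


# Placeholder data: "DOC3", the forms, and the proto-forms are invented.
original = {
    1: [("SIO", "form1"), ("SEK", "form2")],
    2: [("SEK", "form3"), ("DOC3", "form4")],  # no SIO reflex in this set
}
updated = {
    1: [("Proto-Tukano", "*a"), ("SEK", "form2")],   # SIO lost
    2: [("Proto-Tukano", "*b"), ("DOC3", "form4")],  # SEK lost
}
print(missing_rows(original, updated))
# {1: [('SIO', 'form1')], 2: [('SEK', 'form3')]}
print(missing_rows(original, restore(original, updated)))
# {}
```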

But this is doable, and it will hopefully be done in minutes...

LinguList commented 9 years ago

Alright, I am pretty confident that everything is fine with the latest update. Counting the original entries in the spreadsheet, there are 1402 across all doculects. With the update, in which I added all proto-forms as separate entries, we now have 1543 entries, which is exactly what is expected, since there are 141 proto-forms: 1402 + 141 = 1543.
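The count argument above can be written as an automatic check. This is a sketch under the same hypothetical layout as before (cognate-set ID mapped to `(doculect, form)` rows, proto-forms labelled `"Proto-Tukano"`); the names and data are illustrative, not taken from the repository.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]            # (doculect, form)
Wordlist = Dict[int, List[Row]]  # cognate-set ID -> rows (hypothetical layout)
PROTO = "Proto-Tukano"           # hypothetical label for proto-form rows


def check_counts(original: Wordlist, updated: Wordlist) -> Tuple[int, int, int]:
    """Verify that updated entries == original entries + added proto-forms."""
    n_original = sum(len(rows) for rows in original.values())
    n_updated = sum(len(rows) for rows in updated.values())
    n_proto = sum(1 for rows in updated.values() for doculect, _ in rows if doculect == PROTO)
    if n_updated != n_original + n_proto:
        raise ValueError(
            f"expected {n_original} + {n_proto} = {n_original + n_proto} "
            f"entries, found {n_updated}"
        )
    return n_original, n_proto, n_updated


# Placeholder data: "DOC3", the forms, and the proto-forms are invented.
original = {1: [("SIO", "form1"), ("SEK", "form2")], 2: [("SEK", "form3"), ("DOC3", "form4")]}
fixed = {
    1: [(PROTO, "*a"), ("SIO", "form1"), ("SEK", "form2")],
    2: [(PROTO, "*b"), ("SEK", "form3"), ("DOC3", "form4")],
}
buggy = {1: [(PROTO, "*a"), ("SEK", "form2")], 2: [(PROTO, "*b"), ("DOC3", "form4")]}

print(check_counts(original, fixed))  # (4, 2, 6)
try:
    check_counts(original, buggy)
except ValueError as err:
    print(err)  # expected 4 + 2 = 6 entries, found 4
```

With the figures from this thread, the same condition reads 1402 + 141 = 1543. A matching total cannot rule out a lost row being offset by an accidental extra row elsewhere, so comparing the rows of each cognate set against the original remains the stronger test.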

So this is hopefully really working now.

Sorry for the trouble...