digling / tukano-project

Repository for the Tukano project (discussions and automatic data analyses)
GNU General Public License v3.0

Huber and Reed 1992 data on Tukano (and other Colombian languages) #21

Open thiagochacon opened 8 years ago

thiagochacon commented 8 years ago

Mattis and I have been working on a digitized dataset of 375 words of Tukanoan languages from Huber and Reed. Since we are gathering all Tukanoan lexical resources on GitHub, I thought it would be nice to add H&R as well, in the hope that we could have it in Reflex. What do you think, @nataliacp and @sflavier?

Mattis, could you upload the Huber and Reed Tukano data currently in EDICTOR to GitHub?

nataliacp commented 8 years ago

Are these 375 words all in the wordlist, or only a subset of them? If the latter, we would need to match the two files to keep the information of the unified translation. For the matching we would need a spreadsheet that follows the Reflex import template (basically the Kubeo or the Karapana file I have uploaded), and we would need to know which field is identical between the template and the 740-item file.
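A minimal sketch of the matching step described above, assuming both spreadsheets are exported as tab-separated text and share one column with identical values. The column names `GLOSS` and `UNIFIED_TRANSLATION`, the helper names, and the toy rows are all hypothetical, not the actual Reflex template or the 740-item file:

```python
import csv
import io


def read_tsv(text):
    """Parse a tab-separated spreadsheet (first row = header) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def match_on_field(template_rows, list_rows, field, carry="UNIFIED_TRANSLATION"):
    """Copy the `carry` column from list_rows onto template_rows sharing the same `field` value.

    Returns (matched, unmatched). Unmatched rows have no partner in list_rows,
    e.g. pronouns that occur in a Swadesh-style list but not in the 740-item list.
    """
    # Normalize keys so that differences in case or stray whitespace do not block a match.
    index = {row[field].strip().lower(): row for row in list_rows}
    matched, unmatched = [], []
    for row in template_rows:
        partner = index.get(row[field].strip().lower())
        if partner is None:
            unmatched.append(row)
        else:
            merged = dict(row)
            merged[carry] = partner[carry]
            matched.append(merged)
    return matched, unmatched


# Toy data; column names and values are made up for illustration.
template = read_tsv("GLOSS\tFORM\nwater\tf1\nI\tf2\n")
list_740 = read_tsv("GLOSS\tUNIFIED_TRANSLATION\nWater \tagua\nfire\tfuego\n")
matched, unmatched = match_on_field(template, list_740, "GLOSS")
print([r["UNIFIED_TRANSLATION"] for r in matched])  # ['agua']
print([r["GLOSS"] for r in unmatched])              # ['I']
```

Reporting the unmatched rows separately, rather than dropping them, makes it easy to see which glosses still need a manual decision.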

amaliaskilton commented 8 years ago

The items in Huber & Reed are basically an extended Swadesh list. They are not a subset of the 740-list terms because the Huber & Reed gloss list (like Swadesh) includes pronouns and some other grammatical words that are excluded from the 740-item list.

On Tue, Feb 16, 2016 at 8:11 AM, Natalia Chousou-Polydouri wrote:


LinguList commented 8 years ago

I'd suggest waiting at this stage. The data is already online, albeit in another GitHub repository, and, nicely, we have mapped all concepts to the Concepticon. Before adding the Huber data, the 740-word concept list needs to be mapped to the Concepticon. We can upload it to GitHub, but I'd suggest we keep this for later so as not to get overwhelmed with too many bits of data, since there are still lots of things to be done, like mapping languages to Glottolog, refining the tokenization, etc.

But for all who want to have a look at the data:

http://tsv.lingpy.org?file=huber1992&remote_dbase=huber1992

You can even download the data in spreadsheet format from there (click the save button, then the download button in the small menu bar on the right of the EDICTOR), and @nataliacp can check to what degree it provides what is needed for Reflex.
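One quick way to check the downloaded file is to measure, per language, how many concepts of a reference list it covers. The sketch below assumes the export is tab-separated, has a header row with `DOCULECT` and `CONCEPT` columns (LingPy-style wordlist naming), and may contain comment lines starting with `#`; these column names, the helper names, and the sample rows are assumptions for illustration, not a confirmed EDICTOR export format:

```python
import csv
from collections import defaultdict


def read_wordlist(text):
    """Parse a TSV wordlist: skip blank lines and '#' comments; the first remaining line is the header."""
    lines = [line for line in text.splitlines() if line.strip() and not line.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))


def coverage(rows, reference_concepts):
    """Return, for each DOCULECT, the share of reference_concepts attested in rows."""
    concepts_by_language = defaultdict(set)
    for row in rows:
        concepts_by_language[row["DOCULECT"]].add(row["CONCEPT"])
    return {
        language: len(concepts & reference_concepts) / len(reference_concepts)
        for language, concepts in concepts_by_language.items()
    }


# Made-up sample rows; the IPA values are placeholders, not real forms.
sample = (
    "# exported wordlist\n"
    "ID\tDOCULECT\tCONCEPT\tIPA\n"
    "1\tTukano\twater\tf1\n"
    "2\tTukano\tfire\tf2\n"
    "3\tKubeo\twater\tf3\n"
)
print(coverage(read_wordlist(sample), {"water", "fire"}))  # {'Tukano': 1.0, 'Kubeo': 0.5}
```

Comparing against the concept set of the Reflex template instead of this toy set would show directly how much of the H&R data could be imported.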