digling / tukano-project

Repository for the Tukano project (discussions and automatic data analyses)
GNU General Public License v3.0
karapana template #14

Open thiagochacon opened 8 years ago

thiagochacon commented 8 years ago

The karapana file is attached here. Please check the file for consistency regarding the guidelines you sent me, @nataliacp .

KarapanaReflex.xlsx

Please tell me if the format is okay, and whether sending the file as xls here is fine or we should follow another procedure.

This is the most up-to-date and quality-checked data for Karapana, which means the data in the 740 list is less consistent and complete than this one.

The list of Karapana symbols was created in the symbol-list directory.

Most of the operations I performed on the Karapana file are listed below:
y > j / oral vowel
y > ɲ / nasalized vowel
ñ > ɲ
d > n / _ nasalized vowel
b > m / nasalized vowel
ʉ > ɨ
rĩhẽ ‘nominalizer’ > REMOVED
rĩkẽ ‘nominalizer’ > REMOVED
he ‘nominalizer’ > REMOVED
fix missing FUN
c > k
fixed SRC
fixed CSO and CSA
fixed FUN forms with ";"
fixed polysemous forms
fixed phonetic variants
added morpheme breaks to some words (hyphen "-" for affixes and blank space " " for root boundaries in compounds)
added CSA and CSO comments
deleted entries that have been fixed by other operations
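As a rough illustration only, the segment replacements in this comment could be scripted along the following lines. This is a sketch, not the procedure actually used on the file: the function names are hypothetical, the vowel inventory is a guess, and it assumes that "/ oral vowel" and "/ nasalized vowel" refer to the vowel immediately following the consonant, with nasalization written as a tilde on the vowel.

```python
import unicodedata

TILDE = "\u0303"  # combining tilde: marks a nasalized vowel after NFD normalization
VOWELS = set("aeiou\u0268\u0289")  # assumed inventory: a e i o u, barred i, barred u

# Context-free replacements from the list: ñ > ɲ, ʉ > ɨ, c > k
SIMPLE = {"\u00f1": "\u0272", "\u0289": "\u0268", "c": "k"}

# Context-sensitive replacements: consonant -> (before oral vowel, before nasalized vowel)
CONDITIONED = {"y": ("j", "\u0272"), "d": ("d", "n"), "b": ("b", "m")}


def following_vowel_nasality(chars, i):
    """Return "nasal" or "oral" if chars[i + 1] is a vowel, else None."""
    if i + 1 >= len(chars) or chars[i + 1] not in VOWELS:
        return None
    k = i + 2
    # Look through all combining marks on the vowel (tone accents may come first).
    while k < len(chars) and unicodedata.combining(chars[k]):
        if chars[k] == TILDE:
            return "nasal"
        k += 1
    return "oral"


def normalize_form(form):
    """Apply the simple and the conditioned replacements to one word form."""
    # Compose first so that "n" + tilde is seen as the single letter ñ.
    text = unicodedata.normalize("NFC", form)
    for old, new in SIMPLE.items():
        text = text.replace(old, new)
    # Decompose so that a nasalized vowel becomes base vowel + combining tilde.
    chars = list(unicodedata.normalize("NFD", text))
    out = []
    for i, ch in enumerate(chars):
        if ch in CONDITIONED:
            env = following_vowel_nasality(chars, i)
            if env is not None:
                oral, nasal = CONDITIONED[ch]
                ch = nasal if env == "nasal" else oral
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))
```

Consonants that are not directly followed by a vowel are left unchanged here, which may or may not match what was done by hand.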

LinguList commented 8 years ago

I just added the file here, at

If you check it again from there, you may see more easily whether the columns are correct (that is what matters for lingpy).

It looks fine to me, so far. I'll add first testing routines tomorrow (or should I wait with this, @nataliacp ?).

Also, uploading files this way is easy from the web-template: just select all entries in Excel, copy them, open a new file in the folder on GitHub ("new file"), and paste the content there. It will also tell you if there are errors in your TSV format.
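For checking a pasted file locally before uploading, a minimal sketch of a column-count check is given below. It is not the check GitHub runs and not one of the planned testing routines; the function name `check_tsv` is hypothetical, and the only assumption is that the first line is a header and every data row should have the same number of tab-separated cells.

```python
import csv
import io


def check_tsv(text):
    """Return a list of (line_number, message) problems found in TSV text."""
    reader = csv.reader(
        io.StringIO(text, newline=""), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = list(reader)
    if not rows:
        return [(0, "file is empty")]
    header = rows[0]
    problems = []
    if len(set(header)) != len(header):
        problems.append((1, "duplicate column names"))
    for n, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append(
                (n, f"expected {len(header)} columns, found {len(row)}")
            )
    return problems
```

Calling `check_tsv(open(path, encoding="utf-8").read())` on a file returns an empty list when every row matches the header.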

nataliacp commented 8 years ago

Seb and I just had a look at the Karapana file and it looks fine! The only question I have is about the root boundary for compounds. If we really have a compound, shouldn't it be written as one word? Why not use morpheme boundaries? After all, the components of a compound are morphemes.

nataliacp commented 8 years ago

I have another question, actually. What did you do regarding lax tags in this case? Did you copy them to the CSA field and replace all unified translations?

amaliaskilton commented 8 years ago

@nataliacp and @thiagochacon regarding compounds: at some point in the discussion of conventions, we decided to annotate compounds in the same way as morphologically complex words consisting of a root and suffix. W/r/t use of a space between the roots, I would be concerned about the representation becoming ambiguous between a phrase consisting of multiple prosodic words and a root-root compound that is one prosodic word.

In the languages that I entered or checked (at least recently), I represented compounds in the phonemic with a dash between the roots, i.e. /Root1-Root2-/ and in the quasi with no dash, i.e. $Root1Root2$. Then I wrote 'compound' and the meanings of the components in the %% comments field. In the numerous cases where I couldn't produce a meaning for one element of the compound, but for prosodic or semantic reasons the item had to be a compound, I just wrote ?? for the relevant root in comments.

On Mon, Feb 15, 2016 at 11:43 AM, Natalia Chousou-Polydouri < notifications@github.com> wrote:

I have another question, actually. What did you do regarding lax tags in this case? Did you copy them to the CSA field and replace all unified translations?

thiagochacon commented 8 years ago

Lax rows: @nataliacp, there was just one lax row, "carayuru". I forgot to add a note in the CSA field. Could you please do that?

Compounds: Now I recall that we had a different marker for compounds. I guess it was a plus sign, +. @levmichael ?

If so, can you change that automatically, Natalia?

I think we should explicitly mark compounds as such (I am not sure if I have been consistent in my own Kubeo data). In any case, if we decide to mark compounds, I don't like having compounds and simplex words with the same boundaries. That is, I think they should be morphologically uniquely represented. Using a hyphen - makes roots and affixes indistinct in the notation; writing compounds as a single word with no boundaries makes them indistinguishable from monomorphemic words.
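If the plus sign is adopted, the automatic change could look roughly like the sketch below. It assumes the convention used in the Karapana file (blank space between roots, hyphen before affixes) and that affixes are suffixes; the function names and the placeholder forms are hypothetical, and no marker has been agreed on yet. A space that actually separates the words of a phrase would be converted as well, so such rows would need to be checked by hand.

```python
import re


def mark_compounds(form, marker="+"):
    """Replace whitespace root boundaries with an explicit compound marker."""
    return re.sub(r"\s+", marker, form.strip())


def segment(form, root_marker="+", affix_marker="-"):
    """Split a marked form into roots, each with its list of suffixes."""
    roots = []
    for chunk in form.split(root_marker):
        root, *affixes = chunk.split(affix_marker)
        roots.append({"root": root, "affixes": affixes})
    return roots


# Placeholder form, not real Karapana data:
# mark_compounds("root1 root2-suf") gives "root1+root2-suf"
```

With distinct markers for roots and affixes, `segment` can tell a compound from a root plus suffix, which is the kind of unique representation argued for above.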

levmichael commented 8 years ago

I just went through the conventions tabs on our various spreadsheets, and didn't find anything regarding compounds, although I believe we discussed the issue (so presumably it is somewhere in one of our threads). Before we decide what we should do, I think it would be helpful for us to summarize what each of us has done. @amaliaskilton has already said something about this -- @thiagochacon what did your students do for the non-Kubeo languages?