MorphDiv / TeDDi_sample

Text Data Diversity Sample (TeDDi Sample)

Kayardild annotation tiers #252

Closed. tsamardzic closed this issue 1 year ago.

tsamardzic commented 2 years ago

Why is

\ Ngij-in-thabuju-karra maku-. \ ŋicu-iɲ-t̪apucu-kara maku-a \ 1sg-fINY-e.Br-fGEN.T wife-T

but

\ Nga-da dathin-ki-na- wuu-j-arra- wuruman-ki-na- nguku-rnurru--na- \ ŋat̪-ta ʈat̪in-ki-naa-ø wuː-c-ŋara-ø wuɻuman-ki-naa-ø ŋuku-ɳuru-ki-naa-ø \ 1sg-T there-fLOC-fABL-T put-TH-fCONS-T billy-fLOC-fABL-T water-fASSOC-fLOC-fABL-T

?

I did everything like \<line 3>.

bambooforest commented 2 years ago

@tsamardzic I didn't add examples 1, 2 and 3; they were there already. I suppose you can change the tags in your PR to bring them in line with what you and I added.

christianbentz commented 2 years ago

@tsamardzic @bambooforest I think the problem is that different examples in Round have different levels of annotation. All annotations that are present should be included. The order of annotations follows the first line: line, phonological, segmentation, morphomic, glossing, translation, comment.
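If it helps when going through the file, here is a rough Python sketch of the rule above: an example may skip tiers, but the tiers it does have must keep that relative order and must not repeat. The tier labels are simplified, hypothetical names taken from the list above; the actual tags in the file may be spelled differently or numbered.

```python
# Canonical tier order as listed above. These labels are simplified,
# hypothetical names; the real tags in the file may differ.
CANONICAL_ORDER = [
    "line",
    "phonological",
    "segmentation",
    "morphomic",
    "glossing",
    "translation",
    "comment",
]


def follows_canonical_order(tiers):
    """Return True if `tiers` is an ordered subsequence of CANONICAL_ORDER.

    Tiers may be missing (not every example has every level of annotation),
    but those present must keep their relative order and must not repeat.
    """
    positions = []
    for tier in tiers:
        if tier not in CANONICAL_ORDER:
            return False  # unknown tier label
        positions.append(CANONICAL_ORDER.index(tier))
    # Strictly increasing positions: correct order and no duplicates.
    return all(a < b for a, b in zip(positions, positions[1:]))
```

For example, `["line", "segmentation", "glossing", "translation"]` passes, while `["line", "glossing", "segmentation"]` fails.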

bambooforest commented 2 years ago

I noticed this as well and included them as they were given.

tsamardzic commented 2 years ago

@christianbentz Yes, we both followed this order, but the problem is that our tags are inconsistent: for the same annotation level, we sometimes say it's \ and sometimes \. This is the case with the second level in my example above. There is an issue with the third level too, which is sometimes tagged as \ and sometimes as \. Can you tell us which option is correct? I will then correct the whole file to make it consistent.

@bambooforest lines 33-37, then 53 are also like 3 (must be something about the number 3! ;))

bambooforest commented 2 years ago

Round writes:

The first line (a) contains an orthographic form, divided by hyphens at approximate morph boundaries.[9, 10] The remaining lines of a maximally explicit gloss display (b) a surface (lexical level) phonological representation, which is unhyphenated; then (c) an underlying phonological representation; (d) a morphomic representation, and (e) a semantic and morphosyntactic gloss, all of which are hyphenated. For sentential examples, a free translation (f) is given in English and the source of the example is indicated.

This would apply to:

https://github.com/100LC/100LC/blob/master/Database/tests/Corpus/Kayardild_gyd/grammar/spoken/gyd_gre_1.txt#L18-L24

So for example 2.27:

https://github.com/100LC/100LC/blob/master/Database/tests/Corpus/Kayardild_gyd/grammar/spoken/gyd_gre_1.txt#L34-L39

There is no phonological tier (i.e. an unhyphenated row).

But since he does use hyphens (at least at the ends of words) in examples that have five rows, e.g.:

https://github.com/100LC/100LC/blob/master/Database/tests/Corpus/Kayardild_gyd/grammar/spoken/gyd_gre_1.txt#L94-L99

Your guess is as good as mine for what to do.
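Going by Round's description quoted above, tier (b) is the only unhyphenated row, directly after the orthographic row (a). A rough Python heuristic built on that assumption could flag which examples seem to lack tier (b). It is only a sketch: any hyphen, including the word-final ones mentioned above, makes a row count as hyphenated, so it does not settle the ambiguity. It just narrows down which examples to check by hand.

```python
from typing import List


def is_unhyphenated(row: str) -> bool:
    """True if no whitespace-separated token in the row contains a hyphen."""
    return all("-" not in token for token in row.split())


def has_surface_tier(rows: List[str]) -> bool:
    """Guess whether an example includes Round's tier (b).

    Assumption: `rows` holds the interlinear rows of one example, with the
    orthographic row (a) first and the free translation (f) excluded.
    Following Round, (b) is taken to be the unhyphenated row right after (a).
    """
    return len(rows) > 1 and is_unhyphenated(rows[1])
```

For the second example at the top of this issue, the row after the orthographic line (`ŋat̪-ta ʈat̪in-ki-naa-ø ...`) is hyphenated, so `has_surface_tier` returns False for it.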

christianbentz commented 2 years ago

Yes, note that there is a bit of a mix-up here. The first example line (line_1) I gave is from Round (2013), which was published by Oxford University Press. All other lines seem to be from Round (2009), his original PhD thesis, of which I gave you a PDF copy. The two documents differ somewhat with regard to the annotations, the example sentences, and the numbering.

So, we should remove and also change the header to say that this data comes from Round (2009), with a reference to his PhD thesis. I remember we also decided to use Round (2009) because of copyright.

tsamardzic commented 2 years ago

So, here's my TODO list:

@christianbentz Any other items?
It would be a great help if you could post the revised version of the header here. Thanks!

christianbentz commented 2 years ago

Here is the revised header. Note that I had to remove the hashtags at the beginning of each line:

language_name_wals: Kayardild
language_name_glotto: Kayardild
iso639_3: gyd
year_composed: NA
year_published: 2013
mode: spoken
genre_broad: grammar
genre_narrow: NA
writing_system: Latn
special_characters: Special characters follow IPA conventions and are detailed in Round (2009), p. 35.
short_description: These are example sentences from Round, Erich R. (2009).
source: Round, Erich R. (2009). Kayardild morphology, phonology and morphosyntax. PhD thesis, Yale University.
copyright_short: © Erich R. Round 2010, All rights reserved.
copyright_long: NA
sample_type: whole
comments: The annotations are explained in more detail in Round (2009), p. 31-32. The first line in each example () corresponds to Round's (a) "orthographic form", corresponds to (b) "surface (lexical level) phonological representation", corresponds to (c) "underlying phonological representation", corresponds to (d) "morphomic representation", corresponds to (e) "semantic and morphosyntactic gloss", corresponds to (f) "free translation". In case there are letters in index position in the original text/annotations, these are given after an underscore "_".
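Since the hashtags had to be removed for posting here, a small Python sketch could put them back when pasting the fields into the file. It assumes each header line in the file has the form `# key: value` (a hash followed by a space). That exact prefix is an assumption, so please check it against the other files.

```python
def restore_header(fields):
    """Render metadata fields as '# key: value' lines, in insertion order.

    Assumption: header lines in the corpus files start with '# '.
    """
    return "\n".join(f"# {key}: {value}" for key, value in fields.items())
```

For example, `restore_header({"iso639_3": "gyd", "mode": "spoken"})` gives the two lines `# iso639_3: gyd` and `# mode: spoken`.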

christianbentz commented 2 years ago

@tsamardzic You could remove the example sentence in the current file and then renumber the remaining lines, decreasing each number by 1.
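The renumbering step could be scripted with a regular expression. This Python sketch assumes the line tags look like `<line_N>` and `</line_N>` (the tag syntax is a guess based on the `line_1` name used above) and decrements every number greater than the removed one.

```python
import re

# Hypothetical tag syntax: <line_N> and </line_N>; the actual file may differ.
LINE_TAG = re.compile(r"(</?line_)(\d+)(>)")


def renumber_after_removal(text: str, removed: int = 1) -> str:
    """Decrement every line_N tag number greater than `removed` by one."""
    def shift(match):
        n = int(match.group(2))
        if n > removed:
            n -= 1
        return f"{match.group(1)}{n}{match.group(3)}"

    # re.sub calls `shift` once per matched tag and splices in its result.
    return LINE_TAG.sub(shift, text)
```

After deleting the `line_1` example by hand, `renumber_after_removal(text)` turns `<line_2>` into `<line_1>`, `<line_10>` into `<line_9>`, and so on, for both opening and closing tags.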

ximenina commented 1 year ago

I think all these remarks were already integrated in https://github.com/MorphDiv/TeDDi_sample/blob/master/Corpus/Kayardild_gyd/grammar/spoken/gyd_gre_1.txt . I'm closing this issue.