TalhaBedir opened 3 years ago
I find it quite adequate to treat the -lI form as an adjective derived from a noun, despite the shortcomings it has.
I think it would be possible to use advmod(soslu, acı) instead of amod, while keeping the ADJ tag for acı.
The suffix does in fact modify the whole phrase acı sos, but that seems to be the nature of agglutinating languages like Turkish, and isolating the suffix as a “syntactic word” would not help much (while complicating the processing), because we do not have relations for an “adjectivizing construction”. If I am not mistaken, the case suffixes behave similarly (they also affect the whole nominal while attaching only to the head noun), so one could theorize about a new morphological case in Turkish, but I think it would be better to keep treating this process as derivational.
Example (4) looks very interesting from a Komi-Zyrian perspective. My question is whether the "attributive suffix" can also be added after plural markers.
(4) acı soslu makarna
hot sauce.ATTR pasta
'pasta with hot sauce'
amod(makarna, soslu)
amod(soslu, acı)
In other words, would it be possible to say 'pasta with hot sauces' with the hypothetical acı soslarlu makarna?
In Komi-Zyrian this is possible:
(5) гырысь позянлунъяса страна
gïrïś poźanlunjasa strana
great possibility.Plur.ATTR country
'A country with great possibilities'
nmod(strana, poźanlunjasa)
amod(poźanlunjasa, gïrïś)
We have chosen to call this an NP head marker (a case marker) whose range is almost entirely limited to the adnominal phrase. If there is no regular number variation in Turkish, then I would not follow our approach.
I actually just realized that neither -lI nor -sIz permits any plural suffix inside, which is very interesting, since the example you provided is otherwise extremely similar to the Turkish construction.
I have not done any research on this, but I guess it might be due to different properties of the affixes, or to the fact that Turkish nouns are number-neutral in their bare forms:
(6) Kütüphane-den kitap al-dı-m
library-ABL book take-PAST-1s
'I have taken a book/books from the library.'
It could be one book or 100 books; it does not matter, and any number reading is available here. Therefore, it might be the case that the root has to stay in this bare form in order to be derived by ATTR without crashing.
What would then be done with something like the following?
(7) dört odalı ev
four room-with house
Would you have
nummod(odalı-ADJ, dört-NUM)
amod(ev-NOUN, odalı-ADJ)
Does the validator allow ADJ to have NUM dependents?
How about:
(8) Rüyada çok odalı bir evim varmış.
dream-LOC much/very room-with one house-my exists-PAST.EVID
"In my dream I had a house with many rooms."
Here çok means "many/much", but it can also mean "very" (çok büyük 'very big'). At the moment in the treebank, the first reading is annotated with ADJ and det, the second with ADV and advmod, although this is not very consistent, as in mst-0617 (Buzlu ve çok sodalı.) and mst-0771 (Diğer çok kaliteli pilotların, subayların olayda ölmüş olması çok önemli bir konuydu.).
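Whatever the validator says, one way to check how often an ADJ head with a NUM dependent, as in (7), already occurs in a treebank is a small query over the CoNLL-U files. The following is a minimal sketch, assuming plain 10-column CoNLL-U with sentences separated by blank lines and "# sent_id =" comments; the function name and the sentence ID are made up for illustration, and this is an ad-hoc query, not a reimplementation of the official UD validator.

```python
# Minimal sketch: list NUM tokens whose head is an ADJ in CoNLL-U text.
# Assumes plain 10-column CoNLL-U with sentences separated by blank lines;
# this is an ad-hoc query, not the official UD validator.

def num_dependents_of_adj(conllu: str):
    """Yield (sent_id, head form, dependent form, deprel) for NUM tokens headed by an ADJ."""
    for block in conllu.strip().split("\n\n"):
        sent_id = None
        tokens = {}
        for line in block.splitlines():
            if line.startswith("# sent_id ="):
                sent_id = line.split("=", 1)[1].strip()
            elif line and not line.startswith("#"):
                cols = line.split("\t")
                if "-" in cols[0] or "." in cols[0]:
                    continue  # skip multiword-token ranges and empty nodes
                tokens[cols[0]] = cols
        for cols in tokens.values():
            head = tokens.get(cols[6])  # HEAD column; "0" (root) is never a key
            if cols[3] == "NUM" and head is not None and head[3] == "ADJ":
                yield sent_id, head[1], cols[1], cols[7]


# Example (7), annotated as suggested in the question above (hypothetical sent_id).
example = "\n".join([
    "# sent_id = ex7",
    "1\tdört\tdört\tNUM\t_\t_\t2\tnummod\t_\t_",
    "2\todalı\todalı\tADJ\t_\t_\t3\tamod\t_\t_",
    "3\tev\tev\tNOUN\t_\t_\t0\troot\t_\t_",
])
print(list(num_dependents_of_adj(example)))  # [('ex7', 'odalı', 'dört', 'nummod')]
```

Run over a whole treebank file, the same function would also surface cases like the çok examples above, which makes inconsistencies easier to spot.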
This is also similar to the nominal -ed construction in English, e.g.
But it is less like participial forms such as "tree lin-ed street" or "grass cover-ed hill".
Does the validator allow ADJ to have NUM dependents?
I believe it does. It should, because such a configuration can also arise from noun ellipsis and promotion of the adjective to the head. The Turkish examples in this thread are different because there is no ellipsis, but I think they deserve the same treatment, as we do not have a special set of relations for modifiers of adjectives.
If the above is accepted, then it seems straightforward to also accept çok tagged DET and attached as det to the adjective odalı in (8). But this construction would also deserve to be described and exemplified in the Turkish-specific documentation, as it is interesting and peculiar, and without an explanation the annotation may surprise users.
Joining a bit late, but a few additional remarks:
The problems noted above become more difficult because some of these "derived" forms are lexicalized. evsiz 'homeless' is likely lexicalized, and in its normal use you cannot modify ev 'house' inside it; the word normally refers to a person. However, it is also possible (but not very likely) to have a sentence like Müstakil evsiz yapamam 'I cannot do without a standalone house'. Here, I'd be happy to treat these suffixes as case suffixes (although I do not know any linguist who calls these case markers); after all, Müstakil ev-de yaşıyor 'S/he lives in a standalone house' is not very different. However, there are cases where the analysis gets tricky with these suffixes. Consider this modification of one of the examples above:
1. üç çekmece-li dolap
three drawer-ATTR wardrobe
'wardrobe with three drawers'
2. üç çekmece-li-yi ben aldım
three drawer-ATTR-ACC I take-PAST-1SG
'I took the one with three drawers'
In both cases the numeral modifies the NOUN inside the adjective. Since there is no ambiguity in (1), we may be happy with nmod(çekmeceli/ADJ, üç/NUM). This is not really standard or elegant, but we can assume that the word has some internal structure and that the numeral modifies a sub-part. However, this becomes ambiguous, since any ADJ in Turkish can be used as a noun denoting an object with the property specified by the adjective (this is somewhat similar to the case of head promotion). This is what happens in (2). Here, without segmenting the word, there is no way (that I can think of) to tell whether there are three drawers or three wardrobes. You can check a few additional (real-world) examples here.
In the "annotation guidelines" I am aware of (GB, BOUN, IMST, and even TR-DE SAGT), -lI and -sIz are segmented "if they are not lexicalized". The result is not very consistent. What counts as 'lexicalized' is generally a difficult decision for the annotators, and it is also not easy for an automatic method to segment reasonably. I agree that we need a better solution for these, but I do not expect us to arrive at a good one soon. A good solution should also cover the same issues in other Turkic languages, and possibly in others like Komi (as noted above), which probably have similar cases. In the short term, I think it would be best to stay as compatible with the current treebanks as possible.
I agree we should maintain what we have at the moment until something actually better comes up. I also agree that the lexicalised/non-lexicalised boundary is extremely difficult to draw, and in reality if we have to draw it without reference to something concrete "in the sentence" (e.g. with modifiers -- split, without -- don't split) then it will tend towards arbitrariness.
As an aside, this example also works with the -ed in English "I took the three-drawered one", but it is a lot more productive in Turkish than in English.
Acknowledging that this is a process in Turkish to turn nominal phrases into attributes (i.e. to make something that we call NOUN function as an ADJ), I would propose to simply annotate them as NOUNs and nmods, while marking this special attributive form with a morpholexical feature (something like Form=Attributive, or similar). For completely lexicalised and crystallised terms like the mentioned evsiz, on the other hand, an analysis as ADJ would be justified, while still keeping the Form=Attributive mark pointing to its still transparent origin.
This would solve all problems of awkward "internal" dependencies: they are completely natural inside an nmod. Besides, this would be a natural parallel with other languages (or even the same one) using different strategies, for example nominal dependents introduced by prepositions. It just happens that Turkish uses a suffix (and I would not agree with tokenising it separately, since it is clearly fused into the word, as e.g. vowel harmony shows).
By the way, even though I am proposing a morphological treatment, I too am against treating the -lI derivation as a case. Perhaps I cannot explain myself well enough, but I think this is a morphological tool that the language has to "change the word class" of a phrase, not to express that phrase's role in the sentence (which would be a case).
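The proposal above could in principle be applied mechanically to existing CoNLL-U data. Below is a minimal sketch, assuming the caller already knows which tokens are attributive forms (deciding that automatically is exactly the hard part discussed earlier in the thread). Form=Attributive is only the placeholder suggested above, not an existing UD feature, and the function names are hypothetical. Productive forms become NOUN with nmod instead of amod; lexicalised ones such as evsiz keep ADJ and only receive the feature.

```python
# Minimal sketch of the proposal: attributive -lI/-sIz forms become NOUN + nmod
# with a feature recording the attributive form. "Form=Attributive" is only the
# placeholder name from the proposal, not an existing UD feature.

def add_feature(feats: str, name: str, value: str) -> str:
    """Add name=value to a CoNLL-U FEATS value, keeping features sorted by name."""
    pairs = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
    pairs[name] = value
    return "|".join(f"{k}={v}" for k, v in sorted(pairs.items(), key=lambda kv: kv[0].lower()))


def reannotate_attributive(conllu, attributive_ids, lexicalized_ids=frozenset()):
    """Re-annotate tokens whose IDs are given; all other lines are returned unchanged.

    attributive_ids: productive forms (e.g. soslu) -> NOUN, amod -> nmod.
    lexicalized_ids: lexicalised forms (e.g. evsiz) -> stay ADJ, only get the feature.
    """
    out = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        if len(cols) == 10 and (cols[0] in attributive_ids or cols[0] in lexicalized_ids):
            cols[5] = add_feature(cols[5], "Form", "Attributive")  # FEATS
            if cols[0] not in lexicalized_ids:
                cols[3] = "NOUN"  # UPOS
                if cols[7] == "amod":  # DEPREL
                    cols[7] = "nmod"
        out.append("\t".join(cols))
    return "\n".join(out)


# Example (4): acı soslu makarna, in the current ADJ + amod annotation.
example = "\n".join([
    "# text = acı soslu makarna",
    "1\tacı\tacı\tADJ\t_\t_\t2\tamod\t_\t_",
    "2\tsoslu\tsoslu\tADJ\t_\t_\t3\tamod\t_\t_",
    "3\tmakarna\tmakarna\tNOUN\t_\t_\t0\troot\t_\t_",
])
print(reannotate_attributive(example, {"2"}))
```

Dependents such as acı (or dört in (7)) are left untouched, since under this proposal they simply become ordinary modifiers of a noun. The lemma is also left unchanged, as the thread does not settle whether it should become the base noun (e.g. sos).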
Turkish has a derivational suffix -lI that is generally referred to as the "attributive suffix" in the literature. It is highly productive:
But it is also used in relatively fixed forms:
Its negative counterpart -sIz, which roughly means "without", can negate the examples in (1) but generally not those in (2).
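The capital I in -lI and -sIz stands for a high vowel chosen by four-way vowel harmony with the last vowel of the stem (ı, i, u or ü), which is also why the suffix is fused into the word rather than being a separate token, as noted earlier in the thread. A minimal sketch of that alternation, assuming lowercase input and regular harmony only (exceptional loanwords such as saat → saatli are deliberately ignored); the function names are hypothetical.

```python
# Minimal sketch: realise the archiphoneme I in -lI / -sIz by regular
# four-way vowel harmony. Assumptions: lowercase Turkish input, regular
# harmony only (exceptional loanwords such as "saat" -> "saatli" are ignored).

VOWELS = set("aeıioöuü")
FRONT = set("eiöü")
ROUNDED = set("oöuü")


def high_vowel(stem: str) -> str:
    """Return ı, i, u or ü according to the last vowel of the stem."""
    last = next((ch for ch in reversed(stem) if ch in VOWELS), None)
    if last is None:
        raise ValueError(f"no vowel in stem {stem!r}")
    if last in FRONT:
        return "ü" if last in ROUNDED else "i"
    return "u" if last in ROUNDED else "ı"


def attach(stem: str, suffix: str) -> str:
    """Attach a suffix written with I (e.g. 'lI', 'sIz') to a stem."""
    return stem + suffix.replace("I", high_vowel(stem))


for stem, suffix in [("sos", "lI"), ("oda", "lI"), ("çekmece", "lI"), ("ev", "sIz")]:
    print(attach(stem, suffix))  # soslu, odalı, çekmeceli, evsiz
```

Since both suffixes begin with a consonant, no buffer consonant or stem-final voicing alternation is involved, so only the vowel needs to be computed.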
Currently, since these are derivational, we include the full forms in lemmas, without any split. In that case soslu "with sauce" and çekmeceli "with drawers" in (1a,b) are ADJ in UPOS and amod in dependency:
However, when an adjective modifies soslu "with sauce", for example, or any other denominal adjective of this sort, we have a situation like this:
This situation, for me, is troubling for two reasons:
Therefore I do not think the current annotation, that is (4), is doing this structure justice.