Closed jasiewert closed 6 months ago
I would slightly lean towards det (the :predet subtype is not needed, but it could be defined and documented for the language, too). A "fixed" expression with gaps will not render the treebank invalid, but it will yield warnings (as it does for Dutch).
Perhaps there is inspiration to be found in the German treebank? I am thinking of the German equivalent (assuming the two are similar) of the Dutch split "wat voor" cases.
Wat heb je voor (een) boeken gelezen?
GJ
On Mon, Mar 25, 2024 at 10:45 AM @. @.> wrote:
The UD validator complains about this type of annotation as well, so we may need to reconsider it ;-) It is actually a relic from the underlying treebank (the syntactically annotated part of the Corpus of Spoken Dutch), where this was annotated as a discontinuous multi-word expression (stretching the notion of MWE a bit). It is true that the construction is exceptional, and there are cases like the first Dutch example that do suggest that there is a determiner 'wat een' that should be analyzed as an MWE. As soon as you do that, however, the discontinuous cases become a problem. Suggestions for a better solution are welcome!
On 3/25/24 09:00, Janine Siewert wrote:
In Low Saxon, as well as in Dutch, the construction corresponding to English "what a (ADJ) NOUN" in expressions of surprise can be interrupted by other elements.
In the English GUM treebank, they call this a det:predet, a relation that is not documented here https://universaldependencies.org/u/dep/index.html and only listed in the language-specific guidelines.
# sent_id = GUM_vlog_lipstick-41
# addressee = Hershey
# s_prominence = 3
# s_type = other
# speaker = AlyssaMarie
# transition = establishment
# text = What a sweet baby.
# newpar
# newpar_block = sp who:::"#AlyssaMarie" whom:::"#Hershey" (1 s)
1 What what DET WDT PronType=Int 4 det:predet 4:det:predet Discourse=evaluation-comment:56->57:0:lex-indwd-387
2 a a DET DT Definite=Ind|PronType=Art 4 det 4:det Entity=(33-animal-giv:act-cf1*-3-coref
3 sweet sweet ADJ JJ Degree=Pos 4 amod 4:amod _
4 baby baby NOUN NN Number=Sing 0 root 0:root Cxn=Exclamative-What|Entity=33)|MSeg=bab-y|SpaceAfter=No
5 . . PUNCT . _ 4 punct 4:punct _
In the Dutch Alpino treebank, however, the interrogative(?) pronoun wat and the indefinite article een are treated as fixed even when they are separated by other elements:
# source = Treebank/cgn_exs/257.xml
# sent_id = cgn_exs\257
# text = wat een boeken heeft die man gelezen!
# auto = ALUD2.8.5
1 wat wat PRON VNW|excl|pron|stan|vol|3|getal Person=3 3 det 3:det _
2 een een DET LID|onbep|stan|agr Definite=Ind 1 fixed 1:fixed _
3 boeken boek NOUN N|soort|mv|basis Number=Plur 7 obj 7:obj _
4 heeft hebben AUX WW|pv|tgw|met-t Number=Sing|Tense=Pres|VerbForm=Fin 7 aux 7:aux _
5 die die DET VNW|aanw|det|stan|prenom|zonder|rest _ 6 det 6:det _
6 man man NOUN N|soort|ev|basis|zijd|stan Gender=Com|Number=Sing 7 nsubj 7:nsubj _
7 gelezen lezen VERB WW|vd|vrij|zonder VerbForm=Part 0 root 0:root SpaceAfter=No
8 ! ! PUNCT LET _ 7 punct 7:punct _
# source = Treebank/cgn_exs/258.xml
# sent_id = cgn_exs\258
# text = wat heeft die man een boeken gelezen!
# auto = ALUD2.8.5
1 wat wat PRON VNW|excl|pron|stan|vol|3|getal Person=3 6 det 6:det _
2 heeft hebben AUX WW|pv|tgw|met-t Number=Sing|Tense=Pres|VerbForm=Fin 7 aux 7:aux _
3 die die DET VNW|aanw|det|stan|prenom|zonder|rest _ 4 det 4:det _
4 man man NOUN N|soort|ev|basis|zijd|stan Gender=Com|Number=Sing 7 nsubj 7:nsubj _
5 een een DET LID|onbep|stan|agr Definite=Ind 1 fixed 1:fixed _
6 boeken boek NOUN N|soort|mv|basis Number=Plur 7 obj 7:obj _
7 gelezen lezen VERB WW|vd|vrij|zonder VerbForm=Part 0 root 0:root SpaceAfter=No
8 ! ! PUNCT LET _ 7 punct 7:punct _
A construction that can be separated by several words belonging to different syntactic phrases does not sound very fixed to me. Neither does a det(:predet) relation seem intuitive if the supposed predeterminer can be placed several syntactic phrases apart from the noun it determines. Should I nevertheless follow the English or the Dutch analysis in my Low Saxon treebank, or would a different analysis be more appropriate?
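The gap in the second Alpino sentence can be detected mechanically. The following is a minimal Python sketch, not the UD validator's own code: it assumes whitespace-separated CoNLL-U columns as in the examples in this thread (real files use tabs), it assumes the `_` placeholders are present in every empty column, and the helper names `parse_tokens` and `gapped_fixed` are made up for illustration. It groups each head with its `fixed` dependents and reports the groups whose token ids are not consecutive.

```python
from collections import defaultdict

# Token lines of cgn_exs\257 and cgn_exs\258 as quoted in this thread.
CONTIGUOUS = """\
1 wat wat PRON VNW|excl|pron|stan|vol|3|getal Person=3 3 det 3:det _
2 een een DET LID|onbep|stan|agr Definite=Ind 1 fixed 1:fixed _
3 boeken boek NOUN N|soort|mv|basis Number=Plur 7 obj 7:obj _
4 heeft hebben AUX WW|pv|tgw|met-t Number=Sing|Tense=Pres|VerbForm=Fin 7 aux 7:aux _
5 die die DET VNW|aanw|det|stan|prenom|zonder|rest _ 6 det 6:det _
6 man man NOUN N|soort|ev|basis|zijd|stan Gender=Com|Number=Sing 7 nsubj 7:nsubj _
7 gelezen lezen VERB WW|vd|vrij|zonder VerbForm=Part 0 root 0:root SpaceAfter=No
8 ! ! PUNCT LET _ 7 punct 7:punct _
"""

SPLIT = """\
1 wat wat PRON VNW|excl|pron|stan|vol|3|getal Person=3 6 det 6:det _
2 heeft hebben AUX WW|pv|tgw|met-t Number=Sing|Tense=Pres|VerbForm=Fin 7 aux 7:aux _
3 die die DET VNW|aanw|det|stan|prenom|zonder|rest _ 4 det 4:det _
4 man man NOUN N|soort|ev|basis|zijd|stan Gender=Com|Number=Sing 7 nsubj 7:nsubj _
5 een een DET LID|onbep|stan|agr Definite=Ind 1 fixed 1:fixed _
6 boeken boek NOUN N|soort|mv|basis Number=Plur 7 obj 7:obj _
7 gelezen lezen VERB WW|vd|vrij|zonder VerbForm=Part 0 root 0:root SpaceAfter=No
8 ! ! PUNCT LET _ 7 punct 7:punct _
"""


def parse_tokens(conllu):
    """Return (id, form, head, deprel) for each plain token line."""
    tokens = []
    for line in conllu.splitlines():
        cols = line.split()
        # Skip blank lines, comments, multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) < 8 or not cols[0].isdigit():
            continue
        tokens.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return tokens


def gapped_fixed(tokens):
    """Return the forms of each fixed expression whose tokens are not contiguous."""
    groups = defaultdict(list)
    for tok_id, _form, head, deprel in tokens:
        if deprel.split(":")[0] == "fixed":
            groups[head].append(tok_id)
    forms = {tok_id: form for tok_id, form, _head, _deprel in tokens}
    gapped = []
    for head, deps in sorted(groups.items()):
        span = sorted([head] + deps)
        # A contiguous span covers exactly as many ids as it has members.
        if span[-1] - span[0] + 1 != len(span):
            gapped.append([forms[i] for i in span])
    return gapped


print(gapped_fixed(parse_tokens(CONTIGUOUS)))
print(gapped_fixed(parse_tokens(SPLIT)))
```

The first sentence yields no gapped groups, while the second reports the pair wat ... een, which is the case the question is about.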
-- Gosse Bouma, Communication and Information Science, Groningen University, P.o. box 716, 9700 AS @.*** tel. +31-50-3635937
I suppose the German equivalent is Was für ein X! I found three instances in German GSD (among the 8 hits returned by this query). There are two other approaches as inspiration :-) One of them seems wrong to me; the other is
nmod(was, X)
case(X, für)
det(X, ein)
which I sort of like.
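To make the three relations concrete, here is a small Python sketch that turns them into head indices for the token sequence "was für ein X". It reads the notation as relation(head, dependent), and it leaves the attachment of "was" to the rest of the sentence open, since that is not stated above. The function name `heads_from_relations` is hypothetical.

```python
def heads_from_relations(tokens, relations):
    """Map each token to (head index, deprel); tokens with no incoming arc get (None, None)."""
    index = {form: i for i, form in enumerate(tokens, start=1)}
    analysis = {form: (None, None) for form in tokens}
    for deprel, head, dependent in relations:
        analysis[dependent] = (index[head], deprel)
    return analysis


tokens = ["was", "für", "ein", "X"]
# relation(head, dependent), as in the three lines above
relations = [("nmod", "was", "X"), ("case", "X", "für"), ("det", "X", "ein")]

analysis = heads_from_relations(tokens, relations)
for i, form in enumerate(tokens, start=1):
    head, deprel = analysis[form]
    print(i, form, head if head is not None else "?", deprel or "?")
```

Under this reading, "was" is the head of the whole phrase and the noun depends on it, which is the reverse of the det(:predet) analysis.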
Another reason why I am hesitant to use the det relation is the fact that wat can be used in the same way in other expressions of surprise as well:
# sent_id = LSDC_011_DNS_1904_HAM_bahnmeester_dood
# text_orig = Wat mien Süsterdochter fix is: de kann Gedanken läsen!
# text = Wat myn süsterdochter fiks is: dee kan gedanken leasen!
1 Wat wat PRON _ _ 4 ? _ lemma_gml=watte
2 myn myn DET _ Case=Nom|Number=Sing|Number[psor]=Sing|Person[psor]=1|Poss=Yes|PronType=Prs 3 det:poss _ lemma_gml=mîn
3 süsterdochter süsterdochter NOUN _ Case=Nom|Gender=Fem|Number=Sing 4 nsubj _ lemma_gml=süsterdochter
4 fiks fiks ADJ _ Degree=Pos 0 root _ _
5 is weasen AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|VerbType=Cop 4 cop _ lemma_gml=wēsen|SpaceAfter=No
6 : : PUNCT _ _ 10 punct _ _
7 dee dee PRON _ Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Dem 10 nsubj _ lemma_gml=dê
8 kan künnen AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|VerbType=Aux 10 aux _ lemma_gml=künnen
9 gedanken gedanke NOUN _ Case=Acc|Gender=Masc|Number=Plur 10 obj _ lemma_gml=gedanke
10 leasen leasen VERB _ VerbForm=Inf 4 parataxis _ lemma_gml=lēsen|SpaceAfter=No
11 ! ! PUNCT _ _ 10 punct _ _
I think a comparable usage is possible in Dutch as well, but I do not have native speaker intuition there. In the example above, there is nothing that wat could be a determiner of, but it intuitively feels like the same usage as in:
# sent_id = LSDC_0297_NWF_1882_OVY_deventer_t.w._van_marie_-_de_bruud_en_de_wedevrouwe
# text_orig = Jonges, jonges, wat zi'j en mooi knap bruudjen 'eworden!
# text = Junges, junges, wat sin y en mooi knap brüüdjen eworden!
1 Junges junge NOUN _ Case=Nom|Gender=Masc|Number=Plur 12 vocative _ lemma_gml=junge|SpaceAfter=No
2 , , PUNCT _ _ 3 punct _ _
3 junges junge NOUN _ Case=Nom|Gender=Masc|Number=Plur 1 conj _ lemma_gml=junge|SpaceAfter=No
4 , , PUNCT _ _ 12 punct _ _
5 wat wat PRON _ Case=Nom|Gender=Neut|Number=Sing|PronType=Int 11 ? _ lemma_gml=watte
6 sin weasen AUX _ Mood=Ind|Number=Plur,Sing|Person=2|Tense=Pres 12 aux _ lemma_gml=wēsen
7 y jy PRON _ Case=Nom|Number=Plur,Sing|Person=2|PronType=Prs 12 nsubj _ lemma_gml=gî
8 en en DET _ Case=Nom|Definite=Ind|Gender=Neut|Number=Sing|PronType=Art 11 det _ lemma_gml=êin,êine,êin
9 mooi mooi ADJ _ Case=Nom|Degree=Pos|Gender=Neut|Number=Sing 11 amod _ lemma_gml=mö̂ye
10 knap knap ADJ _ Case=Nom|Degree=Pos|Gender=Neut|Number=Sing 11 amod _ lemma_gml=knap
11 brüüdjen brüüdken NOUN _ Case=Nom|Gender=Neut|Number=Sing 12 xcomp _ lemma_gml=brü̂deken
12 eworden werden VERB _ Tense=Past|VerbForm=Part 0 root _ lemma_gml=wērden|SpaceAfter=No
13 ! ! PUNCT _ _ 12 punct _ _
In the second example, an interpretation as nmod might work, but this is not possible for the kind of usage in the first example.
There is indeed a Dutch example of such a usage in the Alpino treebank. Here, it is analysed as obl:
# source = Treebank/eans/02_02_05_i.xml
# sent_id = eans\02_02_05_i
# text = Wat typt deze machine zwaar!
# auto = ALUD2.8.5
1 Wat wat PRON VNW|excl|pron|stan|vol|3|getal Person=3 2 obl 2:obl _
2 typt typen VERB WW|pv|tgw|met-t Number=Sing|Tense=Pres|VerbForm=Fin 0 root 0:root _
3 deze deze DET VNW|aanw|det|stan|prenom|met-e|rest _ 4 det 4:det _
4 machine machine NOUN N|soort|ev|basis|zijd|stan Gender=Com|Number=Sing 2 nsubj 2:nsubj _
5 zwaar zwaar ADJ ADJ|vrij|basis|zonder Degree=Pos 2 advmod 2:advmod SpaceAfter=No
6 ! ! PUNCT LET _ 2 punct 2:punct _
That is an example sentence from a Dutch reference grammar, Algemene Nederlandse Spraakkunst. See here and here for discussion (in Dutch) of the 'wat voor (een)' cases. The authors do consider 'wat voor (een)' to be a (complex) determiner. About the discontinuous cases, they say that the interrogative pronoun, if it is part of the NP, can be separated from the rest of the NP. The rest of the NP can be moved rightward, but the WH-pronoun has to remain in sentence-initial position. This would suggest constituenthood at some level, so maybe the predet solution is a way out.
Using the predet solution would however mean that the wat in Wat myn süsterdochter en mooi brüüdjen is. needs to be analysed differently from comparable cases such as Wat myn süsterdochter fiks is. or Wat myn süsterdochter mooi singen kan. I would prefer to analyse wat in the same way here.
In English, exclamative "what" can mark indefinite nominals whether singular ("what a great book") or plural ("what great books"), so we consider it a predeterminer.
The combination "many a" is arguably a fixed expression (a complex determiner), but in UD we are treating "many" as a predeterminer as well. (Though the traditions for tagging/parsing predeterminers in English are questionable: UniversalDependencies/UD_English-EWT#412)
Here are English translations and glosses of the last three Low Saxon examples. I hope this makes it clear that the construction does not directly correspond to the English one:
A) Wat myn süsterdochter en mooi brüüdjen is! - What a beautiful bride my niece is!
Wat myn süsterdochter en mooi brüüdjen is
what my niece a beautiful bride is
B) Wat myn süsterdochter fiks is! - How smart my niece is!
Wat myn süsterdochter fiks is
what my niece smart is
C) Wat myn süsterdochter mooi singen kan! - How beautifully my niece can sing!
Wat myn süsterdochter mooi singen kan
what my niece beautifully sing can
In all three examples, it is also possible to put the verb in the second position, e.g. Wat is myn süsterdochter fiks!
Example A might be analysed in the same way as English, but the analysis as a predeterminer does not work for B and C. As a Low Saxon speaker, it would feel very unintuitive to me to treat example A differently from example B. Also example C can probably be considered to represent the same type of construction.
Is "myn süsterdochter" the subject? Maybe "wat" is just an adverb in all 3 cases, more like English "how" than "what"?
Yes, "myn süsterdochter" is the subject in all three examples. "Wat" is generally just an ordinary interrogative pronoun like its English cognate "what", but I agree that it does not seem to behave like an interrogative pronoun here.
I am not sure that a single analysis for sentence-initial 'wat' can work.
There seems to be a consensus in the literature that 'wat een X' is a phrase, with a complex determiner of some kind. Arguments for this analysis, I think, are as follows. First, consider:
wat een knappe kinderen zijn jullie
what a smart kids you are
As Dutch is a pretty strict V2 language, the fact that 'wat een N' precedes the verb 'zijn' is a strong argument for constituenthood, thus arguing against an analysis where 'wat' would be an adverb that is not part of the NP. Second, in split cases like
wat zijn jullie een knappe kinderen
what are you a smart kids
there is the peculiarity that the determiner 'een', which normally can only occur with singular count nouns, precedes a plural noun. One explanation is to assume that we are dealing with the complex determiner 'wat een', despite the fact that it is discontinuous.
A det:predet analysis for 'wat' therefore seems best to me.
Cases where 'wat' occurs sentence-initially but not in combination with 'een + N', as in the example from Alpino you mention, must be analyzed differently, i.e., with 'wat' being an obl.
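The two-way split proposed here can be written down as a rough decision rule. The Python sketch below is only an illustration under strong assumptions: it looks at nothing but word forms and UPOS tags, it treats any 'een' followed by optional adjectives and a noun as the 'een + N' part, and the function name `label_initial_wat` is hypothetical. It is not a substitute for manual annotation.

```python
def label_initial_wat(tokens):
    """tokens: list of (form, upos). Return (deprel, head index) for sentence-initial 'wat'.

    Returns None if the sentence does not start with 'wat'. The head index is
    1-based; for the obl case the head is left as None, because finding the
    predicate is outside the scope of this sketch.
    """
    if not tokens or tokens[0][0].lower() != "wat":
        return None
    for i in range(1, len(tokens)):
        form, upos = tokens[i]
        if form.lower() == "een" and upos == "DET":
            j = i + 1
            # Skip adjectives between the article and the noun.
            while j < len(tokens) and tokens[j][1] == "ADJ":
                j += 1
            if j < len(tokens) and tokens[j][1] == "NOUN":
                return ("det:predet", j + 1)
    return ("obl", None)


# UPOS tags taken from the two Alpino examples quoted earlier in the thread.
split_np = [("wat", "PRON"), ("heeft", "AUX"), ("die", "DET"), ("man", "NOUN"),
            ("een", "DET"), ("boeken", "NOUN"), ("gelezen", "VERB"), ("!", "PUNCT")]
no_np = [("Wat", "PRON"), ("typt", "VERB"), ("deze", "DET"), ("machine", "NOUN"),
         ("zwaar", "ADJ"), ("!", "PUNCT")]

print(label_initial_wat(split_np))
print(label_initial_wat(no_np))
```

The first call attaches 'wat' to 'boeken' as det:predet even though the two are separated, and the second falls back to obl, matching the existing Alpino annotation of 'Wat typt deze machine zwaar!'.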
It would be interesting to know to what extent 'traditional' sentence types, i.e., statement, question, command and exclamation, are represented in UD and how they are annotated.
wat zijn jullie een knappe kinderen
what are you a smart kids
It would seem that this is an exclamation, and I would think that perhaps such a sentence-initial marker wat might work as an ADV or PART with an advmod connection to the predicate. This, of course, may not be intuitive if there seems to be a direct affinity to the phrase adjacent to the sentence-initial wat.
As an exclamation marker, it would be analogous to the question marker kas in Estonian, which is tagged ADV and attached with advmod to the clause head.
The fact that sentence initial 'wat' triggers an exclamative interpretation is what the examples have in common, and indeed it is this effect that could make you think there is a single phenomenon at work here.
Sentence types are of course not annotated in UD, but there has been a bit of discussion about this recently, i.e., adding them might make the corpora more useful for typology. A recent Dagstuhl seminar discussed this issue in some detail (see WG 9, p. 57, for an overview of proposed types).
The "fixed" analysis doesn't seem quite right to me, simply because it doesn't seem to be a fixed expression, but rather a semi-productive construction (in the construction grammar sense) with very restricted applicability. This is assuming that all the different examples discussed are instances of the same use of "wat". UD doesn't have a good mechanism for representing these "almost fixed" expressions, and I think many treebanks over-extend the "fixed" relation as the closest approximation. I know this is the case for our Swedish treebank (although not for this specific type of example, as far as I can remember).