IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

Adding three sections to the guidelines #49

Open Hilla-Merhav opened 2 years ago

Hilla-Merhav commented 2 years ago

Hi @amir-zeldes

To counter the parser's tendency to tag xcomp as obj and csubj or csubj:pass as ccomp, and to make the annotation of Voice consistent, I suggest adding three sections to the "confusing cases" section of the guidelines:

  1. obj or xcomp?
  2. ccomp or csubj[:pass]?
  3. Voice – Act or Mid?

I kindly ask for your approval before adding them to the guidelines. If you think the examples are suitable, I will create proper trees to accompany them.


obj or xcomp?

Only transitive verbs can govern accusative objects (as can some forms that take an explicit definite accusative object as a result of language change originating in colloquial usage, e.g. יש לי את ה..., בא לי את ה...). Middle and passive verbs don't govern objects: the middle is a reflexive category that mentions only one argument, a syntactic subject, with no explicit syntactic object. In a passive clause, the argument usually expressed by the object (or sometimes another argument) is expressed by the subject instead.

The best way to check whether a token is governed through obj is to apply the את ה... test: השאלה נשארה פתוחה vs. *השאלה נשארה את הפתוחה

If an element can't take את ה... and no rephrasing of the sentence makes it work, it is not an obj. In this particular case the deprel is xcomp(nishara, ptuxa).

Note that some active intransitive verbs also fail this test: העבודה ארכה חודשים vs. *העבודה ארכה את החודשים, xcomp(arxa, xodashim)

Some dictionaries, like Even-Shoshan, indicate whether a verb is transitive or intransitive, and they can serve as an additional aid (e.g. ארך is defined as intransitive).
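The claim that middle and passive verbs don't govern objects can also be checked mechanically over an annotated corpus. The following is a minimal sketch of a hypothetical consistency check (not part of the UD_Hebrew tooling), assuming CoNLL-U input in which verbs carry a Voice feature: it flags any obj whose governor is Voice=Mid or Voice=Pass. The helper names and the toy sentence (transliterated tokens, invented features, not the treebank's actual segmentation) are illustrative assumptions.

```python
# Hypothetical check, not part of the UD_Hebrew tooling: flag obj dependents
# of verbs annotated Voice=Mid or Voice=Pass (they should not govern objects).


def parse_feats(feats: str) -> dict:
    """Turn 'Gender=Fem|Voice=Mid' into {'Gender': 'Fem', 'Voice': 'Mid'}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_sentence(conllu: str) -> dict:
    """Parse one CoNLL-U sentence into basic tokens keyed by integer ID."""
    tokens = {}
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (1.1).
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        tokens[int(cols[0])] = {"form": cols[1], "feats": parse_feats(cols[5]),
                                "head": int(cols[6]), "deprel": cols[7]}
    return tokens


def flag_obj_under_nonactive(conllu: str) -> list:
    """Return (governor, dependent) form pairs where a Mid/Pass verb governs obj."""
    tokens = read_sentence(conllu)
    flagged = []
    for tok in tokens.values():
        head = tokens.get(tok["head"])  # None for the root (head 0)
        if tok["deprel"] != "obj" or head is None:
            continue
        if head["feats"].get("Voice") in {"Mid", "Pass"}:
            flagged.append((head["form"], tok["form"]))
    return flagged


def make_conllu(rows) -> str:
    """Build CoNLL-U lines from (id, form, upos, feats, head, deprel) tuples."""
    return "\n".join(
        "\t".join([str(i), form, "_", upos, "_", feats, str(head), deprel, "_", "_"])
        for i, form, upos, feats, head, deprel in rows)


# Toy sentence (transliterated, features invented): ptuxa mis-tagged as obj.
SAMPLE = make_conllu([
    (1, "hashe'ela", "NOUN", "_", 2, "nsubj"),
    (2, "nishara", "VERB", "Voice=Mid", 0, "root"),
    (3, "ptuxa", "ADJ", "_", 2, "obj"),
])

print(flag_obj_under_nonactive(SAMPLE))  # [('nishara', 'ptuxa')]
```

This only catches the Mid/Pass case; an active intransitive verb such as ארך still has to be checked by hand with the את ה... test.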


ccomp or csubj[:pass]?

Currently the parser shows a strong bias toward tagging both ccomp and csubj as ccomp, but there is an important syntactic difference between them. Both are often governed by speech or psych verbs that take content-clause arguments, but ccomp is usually a clause in object position, whereas csubj is in subject position:

Examples of verbs that govern obj or ccomp:

הוא אמר [את דברו] – obj(amar, dvar)
הוא אמר [שמחירי הדלק עלו שוב] – ccomp(amar, alu)

טענתי [טענות דומות] – obj(taanti, teanot)
טענתי [שרוב הלקוחות שלי מרוצים] – ccomp(taanti, merutsim)

היא ציינה [זאת] – obj(tsiena, zot)
היא ציינה [שגישתו של שינדלר לבטהובן הייתה רומנטית מדי] – ccomp(tsiena, romantit)

Examples of verbs that govern nsubj, csubj or csubj:pass:

[קדיש] נאמר לרוב על ידי שליח הציבור – nsubj:pass(neemar, kadish)
עוד נאמר [כי מחירי הדלק עלו שוב] – csubj:pass(neemar, alu)

[טענות דומות] נטענו בעבר – nsubj:pass(nitanu, teanot)
נטען [שרוב לקוחות החברה מרוצים] – csubj:pass(nitaan, merutsim)

בחוזה צוין [איסור על בעלי חיים] – nsubj:pass(tsuyan, isur)
לעיתים קרובות צוין [שגישתו של שינדלר לבטהובן הייתה רומנטית מדי] – csubj:pass(tsuyan, romantit)

התבררה [התמונה המלאה] – nsubj(hitbarera, tmuna)
התברר [שהמצב לא טוב] – csubj(hitbarer, tov)

This table illustrates the relationship between ccomp governors and csubj or csubj:pass governors:

| ccomp governor | csubj or csubj:pass governor |
|---|---|
| אמר ש... | נאמר ש... |
| כתב ש... | נכתב ש... |
| טען ש... | נטען ש... |
| דיווח ש... | דוּוח ש... |
| סיפר ש... | מסופר ש... |
| קבע ש... | נקבע ש... |
| ציין ש... | צוין ש... |
| סיכם ש... | סוּכם ש... |
| ראה ש... | נראה ש... |
| ידע ש..., הודיע ש... | נודע ש... |
| - | התברר ש... |
| - | נדמה ש... |
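The pattern in the table (the right-hand governors are passive or middle forms whose content clause fills the subject slot) suggests a heuristic for catching the parser's ccomp bias. The following is a minimal sketch of a hypothetical helper, `suggest_csubj`, not part of any existing tool. It assumes CoNLL-U input with the governor's Voice feature annotated, and maps Voice=Pass to csubj:pass and Voice=Mid to csubj, following the examples above (נטען gives csubj:pass, התברר gives csubj). A governor that already has a nominal or clausal subject is skipped, since its subject slot is filled. The toy sentence uses transliterated tokens and invented features.

```python
# Hypothetical heuristic, not part of the UD_Hebrew tooling: a ccomp whose
# governor is Voice=Pass or Voice=Mid and has no subject is probably a csubj.

SUGGESTED = {"Pass": "csubj:pass", "Mid": "csubj"}  # per the examples above


def parse_feats(feats: str) -> dict:
    """Turn 'Gender=Fem|Voice=Mid' into {'Gender': 'Fem', 'Voice': 'Mid'}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def read_sentence(conllu: str) -> dict:
    """Parse one CoNLL-U sentence into basic tokens keyed by integer ID."""
    tokens = {}
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (1.1).
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        tokens[int(cols[0])] = {"form": cols[1], "feats": parse_feats(cols[5]),
                                "head": int(cols[6]), "deprel": cols[7]}
    return tokens


def suggest_csubj(conllu: str) -> list:
    """Return (governor, clause head, suggested deprel) triples."""
    tokens = read_sentence(conllu)
    # IDs of governors whose subject slot is already filled; they keep ccomp.
    has_subject = {t["head"] for t in tokens.values()
                   if t["deprel"].split(":")[0] in {"nsubj", "csubj"}}
    suggestions = []
    for tok in tokens.values():
        head = tokens.get(tok["head"])  # None for the root (head 0)
        if tok["deprel"] != "ccomp" or head is None or tok["head"] in has_subject:
            continue
        voice = head["feats"].get("Voice")
        if voice in SUGGESTED:
            suggestions.append((head["form"], tok["form"], SUGGESTED[voice]))
    return suggestions


def make_conllu(rows) -> str:
    """Build CoNLL-U lines from (id, form, upos, feats, head, deprel) tuples."""
    return "\n".join(
        "\t".join([str(i), form, "_", upos, "_", feats, str(head), deprel, "_", "_"])
        for i, form, upos, feats, head, deprel in rows)


# Toy sentence (transliterated, features invented): "hitbarer she hamatsav lo tov"
SAMPLE = make_conllu([
    (1, "hitbarer", "VERB", "Voice=Mid", 0, "root"),
    (2, "she", "SCONJ", "_", 5, "mark"),
    (3, "hamatsav", "NOUN", "_", 5, "nsubj"),
    (4, "lo", "ADV", "_", 5, "advmod"),
    (5, "tov", "ADJ", "_", 1, "ccomp"),
])

print(suggest_csubj(SAMPLE))  # [('hitbarer', 'tov', 'csubj')]
```

The check relies on Voice rather than on the lemma pairs in the table, so it only works as well as the Voice annotation itself; its output is a list of candidates for manual review, not automatic relabelings.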

Voice – Act or Mid?

Middle voice refers to reflexive actions in which only one argument is mentioned and no external agent is implied:
התנועה התעכבה – middle
התנועה עוכבה – passive

In our scheme, middle voice also covers reciprocal actions:
הם התכתבו – middle
הם כתבו אחד לשני – active

Note that semantic volition and stativity are independent of voice. Active verbs can be unintentional: ספג, נפל, מעד, קיבל.
They can also be stative: חי, מת, ישב, עמד, שהה. (In our guidelines, בקצהו עומד מנזר יוחנן במדבר is given as an example of an active verb.)

As for intransitive HIFIL forms (הפשיר, האדים), I suggest following Doron's view in the article תרומתו של הבניין למערכת הפועל (an article recommended by Shira and attached to the guidelines). These forms are considered active because they are not reflexive; the middle voice is marked by distinctive morphological forms, HITPAEL and NIFAL.
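Doron's criterion (middle voice is marked by HITPAEL and NIFAL) suggests a simple consistency check: flag any token annotated Voice=Mid whose binyan is something else, such as an intransitive HIFIL like הפשיר, which should be Act. The following is a minimal sketch, assuming the binyan is stored in a `HebBinyan` feature with values such as HIFIL, HITPAEL and NIFAL; treat the feature name, its values and the toy tokens (transliterated, features invented) as assumptions about the data.

```python
# Hypothetical check: per Doron's criterion, Voice=Mid is expected only on
# HITPAEL and NIFAL forms, so Mid on any other binyan (or on a token with no
# binyan feature at all) is flagged for manual review.

MIDDLE_BINYANIM = {"HITPAEL", "NIFAL"}


def parse_feats(feats: str) -> dict:
    """Turn 'HebBinyan=HIFIL|Voice=Mid' into {'HebBinyan': 'HIFIL', 'Voice': 'Mid'}."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def flag_suspicious_mid(conllu: str) -> list:
    """Return (form, binyan) pairs for Voice=Mid tokens outside HITPAEL/NIFAL."""
    flagged = []
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Skip comments, blank lines, multiword ranges (1-2) and empty nodes (1.1).
        if line.startswith("#") or len(cols) != 10 or not cols[0].isdigit():
            continue
        feats = parse_feats(cols[5])
        binyan = feats.get("HebBinyan")  # assumed feature name
        if feats.get("Voice") == "Mid" and binyan not in MIDDLE_BINYANIM:
            flagged.append((cols[1], binyan))
    return flagged


def make_conllu(rows) -> str:
    """Build CoNLL-U lines from (id, form, upos, feats, head, deprel) tuples."""
    return "\n".join(
        "\t".join([str(i), form, "_", upos, "_", feats, str(head), deprel, "_", "_"])
        for i, form, upos, feats, head, deprel in rows)


# Three one-word toy sentences (transliterated, features invented).
SAMPLE = "\n\n".join([
    make_conllu([(1, "hitakva", "VERB", "HebBinyan=HITPAEL|Voice=Mid", 0, "root")]),
    make_conllu([(1, "hifshir", "VERB", "HebBinyan=HIFIL|Voice=Mid", 0, "root")]),
    make_conllu([(1, "halax", "VERB", "HebBinyan=PAAL|Voice=Act", 0, "root")]),
])

print(flag_suspicious_mid(SAMPLE))  # [('hifshir', 'HIFIL')]
```

The check works in one direction only: it does not decide between Mid and Pass for NIFAL forms, which still requires the reflexive/passive judgment described above.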

amir-zeldes commented 2 years ago

Sorry this took me a while to go over; this looks great! My only suggestions are some minor edits:

Hilla-Merhav commented 2 years ago

@amir-zeldes

Thanks a lot for your feedback!

With העבודה ארכה חודשים, I think I relied too much on the question of what the predicate requires, so I treated xodashim as a core argument. I still find this issue a bit confusing: in our last meeting we analyzed הבית היה עשוי עץ together as xcomp(asuy, etz), even though the house itself is not wood. What cues can we use to decide whether a dependent is xcomp or obl:npmod?

It also makes me realize that sections on obl:npmod and nmod:npmod would be helpful! I hope to gather some examples and write a proposed section for your approval soon :)

Since the example of העבודה ארכה חודשים doesn't work here, I want to give another example of an active verb that governs an xcomp. Can we use this tree instead? הוא מרגיש חולה xcomp(margish, xole)

amir-zeldes commented 2 years ago

הוא מרגיש חולה

Yes, this seems like a prototypical xcomp (he feels + he is sick)

הבית היה עשוי עץ

This is more borderline as a secondary predication; in English it's actually not very strange to say "the house is wood", but in Hebrew this seems wrong, so perhaps obl:npmod is better (essentially an indication of the way it is made, not the result of making it).