IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

VerbType Feature #44

Open NathanD38 opened 2 years ago

NathanD38 commented 2 years ago

@amir-zeldes

The current validator returns the following error when VerbType=Mod is assigned to a token with upos=VERB (or left on it as produced by the automatic parser):

Feature VerbType is not permitted with UPOS VERB in language [he].

This is quite baffling, since we have used this feature from the beginning for modal verbs governing an infinitive via deprel csubj, such as אפשר, יש/אין, ניתן, to name a few.

This leads us to a possibly larger question: whether the VerbType feature is needed at all. On the UD site, the notes on the move to V2 had this under language-specific features:

VerbType=Aux|Mod|Cop|Main; currently used in Hebrew, Dutch and Latin; it has to be seen how much such a feature will be demanded if we remove the AUX tag.

English does not use this feature at all; Classical Latin does have a combination of VerbType=Mod with upos=VERB, whereas Italian has these as AUX with no such feature. Classical Chinese of Kyoto has only VerbType=Cop with upos=AUX.
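The cross-treebank comparison above can be reproduced mechanically. The following is a minimal sketch, not part of the validator: it tallies every (UPOS, VerbType value) pair in a CoNLL-U string. The function name `verbtype_by_upos` and the `SAMPLE` rows are hypothetical, and the sample annotation is a hand-made toy, not taken from any treebank.

```python
from collections import Counter


def verbtype_by_upos(conllu_text):
    """Count (UPOS, VerbType value) pairs in a CoNLL-U string."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, upos, feats = cols[0], cols[3], cols[5]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if feats == "_":
            continue  # no morphological features on this token
        for feat in feats.split("|"):
            name, _, value = feat.partition("=")
            if name == "VerbType":
                counts[(upos, value)] += 1
    return counts


# Toy rows for illustration only (hypothetical annotation).
SAMPLE = "\n".join("\t".join(row) for row in [
    ["1", "אפשר", "אפשר", "VERB", "_", "VerbType=Mod", "0", "root", "_", "_"],
    ["2", "להפריד", "הפריד", "VERB", "_", "VerbForm=Inf", "1", "csubj", "_", "_"],
    ["3", "היה", "היה", "AUX", "_", "Number=Sing|VerbType=Cop", "1", "dep", "_", "_"],
])

if __name__ == "__main__":
    for (upos, value), n in sorted(verbtype_by_upos(SAMPLE).items()):
        print(upos, value, n)
```

Running the same function over the text of each treebank file would show which UPOS/VerbType combinations are actually attested before deciding which ones the validator should permit.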

In the Google Docs version of our guidelines, I have followed the tagging of our own corpus and the HTB. We clearly have VerbType=Cop for upos=AUX for lemma היה and its derivations. This is perhaps superfluous, as the deprel cop already indicates the copula-function of this token.

We also have VerbType=Mod for upos=VERB. The example in the feature table is tagged VERB based on a conversation we had via email back then. If this particular token, בטוח, should be an ADJ, then it would obviously not receive the VerbType feature. We cannot say the same for the example with אפשר, which is always tagged VERB with the feature VerbType=Mod, though both בטוח and אפשר govern a csubj.

חלק גדול מהסצנות מרגישות לא מלוטשות, הניראות הכללית של המשחק מרגישה מיושנת, ולא בטוח שגם שדרוג החומרה שלכם יצליח לגרום לעובדה הזאת להשתנות.

אי אפשר להפריד בין כושר גופני ליכולת ריצה.

The general rationale was that these tokens do not have any person or number. They are frozen forms, though some of them are indeed derived from their fully inflected counterparts. The Hebrew Academy terms them חג"מ (חסרי גוף ומספר, "lacking person and number"). Either they do not fit the requirements for being tagged VERB or ADJ (or even NOUN ***) since they lack any inflection, or they do, but are assigned a feature differentiating them from their fully inflected counterparts.

If there's a 1:1 relation between the deprel csubj and modality, then we can perhaps do away with this feature.

Then again, יכול can only be AUX throughout its inflection (יכל/יכול/היה יכול/יוכל) and it does receive VerbType=Mod. The token צריך can be either AUX with inflection (צריך/צריכה/צריכים/צריכות) or VERB+csubj (frozen form צריך), and in both cases it receives this feature. The PIEL verb אפשר is non-modal but erroneously receives this feature, possibly because we don't use diacritics and the parser can't tell it apart from the modal אפשר.

*** In a recent Hebrew Academy meeting, we heard an example of a NOUN governing a csubj:

תענוג לצפות בסדרה החדשה הזו.

Hilla-Merhav commented 2 years ago

@amir-zeldes

I'll just add that the validator error may be no more than a technical issue: we are testing some new validation rules Nick implemented, and some permitted combinations of morphological features may have been missed, so it can be fixed if needed.

The larger question @NathanD38 raised is interesting. Usually, I think we use VerbType=Mod with VERBs like ניתן לעשות or יש להגיע בזמן.

We can't say there is a 1:1 relation between csubj and modality: many non-modal VERBs govern a clause through csubj or, more usually, csubj:pass: נטען ש... דוּוח ש... מסופר ש... התברר כי... נקבע ש... נאמר ש... סוּכם ש... נכתב ש...

I see neither them nor the modal ניתן ש... as frozen forms, so I don't think the inflection issue is the motivation for assigning VerbType=Mod to ניתן. (I assume all of them are masculine singular because the subject position is taken by a clause, and masculine singular is the unmarked form.)

But if there is any other motivation to use VerbType=Mod, I think we are missing some (semantically) modal structures headed by a PRON or ADV that govern through csubj:

PRON – עליו להגיע בזמן (https://github.com/IAHLT/UD_Hebrew/issues/10) csubj(v, lehagia)

ADV – אל לך לאחר (https://github.com/IAHLT/UD_Hebrew/issues/36) csubj(al, leaxer)

כנראה שהוא יאחר

(Right now) we don't assign VerbType=Mod to PRONs or ADVs, so we don't account for the modality of these expressions. There are also some modal participles that I guess we should treat as lexicalized adjectives (?) – בטוח, אסור, מותר – so they also lack the VerbType=Mod feature.

Is it a required feature at all? If it is, should it be permitted with nonverbal elements?
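One way to ground these questions in data is to profile the heads of csubj and csubj:pass: for each such relation, record the head's UPOS and whether the head carries VerbType=Mod. The sketch below does this over a CoNLL-U string. The names `read_sentences`, `csubj_head_profile`, and `SAMPLE` are hypothetical, and the three sample sentences are hand-made toy annotations, not treebank data.

```python
from collections import Counter


def read_sentences(conllu_text):
    """Yield each sentence as a dict mapping token id to its 10 columns."""
    sent = {}
    for line in conllu_text.splitlines():
        if not line.strip():
            if sent:  # a blank line closes the current sentence
                yield sent
                sent = {}
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # keep plain integer ids only (no multiword ranges, no empty nodes)
        if len(cols) == 10 and cols[0].isdigit():
            sent[cols[0]] = cols
    if sent:
        yield sent


def csubj_head_profile(conllu_text):
    """Count (head UPOS, head has VerbType=Mod) over csubj relations."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        for cols in sent.values():
            # treat subtypes such as csubj:pass as csubj
            if cols[7].split(":")[0] != "csubj":
                continue
            head = sent.get(cols[6])
            if head is None:
                continue
            is_modal = "VerbType=Mod" in head[5].split("|")
            counts[(head[3], is_modal)] += 1
    return counts


def rows(*tokens):
    return "\n".join("\t".join(t) for t in tokens)


# Toy sentences for illustration only (hypothetical annotation).
SAMPLE = "\n\n".join([
    rows(["1", "אפשר", "אפשר", "VERB", "_", "VerbType=Mod", "0", "root", "_", "_"],
         ["2", "להפריד", "הפריד", "VERB", "_", "VerbForm=Inf", "1", "csubj", "_", "_"]),
    rows(["1", "נטען", "נטען", "VERB", "_", "Voice=Pass", "0", "root", "_", "_"],
         ["2", "ש", "ש", "SCONJ", "_", "_", "3", "mark", "_", "_"],
         ["3", "יאחר", "איחר", "VERB", "_", "_", "1", "csubj:pass", "_", "_"]),
    rows(["1", "כנראה", "כנראה", "ADV", "_", "_", "0", "root", "_", "_"],
         ["2", "ש", "ש", "SCONJ", "_", "_", "3", "mark", "_", "_"],
         ["3", "יאחר", "איחר", "VERB", "_", "_", "1", "csubj", "_", "_"]),
])

if __name__ == "__main__":
    for (upos, is_modal), n in sorted(csubj_head_profile(SAMPLE).items()):
        print(upos, "VerbType=Mod" if is_modal else "no VerbType=Mod", n)
```

If the counts for heads without VerbType=Mod are non-trivial, csubj alone cannot stand in for the feature; if non-verbal heads such as PRON or ADV appear, they are the cases where modality currently goes unmarked.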

amir-zeldes commented 2 years ago

The answer has several parts: