Open NathanD38 opened 2 years ago
@amir-zeldes
I'll just add that the validator error may be nothing more than a technical issue: we are testing some new validation rules Nick implemented, and some permitted combinations of morphological features may have been missed, so it can be fixed if needed.
The larger question @NathanD38 raised is interesting. Usually, I think we use VerbType=Mod with VERBs like ניתן לעשות or יש להגיע בזמן.
We can't say there is a 1:1 relation between csubj and modality. Many nonmodal VERBs also govern a clause through csubj or, more usually, csubj:pass, for example: נטען ש... דוּוח ש... מסופר ש... התברר כי... נקבע ש... נאמר ש... סוּכם ש... נכתב ש...
I don't see them, or the modal ניתן ש..., as frozen forms, so I don't think the inflection issue is the motivation for assigning VerbType=Mod to ניתן. (I assume all of them are masculine singular because the subject position is taken by a clause, and masculine singular is the unmarked form.)
But if there is some other motivation for using VerbType=Mod, I think we are missing some (semantically) modal structures headed by a PRON or ADV that govern through csubj:
PRON – עליו להגיע בזמן (https://github.com/IAHLT/UD_Hebrew/issues/10) csubj(v, lehagia)
ADV – אל לך לאחר (https://github.com/IAHLT/UD_Hebrew/issues/36) csubj(al, leaxer)
כנראה שהוא יאחר
(Right now) we don't assign VerbType=Mod to PRONs or ADVs, so we don't account for the modality of these expressions. There are also some modal participles that I guess we should treat as lexicalized adjectives (?), such as בטוח, אסור, מותר, so they also lack the VerbType=Mod feature.
Is it a required feature at all? If it is, should it be permitted with nonverbal elements?
The answer has several parts:
@amir-zeldes
The current validator returns the following error when assigning (or leaving as is from the automatic parser) VerbType=Mod to upos=VERB:
Feature VerbType is not permitted with UPOS VERB in language [he].
This is quite baffling, since we have used this feature from the beginning for modal verbs governing an infinitive via deprel csubj, such as אפשר, יש/אין, ניתן, to name a few.
This leads us to possibly a larger question: the necessity of the VerbType feature altogether. In the UD site, the move to V2 had this under language-specific features:
English does not use this feature at all; Classical Latin does have a combination of VerbType=Mod with upos=VERB, whereas Italian has these as AUX with no such feature. Classical Chinese of Kyoto has only VerbType=Cop with upos=AUX.
In the Google Docs version of our guidelines, I have followed the tagging of our own corpus and the HTB. We clearly have VerbType=Cop for upos=AUX for lemma היה and its derivations. This is perhaps superfluous, as the deprel cop already indicates the copula function of this token.
We also have VerbType=Mod for upos=VERB. The example in the feature table is tagged VERB based on a conversation we had via email back then. If this particular token, בטוח, should be an ADJ, then it would obviously not receive the VerbType feature. We cannot say the same for the example with אפשר, which is always tagged VERB with the feature VerbType=Mod, though both בטוח and אפשר govern a csubj.
חלק גדול מהסצנות מרגישות לא מלוטשות, הניראות הכללית של המשחק מרגישה מיושנת, ולא בטוח שגם שדרוג החומרה שלכם יצליח לגרום לעובדה הזאת להשתנות.
אי אפשר להפריד בין כושר גופני ליכולת ריצה.
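To see which UPOS and VerbType combinations a treebank actually contains (and so which ones the validator would need to permit), something like the following could be run over the CoNLL-U files. This is only a minimal sketch: the function names are made up for illustration, the two-token sample is a toy annotation rather than a sentence from our corpus, and it assumes standard 10-column CoNLL-U word lines.

```python
from collections import Counter


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'Gender=Masc|VerbType=Mod' into a dict."""
    if not feats or feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|") if "=" in item)


def verbtype_by_upos(conllu_text):
    """Count (UPOS, VerbType value) pairs over all plain word lines."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank separator or sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        value = parse_feats(cols[5]).get("VerbType")
        if value is not None:
            counts[(cols[3], value)] += 1
    return counts


# Toy sample (hypothetical annotation, for illustration only).
SAMPLE = "\n".join("\t".join(row) for row in [
    ("1", "אפשר", "אפשר", "VERB", "_", "VerbType=Mod", "0", "root", "_", "_"),
    ("2", "להפריד", "הפריד", "VERB", "_", "VerbForm=Inf", "1", "csubj", "_", "_"),
])

for (upos, value), n in sorted(verbtype_by_upos(SAMPLE).items()):
    print(upos, value, n)
```

The resulting table of pairs could then be compared with the list of feature values the validator allows for [he].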
The general rationale was that these tokens do not have any person or number. They are frozen forms, though some of them are indeed derived from their fully inflected counterparts. The Hebrew Academy terms them חג"מ - חסרי גוף ומספר. Either they do not fit the requirements of being tagged VERB or ADJ (or even NOUN ***) since they lack any inflection, or they do, but are assigned a feature differentiating them from their fully-inflected counterparts.
If there's a 1:1 relation between the deprel csubj and modality, then we can perhaps do away with this feature.
Then again, יכול can only be AUX in its entire inflection (יכל/יכול/היה יכול/יוכל) and it does receive VerbType=Mod. The token צריך can either be AUX with inflection (צריך/צריכה/צריכים/צריכות) or VERB+csubj (frozen form צריך), and in both cases it receives this feature. The token אפשר from PIEL is a non-modal verb that erroneously receives this feature, possibly because we don't use diacritics and the parser can't tell the difference.
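Whether csubj and VerbType=Mod really line up 1:1 is something the treebank itself could tell us. Below is a rough sketch of such a check, again with hypothetical function names and a toy three-sentence sample that is not taken from the corpus. For every token that either governs a csubj/csubj:pass or carries VerbType=Mod, it tallies the UPOS together with both properties, so mismatches in either direction show up.

```python
from collections import Counter

CSUBJ = {"csubj", "csubj:pass"}


def read_sentences(conllu_text):
    """Yield each sentence as a list of 10-column rows (plain word lines only)."""
    rows = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if rows:
                yield rows
                rows = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) == 10 and cols[0].isdigit():
            rows.append(cols)
    if rows:
        yield rows


def csubj_vs_modality(conllu_text):
    """Tally (UPOS, governs_csubj, has_VerbType_Mod) for every relevant token."""
    tally = Counter()
    for sent in read_sentences(conllu_text):
        # IDs of tokens that are the head of a csubj or csubj:pass dependent.
        governors = {cols[6] for cols in sent if cols[7] in CSUBJ}
        for cols in sent:
            governs = cols[0] in governors
            modal = "VerbType=Mod" in cols[5].split("|")
            if governs or modal:
                tally[(cols[3], governs, modal)] += 1
    return tally


def sentence(rows):
    return "\n".join("\t".join(r) for r in rows)


# Toy sample (hypothetical annotation, for illustration only).
SAMPLE = "\n\n".join([
    sentence([
        ("1", "אפשר", "אפשר", "VERB", "_", "VerbType=Mod", "0", "root", "_", "_"),
        ("2", "להפריד", "הפריד", "VERB", "_", "VerbForm=Inf", "1", "csubj", "_", "_"),
    ]),
    sentence([
        ("1", "נטען", "נטען", "VERB", "_", "Voice=Pass", "0", "root", "_", "_"),
        ("2", "יאחר", "איחר", "VERB", "_", "_", "1", "csubj:pass", "_", "_"),
    ]),
    sentence([
        ("1", "יכול", "יכול", "AUX", "_", "VerbType=Mod", "2", "aux", "_", "_"),
        ("2", "לבוא", "בא", "VERB", "_", "VerbForm=Inf", "0", "root", "_", "_"),
    ]),
])

for key, n in sorted(csubj_vs_modality(SAMPLE).items()):
    print(key, n)
```

Rows where the second and third values differ are the interesting ones: a csubj governor without VerbType=Mod, or a VerbType=Mod token that governs no csubj (such as an inflected AUX).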
*** In a recent Heb Academy meeting, we heard an example of a NOUN governing a csubj: תענוג לצפות בסדרה החדשה הזו.