IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

Beinoni Paul #11

Open NathanD38 opened 3 years ago

NathanD38 commented 3 years ago

@amir-zeldes

I would like to know the correct upos and features for beinoni paul. Should it have the upos ADJ with the features Gender and Number, or the upos VERB with the strange tagging of HebBinyan=PAAL and Voice=Pass?

In some cases it is tagged AUX before an infinitive (צפוי, עשוי, עלול, ראוי, etc.).

I think that if we want to tag it as VERB, then we should add HebBinyan=PAUL, because the combination of HebBinyan=PAAL and Voice=Pass is peculiar.

Consequently, should the tagging be consistent throughout the various constructions or varied based on individual cases? (e.g., בנוי, כרוך, מצוי)

Let's take חשוב which here seems to be a standard ADJ, declinable for Number and Gender: ההנחיות החשובות סייעו לעובדים בשמירה על בטיחותם.

And compare it to the indeclinable form, whose upos is not clear to me:

הצלחה בתחום הזה תשפיע משמעותית על אימוץ הטכנולוגיה הזו בקרב כל משתמשי הדרך, ולשם כך חשוב להיות גם יצירתיים, גם יסודיים בבחינת הפתרונות בשטח וגם גמישים במתן הפתרונות המתאימים.

If the upos is ADJ, and we have csubj(xashuv, lihyot), is it ok to have VerbType=Mod on an ADJ in this case? If the upos is VERB, technically it should be csubj:pass(xashuv, lihyot), since it comes from HebBinyan=PAAL and Voice=Pass.

A clearer guideline would help in the current batch and in the QA process.

Thanks! Netanel

amir-zeldes commented 3 years ago

Hi @NathanD38 - these are important questions and there are several here, so let me clarify:

Now about the binyan:

As a result of the 'axioms' above, forms like "katuv" cannot be considered a separate binyan (or at least traditionally they are not). The reason for placing them with PAAL is that historically in Semitic, each binyan has associated participles, and PAUL is the passive participle corresponding to the PAAL form (so, katav <> katuv, and active kotev). In Hebrew, the tensed passive equivalent of PAAL (Arabic fu'ila) was lost, and we mostly get nif'al (like ratsax<>nirtsax; incidentally, this replacement has happened in colloquial Arabic as well). But still the participle PAUL is counted as binyan PAAL (the same holds for Arabic maf'uul, e.g. maktuub).

Finally regarding feats, xashuv is not a modal (it's an evaluative, sure, but so is "tov"), so it cannot have VerbType=Mod, and in fact if we think it's not a VERB in general, it should just be ADJ, and not receive Voice at all. But katuv should be VERB, Voice=Pass, HebBinyan=PAAL, VerbForm=Part whenever it is not a lexicalized adjective.
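The split described above can be sketched as a small helper. This is only an illustration of the rule as stated in this comment, not part of any treebank tooling: the function name `paul_tag` and the boolean `lexicalized` flag are hypothetical, and deciding whether a given form is lexicalized remains a linguistic judgment that the code does not make.

```python
from typing import Dict, Tuple


def paul_tag(agreement: Dict[str, str], lexicalized: bool) -> Tuple[str, str]:
    """Return (UPOS, FEATS) for a pa'ul form, following the rule in this thread.

    `agreement` holds whatever agreement features the annotator already
    assigned (for example Gender and Number). `lexicalized` is the annotator's
    own decision (hypothetical flag, not something the code can infer).
    """
    feats = dict(agreement)
    if lexicalized:
        # Lexicalized adjective such as "xashuv": ADJ, no Voice, no VerbType.
        upos = "ADJ"
    else:
        # Transparent passive participle such as "katuv".
        upos = "VERB"
        feats.update({"HebBinyan": "PAAL", "VerbForm": "Part", "Voice": "Pass"})
    # CoNLL-U expects features sorted alphabetically, ignoring case.
    ordered = sorted(feats.items(), key=lambda kv: kv[0].lower())
    feats_str = "|".join(f"{k}={v}" for k, v in ordered) or "_"
    return upos, feats_str


if __name__ == "__main__":
    agr = {"Gender": "Masc", "Number": "Sing"}
    print(paul_tag(agr, lexicalized=True))
    print(paul_tag(agr, lexicalized=False))
```

The agreement features are passed in rather than hard-coded, since which of them belong on each form is discussed separately in this thread.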

Hope that makes sense!

shirawigi commented 3 years ago

So, if I understand correctly, beinoni paul (such as katuv) is never an ADJ? What about when it follows an ADP, like in the following sentence: במקביל, אושר בחוק לבנקים לקבל דמי הפצה מהקרנות והקופות, כנהוג במכירת ניירות ערך לציבור. If it is a verb, can it receive case from the ADP? Or should we assign it upos=ADJ in cases like this?

NathanD38 commented 3 years ago

@amir-zeldes Thank you for the detailed answer!

Is there a list we can access or compile ourselves of lexicalized adjectives (not just from beinoni paul)? Do you consider other beinoni paul forms as lexicalized adjectives, apart from "xashuv"?

To understand the features given in each case:

the modal AUX עשוי, עלול, צפוי, אמור in the below example "hu tsafuy/amur/alul/asuy lalexet", will get the following features: Person=3, Number=Sing, Gender=Masc, VerbType=Mod

the VERB in the impersonal construction below, "tsafuy/amur/alul/asuy laredet geshem", will get the following: Person=3, Number=Sing, Gender=Masc, Tense=Pres, VerbForm=Part, HebBinyan=PAAL, Voice=Pass

the ADJ in the following lexicalized beinoni paul, "xashuv she-moshe yagi'a la-pgisha", will get the following: Gender=Masc, Number=Sing

What @shirawigi showed above is the tendency of certain forms of beinoni paul to appear after an explicit ה, which may be an SCONJ+mark or DET+det, depending on the upos (a tricky decision in and of itself); or implicit ה within ADP like כ in כנהוג, כאמור, כצפוי, etc. or ב in באמור לעיל.

I see in the current HTB that כאמור is analyzed as one token, with upos ADV, and receives advmod. This is perhaps expected because of its tendency to appear by itself, as a parenthetical, referring to a previous sentence, paragraph, notion or idea.

But when it comes with complements, such as the following example, how should we analyze it, or indeed, any other such form? גלישה או שימוש בשירותי האתר מהווה הסכמה לאמור בהסכם זה "glisha o shimush be-sherutey ha-atar mehava haskama la-amur be-heskem ze."

amir-zeldes commented 3 years ago

@shirawigi :

beinoni paul (such as katuv) is never an ADJ

Not necessarily all, but for katuv it's hard to imagine. It is VERB 6/6 times in HTB.

If it is a verb, can it receive case from the ADP?

No, then it would be advcl, with SCONJ+mark. In this case it also makes sense, since you can insert a "by" phrase (ka-nahug 'al yedey kulam)

amir-zeldes commented 3 years ago

Is there a list we can access or compile ourselves of lexicalized adjectives

I don't have one handy, but maybe a search in HTB could help. I think you should mainly test it linguistically - if something is a passive verb form, it should have a relationship to the active. Something "katuv" has been written by someone, and is related to the active ("katuv al yedey", "katvu oto"). The same is not true for "xashuv" (*xashuv al yedey, ??xashvu oto).
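A treebank search of the kind suggested here could look roughly like the following sketch, which counts the UPOS tags assigned to a set of word forms in CoNLL-U text. It uses only the standard library. The function name `upos_counts_by_form` is hypothetical, and the toy rows are invented stand-ins in transliteration, not real HTB data; in practice one would read the actual `.conllu` files and use Hebrew-script forms.

```python
from collections import Counter, defaultdict


def upos_counts_by_form(conllu_text, forms):
    """Count UPOS tags per FORM for the given forms in CoNLL-U text."""
    counts = defaultdict(Counter)
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence breaks and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        tok_id, form, upos = cols[0], cols[1], cols[3]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if form in forms:
            counts[form][upos] += 1
    return counts


# Toy data only (hypothetical, transliterated); not taken from HTB.
rows = [
    ["1", "katuv", "katav", "VERB", "_",
     "HebBinyan=PAAL|VerbForm=Part|Voice=Pass", "0", "root", "_", "_"],
    ["1", "xashuv", "xashuv", "ADJ", "_",
     "Gender=Masc|Number=Sing", "0", "root", "_", "_"],
    ["1", "katuv", "katav", "VERB", "_",
     "HebBinyan=PAAL|VerbForm=Part|Voice=Pass", "0", "root", "_", "_"],
]
toy_text = "\n\n".join("\t".join(r) for r in rows)

if __name__ == "__main__":
    for form, tags in upos_counts_by_form(toy_text, {"katuv", "xashuv"}).items():
        print(form, dict(tags))
```

A form whose counts are split between VERB and ADJ is a candidate for the linguistic test described above; the counts alone do not settle the question.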

The feats you have look mostly right, but I don't think alul is passive at all, and arguably asuy and amur aren't really either, since they do not correspond to actives/can't accept agents, etc. In HTB they do not have voice at all, but anyway they are always tagged AUX... the "rain" example is a rare kind.
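For the QA pass, the feature constraints mentioned in this thread could be checked mechanically with something like the sketch below. The three rules are paraphrased from the comments here (no Voice or VerbType=Mod on ADJ; the modal AUX forms carry no Voice in HTB) and are assumptions for illustration, not official UD validation rules. The names `parse_feats` and `lint_token` are hypothetical.

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Gender=Masc|Number=Sing'."""
    if feats in ("", "_"):
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def lint_token(upos, feats):
    """Return a list of warnings for one token, based on this thread's rules."""
    f = parse_feats(feats)
    problems = []
    if upos == "ADJ" and "Voice" in f:
        problems.append("ADJ with Voice")
    if upos == "ADJ" and f.get("VerbType") == "Mod":
        problems.append("ADJ with VerbType=Mod")
    if upos == "AUX" and "Voice" in f:
        # Reported in this thread as absent in HTB; treat as a warning only.
        problems.append("AUX with Voice")
    return problems


if __name__ == "__main__":
    print(lint_token("ADJ", "Gender=Masc|Number=Sing|Voice=Pass"))
    print(lint_token("VERB", "HebBinyan=PAAL|VerbForm=Part|Voice=Pass"))
```

Returning warnings instead of raising errors leaves rare cases, such as the impersonal "rain" example, for the annotator to decide.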

Your last example I think is a nominalization (=that which is said), so it should not be a VERB there, but that's an exception.

NathanD38 commented 3 years ago

Your last example I think is a nominalization (=that which is said), so it should not be a VERB there, but that's an exception.

So what is the upos and deprel in that example? I understand it is nominalization, but I'm not sure if you mean it should be ADJ.

In similar instances with ב/לכל הקשור ל (=In all that which is related to), HTB has it as VERB with Voice=Act. And in some of these, the deprel is dep(kashur, kol), mark(kashur, ha), case(kashur, be), in others, dep(kashur, ha) and det(kashur, kol).

I do not understand completely the distinction between beinoni paul and poel, if in some cases, the paul is not even passive. If, by now, they are no longer considered passives, then it seems that they've become lexicalized adjectives, or on their way to that function.

The agent test or by-phrase doesn't always result in a clear-cut decision. To me, I cannot entirely say "hu kashur al yedey moshe" and be happy with it, but I'm fairly happy with "hu nikshar al yedey moshe ve-axshav hu kashur". xatum/katuv/etc. al-yedey X is really peculiar to me, and I would expect past-tense NIFAL in those instances, with the beinoni paul signifying a present result.

amir-zeldes commented 3 years ago

So what is the upos and deprel in that example?

NOUN and nmod; I think the guidelines already say that explicitly about nominalized participles.

For kashur I would actually consider it totally lexicalized, like xashuv, since it does not correspond to "koshrim oto" (at least for me), let alone a by phrase. This feeling is validated by similar forms in other words, for example English "related to" is treated as ADJ with lemma "related":

http://match.grew.fr/?corpus=UD_English-EWT@2.8&custom=6143acaf5f1fe&eud=yes

However "kashur be-xevel" would be VERB and Voice=Pass according to current HTB practices, and I think it is OK (it is not some lexicalized meaning but a totally transparent passive participle of "likshor").