IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

Polarity of copulas #48

Open Hilla-Merhav opened 2 years ago

Hilla-Merhav commented 2 years ago

@amir-zeldes On HTB certain copulas always get the Polarity=Pos feature, even when the clause is negative: pattern { GOV -[cop]-> N1;N1 [upos="PRON"] ;N2 [lemma="לא"] ; N1 < N2}

image

The query pattern { GOV -[cop]-> N2;N1 [lemma="לא"] ;N2 [upos="AUX", Polarity=Pos] ; N1 < N2} retrieves the same results as pattern { GOV -[cop]-> N2;N1 [lemma="לא"] ;N2 [upos="AUX"] ; N1 < N2}
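For readers without access to Grew-match, the second pattern can be approximated in plain Python over CoNLL-U text. This is only a rough sketch: the helper names, the sample rows and their annotation are made up for illustration (the sentence זה לא היה טוב is borrowed from point 2 below), and `N1 < N2` is read as "N1 immediately precedes N2".

```python
def parse_conllu_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        feats = {}
        if cols[5] != "_":
            feats = dict(item.split("=", 1) for item in cols[5].split("|"))
        tokens.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                       "upos": cols[3], "feats": feats,
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def negated_positive_copulas(tokens):
    """Ids of cop AUX tokens with Polarity=Pos that directly follow lemma לא."""
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for tok in tokens:
        prev = by_id.get(tok["id"] - 1)
        if (tok["deprel"] == "cop" and tok["upos"] == "AUX"
                and tok["feats"].get("Polarity") == "Pos"
                and prev is not None and prev["lemma"] == "לא"):
            hits.append(tok["id"])
    return hits


# Hypothetical annotation of "זה לא היה טוב", showing the problematic feature.
ROWS = [
    ("1", "זה", "זה", "PRON", "_", "_", "4", "nsubj", "_", "_"),
    ("2", "לא", "לא", "ADV", "_", "Polarity=Neg", "4", "advmod", "_", "_"),
    ("3", "היה", "היה", "AUX", "_", "Polarity=Pos|VerbType=Cop", "4", "cop", "_", "_"),
    ("4", "טוב", "טוב", "ADJ", "_", "_", "0", "root", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

print(negated_positive_copulas(parse_conllu_sentence(SAMPLE)))
```

Dropping the `Polarity` condition gives the counterpart of the unrestricted pattern, so comparing the two result lists reproduces the observation that both queries retrieve the same tokens.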

image

And some copulas never get the Polarity feature, even in positive environments: image

  1. Given that the negativity of the clause is sitting out of the copula, do you think copulas should get the Polarity feature at all? I suggest we treat all copular lemmas in the same way - זה, הוא etc.

  2. The exception is אינו, איננו - in the Guidelines some uses of inflected forms of אין are described as pure copulas (תחנת הקלפי אינה מקום ההצבעה ההכרחי היחיד - example from the morphological features table) but on HTB these forms are connected only as aux (even when the predicate is nonverbal). Following the guidelines, I think אינו, איננו is sometimes tagged in our data as cop (when the predicate is nonverbal). Do you think we should distinguish copular and noncopular אינו, as we distinguish copular and noncopular haya (זה לא היה טוב VS הם היו נפגשים באופן קבוע)? If we do, I believe copular אינו should get Polarity=Neg, but I think this might be the only case of polarity sitting in the copula. What do you think?

amir-zeldes commented 2 years ago

Basically, yes to everything.

Regular copulas like "haya" don't have a negative counterpart, and non-paired items shouldn't get polarity. The polarity on "lo" is not copular, it's just the negative polarity of a negation item.

Eineno etc. have a positive counterpart (yeshno), and I think they should both get polarity, both as copula and as existential. Note that as existentials, they should be tagged VERB, since they are no longer an auxiliary/copular function word, but the local clause root.
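The position in the two paragraphs above can be sketched as a small lookup, assuming a hypothetical table of paired lemmas. The lemma spellings and function names here are illustrative only and are not taken from the treebank or the guidelines.

```python
# Hypothetical table: only items that have a polar counterpart carry Polarity.
PAIRED_POLARITY = {"אינו": "Neg", "ישנו": "Pos"}


def expected_polarity(lemma):
    """Return the Polarity value for a paired item, or None for unpaired copulas."""
    return PAIRED_POLARITY.get(lemma)


def expected_upos(lemma, is_clause_root):
    """Paired items are VERB when they head the clause (existential use), else AUX."""
    if lemma in PAIRED_POLARITY and is_clause_root:
        return "VERB"
    return "AUX"


print(expected_polarity("היה"))            # unpaired copula, no Polarity
print(expected_polarity("אינו"))
print(expected_upos("ישנו", is_clause_root=True))
```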

So I think bottom line:

IsraelLand commented 2 years ago

Sorry to revive this, @amir-zeldes. It's nothing new, but I've recently encountered cases like - image

image

which I wonder whether should get the copula treatment. Consider that in both sentences and in the four eino cases -

  1. In the positive counterparts, מי שמעוניין או זכאי would most probably be used. Meaning, no copula needed
  2. eino is just the Standard Hebrew negative adverbial for the present/beinoni, both could have worked as מי שלא מעוניין או לא זכאי לקצבה just as easily in colloquial Hebrew. Sure, this has to do with the adj/verb conundrum, but even the obvious adj - he's not young should be הוא אינו צעיר , equivalent to הוא לא צעיר.

I really do see them as adverbial, not as "copular declaratives". I'm not even sure why, but I suspect this has to do with them appearing in acl:relcl and perhaps the peculiarity of meunyan.

My point being, I can't decide this is copular 100%. In the same vein we have stuff like - image

which I think is a copula rather than an nsubj, and both are acceptable - we can't REALLY decide. I think these are cases where plurality of annotations, which really captures the language, wins over uniformity which is artificial here, imo. I can easily divorce the copular meaning of this eino, in fact I don't really see it. Someone else would see them as inseparable. I don't think this is problematic at all, just like we use both cop/nsubj quite freely.

Thank you!

amir-zeldes commented 2 years ago

I see that this is complex, and these are certainly interesting examples to contrast, but I think you can't discuss copula eino without discussing its positive counterpart "hino". I don't see "hino" as an ADV, and it seems like a fairly prototypical copula in sentences like:

There is really no adverbial meaning here, and its main function is a 'defining' kind of copula. It's also part of a system of tense oppositions in which, as in Russian and to some extent in Arabic, non-present nominal predication requires an inflected auxiliary (so "hino" vs. "haya").

If we want to treat the positive cases as copulas, I think it forces us to recognize that the negative items serve two functions: an adverbial, negating function, and a copular one. Most inherently negative 'verb-like' elements in UD languages are typically attached as verbs/auxiliaries first, and their negating nature is then expressed using a Polarity feature, if at all.

The example with "ein lesarev" is something a bit different, and I think we should be consistent across חגמ analyses here and treat it like "asur", but that's a different story which we should keep separate IMO.

IsraelLand commented 2 years ago

The way I see it, the main thing is these are two variations of Hebrew imo. In Standard Hebrew eino is the negative of any such "present tense" form, with or without copular meaning (e.g. "hu eino ochel"). While hino is just not used as a copula, only for emphasis. In colloquial Hebrew, eino is mostly a copula, I'd say (as otherwise the speaker would opt for "lo"), while hino is a prevalent copula.

In the example above ("mi shelo meunyan bekitzbat nechut"), it seems more of the first kind of Hebrew, as otherwise -

  1. why not use lo? "mi shelo meunyan"
  2. I don't think the positive counterpart would be "mi shehino meunyan", which is super awkward, unless used to emphasize the copular meaning
  3. Thus it seems more of a kind of "we have to use eino for negation in Standard Hebrew", not copula.

So we obviously can't in any way differentiate between the two variations, that'd be artificial and wrong, but I do suggest more freedom in the matter, like we have with "isha shehi beherayon" - which we can and do analyze as both nsubj or cop., depending on the sentence, annotator discretion and other constraints (a woman that is pregnant vs. a woman that she is pregnant (anaphora))

amir-zeldes commented 2 years ago

I agree that "isha shehi beherayon" is not totally clear, but we do need to rule on advmod vs cop for "eino" and I would say:

Does that make sense? Whatever we decide, I would like to find a consistent guideline for these items.

IsraelLand commented 2 years ago

Yeah, if we need to decide I think that works. I'm still not sure about treating it as cop. by default, but if we have to decide, that's as sensible a default for the general form as any. Would that mean it stays in the realm of a general guideline and not "hardcoded" validation, to account for the "hu eino haya babayit" outlier? (Otherwise we would need a more intricate validation rule, which I think would be clunky and hard to implement at best.) Thanks

strasss commented 2 years ago

I'm a little confused... can ADV take gender and number features? Or do you mean AUX governed by advmod? Is that something we do? Would you oppose it staying AUX+cop when governed by a nominal or adjectival predicate and AUX+aux when governed by a verb? I'm pretty sure that's what we've all been doing so far.

As for the "eino haya babayit" cases, I think it could also be AUX+aux even when it is governed by the "bayit" - consider we already have an aux+cop setup in things like "asuy lihiyot babayit" "alul lihyot ayef" - where the "asuy" is aux and "lihiyot" is cop. I think the validator allows a NOUN or ADJ to have an aux child when they already have a cop child - and if it doesn't allow it yet - it certainly could and I believe it should :)

NathanD38 commented 2 years ago

@IsraelLand @amir-zeldes I wondered what the correct analysis is of the following, in light of the suggestions here:

אנשים שהינם מועסקים בחברות כוח אדם
אנשים שאינם מועסקים בחברות כוח אדם
אנשים ש/המועסקים בחברות כוח אדם
אנשים שלא מועסקים בחברות כוח אדם

Let's assume that the regular forms (without אינם and הינם) are more prevalent in the data. Let's assume further that מועסקים here is a VERB in beinoni form, i.e. AUX+aux (not ADJ, in which case we would treat אינם and הינם as AUX+cop).

Up until now, we've used the following features for הינם in this construction:

Gender=Masc, Number=Plur, Person=3, Polarity=Pos, lemma=הינו

And the following for אינם in the same construction:

Gender=Masc, Number=Plur, Person=3, Polarity=Neg, lemma=אינו

The suggestion now is to treat אינם in this construction as ADV+advmod with Polarity=Neg. But how would we specify its features? It inflects for Gender, Number, and Person, features which are not valid with ADV in UD. e.g.,

אדם שאינו מועסק/אישה שאינה מועסקת/נשים שאינן מועסקות

And, if we go this far, then הינו is originally an inflection of the ADV הינה (with emphasized meaning, though it has probably lost its origins in Modern Hebrew). Then it seems reasonable to treat הינו as the positive counterpart to אינו also in this adverbial usage, i.e., it would be ADV+advmod with Polarity=Pos, but again, without means to legally specify its inflections.

In copular constructions we've been using the following features:

אנשים שהינם עובדי חברות כוח אדם

Gender=Masc, Number=Plur, Person=3, Polarity=Pos, VerbType=Cop, lemma=הינו

אנשים שאינם עובדי חברות כוח אדם

Gender=Masc, Number=Plur, Person=3, Polarity=Neg, VerbType=Cop, lemma=אינו

אנשים שהם עובדי חברות כוח אדם

I guess here is where people would find the sentence ambiguous between cop and nsubj, but the tag would be PRON in any case. I gravitate towards a subject relative clause, i.e., cop (People who *they are Manpower companies' employees.)

Now, here's where things get tricky.

אדם שהיה יכול להיות מועסק בחברת כוח אדם
אדם שאינו יכול להיות מועסק בחברת כוח אדם
אדם ש(לא) יכול להיות מועסק בחברת כוח אדם

Since we treat יכול** as a modal AUX, and היה and להיות are also tagged AUX, I wonder which one of the היה forms is copular. Are both cop or the first היה is aux? In English, you use aux:pass for להיות (been) in this construction and היה יכול is simply the sequence of AUX could have. In Hebrew we use cop for להיות.

But aux:pass seems out of place in copular constructions, and if we use cop for להיות in Hebrew, it appears היה, אינו and, if we fancy some emphasis, הינו, would also be cop:

אדם שהיה יכול להיות עובד חברת כוח אדם
אדם ש(הינו/)אינו יכול להיות עובד חברת כוח אדם
אדם ש(לא) יכול להיות עובד חברת כוח אדם

Verbal constructions call for aux, copular ones call for cop, while complex modal constructions require two cop or two aux (היה + להיות), but I think it's preferable to avoid a mishmash of cop and aux here.

** I've used יכול here but all inflections are possible with the modal AUX tag, e.g.,

נשים שהיו יכולות להיות מועסקות / גברים שהיו יכולים להיות מועסקים
נשים שהיו יכולות להיות עובדות כוח אדם / גברים שהיו יכולים להיות עובדי כוח אדם

[The variant יכלו also appears in our data]

נשים שיוכלו להיות מועסקות / גברים שיוכלו להיות מועסקים
נשים שיוכלו להיות עובדות כוח אדם / גברים שיוכלו להיות עובדי כוח אדם

[The verbose variant יהיו יכולים/יכולות also appears in our data]

*** I hadn't noticed @strasss's reply before posting mine. My apologies for any repetition on my part.

IsraelLand commented 2 years ago

@strasss Not sure if @amir-zeldes is proposing to put gender, number, etc. on the (when they are - ) ADVs. "AUX+cop when governed by a nominal or adjectival predicate and AUX+aux when governed by a verb" is what we've been doing all along, but I suggested a slightly different take on the rigidity of the aux+cop cases, to account for "eino haya babayit".

btw, on second thought "eino haya babayit" is really ungrammatical (doesn't mean we can't encounter it in hypercorrective usage), but it can occur in other cases, maybe "haish hu babayit" - "haish hu eino babayit". As for the validation rules, these are the 2 current ones:

  1. NOUN or ADJ token 13 has non-cop AUX children (at least token 12 but no cop AUX היה
  2. Non-VERB non-AUX (token 13) can govern through aux (child token 12) only if it also governs through cop; add a cop child to token 13?
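To make the second message concrete, here is a simplified reading of it in Python. This is not the UD validator's code, just a sketch of the condition as quoted; the token dicts, the function name and the transliterated forms are hypothetical.

```python
def aux_without_cop(tokens):
    """Ids of non-VERB, non-AUX heads that have an aux child but no cop child."""
    child_rels = {}
    for tok in tokens:
        # Compare on the universal part of the relation (aux:pass counts as aux).
        child_rels.setdefault(tok["head"], set()).add(tok["deprel"].split(":")[0])
    flagged = []
    for tok in tokens:
        if tok["upos"] in ("VERB", "AUX"):
            continue
        rels = child_rels.get(tok["id"], set())
        if "aux" in rels and "cop" not in rels:
            flagged.append(tok["id"])
    return flagged


# "hu eino babayit" with eino attached as aux only: the nominal head is flagged.
only_aux = [
    {"id": 1, "form": "hu", "upos": "PRON", "head": 3, "deprel": "nsubj"},
    {"id": 2, "form": "eino", "upos": "AUX", "head": 3, "deprel": "aux"},
    {"id": 3, "form": "babayit", "upos": "NOUN", "head": 0, "deprel": "root"},
]
# "hu eino haya babayit" with aux plus cop on the same head: accepted.
aux_and_cop = [
    {"id": 1, "form": "hu", "upos": "PRON", "head": 4, "deprel": "nsubj"},
    {"id": 2, "form": "eino", "upos": "AUX", "head": 4, "deprel": "aux"},
    {"id": 3, "form": "haya", "upos": "AUX", "head": 4, "deprel": "cop"},
    {"id": 4, "form": "babayit", "upos": "NOUN", "head": 0, "deprel": "root"},
]

print(aux_without_cop(only_aux))
print(aux_without_cop(aux_and_cop))
```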

So you're right, but it's more restricted rn. I see that it is allowed with the uber-specific "haya" (was it like that a few days ago?), but I really find it too specific, it doesn't account for cases like "haish hu eino babayit", for example.

@NathanD38 Thank you, these are good examples. IMHO staying on the current AUX+aux for verbs is a good idea, but that's just how I see it.

amir-zeldes commented 2 years ago

a general guideline and not "hardcoded" validation

Yeah, this is not something the validator would check, as long as copulas are listed in the copula list for the language (that is checked, at the lemma level)

can ADV take gender and number features? Or do you mean AUX governed by advmod? Is that something we do?

It is a little unusual, I admit, but it can be allowed by the validator if enabled here:

https://quest.ms.mff.cuni.cz/udvalidator/cgi-bin/unidep/langspec/specify_feature.pl?lcode=he&feature=Gender

For example, Italian uses this for floating quantifiers (e.g. "tutti" in the sense "they all do it" or something is Masc Plur and tagged ADV). But AUX cannot be governed by advmod - the only choices are an ADV with agreement FEATS or AUX (in which case we should treat all of them as copular and just add Polarity)
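The two constraints in this reply can be written down as a small check. It is a sketch under assumptions: `adv_agreement_enabled` is a hypothetical stand-in for whatever the language-specific feature settings permit, and the real validator's logic and messages will differ.

```python
AGREEMENT_FEATS = {"Gender", "Number", "Person"}


def negator_problems(upos, deprel, feats, adv_agreement_enabled=False):
    """List the problems with one candidate analysis of an inflected negator."""
    problems = []
    if upos == "AUX" and deprel == "advmod":
        problems.append("AUX cannot be governed by advmod")
    if upos == "ADV" and not adv_agreement_enabled:
        extra = sorted(AGREEMENT_FEATS & set(feats))
        if extra:
            problems.append("ADV carries agreement features: " + ", ".join(extra))
    return problems


feats = {"Gender": "Masc", "Number": "Plur", "Person": "3", "Polarity": "Neg"}
print(negator_problems("AUX", "advmod", feats))
print(negator_problems("ADV", "advmod", feats))
print(negator_problems("ADV", "advmod", feats, adv_agreement_enabled=True))
```

Only two analyses come out clean: ADV+advmod once agreement features are enabled for ADV, or AUX with a cop/aux relation.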

the validator allows a NOUN or ADJ to have an aux child when they already have a cop child

Yes, it does

מועסקים here is a VERB in beinoni form

Yes, that would be canonical and note it can take a passive agent "al yedey"

haish hu eino babayit

Sure, and I'm mainly concerned about what we will find in the data, not about what is normatively grammatical; but notice there is an option to analyze this particular one with dislocated if you really don't like the 'double copula' reading.

Basically we need to decide how much we like having the ADV option - if we like it very much, we should enable gender features for ADV, like some languages have done.

strasss commented 2 years ago

Well, I like the idea of enabling gender and number with ADVs for a more elegant solution for things like לבדו, לבדם, לבדי which, rather awkwardly in my opinion, we currently analyze as an ADV +pronominal suffix nmod:poss. The same might also work for כולו - as in אכלתי את הסנדוויץ' כולו.

But I don't see why we would have to do the same with אינו - which is perfectly fine being analyzed as AUX+cop and AUX+aux depending on the setup. Its interchangeability with לא can also be easily explained by the fact that copulas in the present tense are optional, and it is also optional as an auxiliary - and of course its complementary distribution with לא is because of both items' negative polarity - I don't see why we would have to say that the interchangeability is due to the two items' occupying the same position, and I think it would be too costly in terms of IAA to leave it to the annotator to decide.

IsraelLand commented 2 years ago

I see why introducing gender etc, would be problematic, that said it's true it would provide a better solution for לבדו and the like. Depends on the "cost", I guess.

It all comes down to how you read it imho, if indeed eino implies absolute copularity, as opposed to lo, would you read the polar opposite of מי שאינו מעוניין as - מי שהוא מעוניין בקצבת נכות or even the negative equivalent - מי שהוא לא מעוניין בקצבת נכות I wouldn't. If it's 100% copular in the negative, why not in the positive?

I'm not saying the above aren't possible, just that the exact meaning is at the annotator's discretion, I'm not sure I buy the 1-1 copula-negative eino idea as a whole. If we assume hino-eino to be polar copulas, then הוא יפה - הוא לא יפה - הוא אינו יפה would render הוא יפה - הוא כן יפה - ?הוא הינו יפה If both are 100% polar copulas, and copular structures are possible but not needed, why would one be awkward in the positive? I can only be sure of the negativity, not of the copularity, if the polar opposite is this awkward in a purely copula manner.

Back to the original example, מי שהינו מעוניין בקצבת נכות does seem to work fine, as an assertive copular, I'm just not sure I can determine with certainty that is the intended polar meaning of מי שאינו מעוניין, rather than just a simple negative in a standard, prescriptive Hebrew setting, without intended copula.

amir-zeldes commented 2 years ago

why would one be awkward in the positive

Languages are weird in terms of usage, and symmetry across negation does not have to be assumed. For an example of the positive double construction, see this from haaretz:

https://www.haaretz.co.il/literature/study/.premium-1.9700748

BTW I think it's because of an implied contrast reading (him and not someone else)

I think the syntactically 'ambiguous' cases are maybe doing both functions at once, collapsed into one word. "ken" is not a copula either way IMO, I would say it's an adverb (which it is historically too), and analyze it like Eng. "indeed". Notice in the past you can get "hu ken haya yafe", which neatly shows it is not doing the job of the copula.

For the edge cases I would like to have a clear guideline, just for consistency. The above is just a suggestion, if you prefer a different (but predictable) breakdown, that would be fine of course.

NathanD38 commented 2 years ago

@amir-zeldes

Thank you for the example from haaretz. It is probably for emphasis and implied contrast as you said, but I suspect that it is quite rare outside of that function.

I would like to ask for a clear guideline for the below cases:

העובדים (כן) צריכים להגיש את המסמכים הבאים
העובדים לא צריכים להגיש את המסמכים הבאים
העובדים אינם צריכים להגיש את המסמכים הבאים
??העובדים הינם צריכים להגיש את המסמכים הבאים

העובדים היו צריכים להגיש את המסמכים הבאים
העובדים יהיו צריכים להגיש את המסמכים הבאים

We tag צריך in this construction as a modal AUX+aux from להגיש. The features here are: Gender=Masc, Number=Plur, Person=3, VerbType=Mod.

I agree that we will not treat כן/לא here as anything but ADV+advmod.

Do you treat אינם, and, if such a sentence is grammatical, הינם, as AUX+aux or AUX+cop? It is confusing because I'm not sure what the correct analysis is of the "simple" sentences with היו and יהיו. Are they AUX+cop or AUX+aux?

What about היה + modal + להיות + VERB or non-VERB? What is the role of היה and להיות in the verbal vs. nominal predicate constructions, in terms of upos+deprel?

העובדים היו אמורים להיות מוזמנים לוועדה
?העובדים יהיו אמורים להיות מוזמנים לוועדה

העובדים היו אמורים להיות במקום העבודה
??העובדים יהיו אמורים להיות במקום העבודה

Here too we have the variants with אינם/לא, and maybe one can find some rare examples with הינם:

העובדים לא אמורים להיות מוזמנים לוועדה
העובדים אינם אמורים להיות מוזמנים לוועדה

העובדים לא אמורים להיות בבית בשעות העבודה
העובדים אינם אמורים להיות בבית בשעות העבודה

IsraelLand commented 2 years ago

Nice example. I think this might be to "formal up" the language (ho, ken...)? Or yeah, emphasis.

Anyway, I totally see what you're saying about the ambiguous eino doing both functions at once, it's just that I'm not sure I can determine it for sure. Only the negation part of eino is sure, for me. I'm not saying ken is a copula of course, but only an adverb, the same way I see eino.

hu eino ochel, hu eino yafe, hu eino chefetz - the usage is consistent all around, the negator of the present beinoni (It's just not always observed). It's not a copula in the verb case, doubtful in the noun case (hu lo chefetz is completely interchangeable).

But I see that as eino is a definite copula in some cases, there'll always be some divide (aux vs. cop / adv vs. cop). So I get that if we do have to reach consistency about it, rather than annotator discretion, we can either leave it at the current MO or adopt a new one where we apply gender, etc. to ADV (which can help us with לבדו, but might be too costly). So I'm more for discretion and some freedom in the matter, like in the dislocated vs. cop cases, but yeah...

Thanks @NathanD38 for laying out all the cases. It gets all the more complicated with those modal "verbs" Hebrew employs... This is one more case where I really don't think we can be sure it's "they had to hand in (then)" or "should have handed in".

amir-zeldes commented 2 years ago

We agree about the ken/lo ADV cases, so that's a good start already. For "haya" supplying the tense to a VERB, I think it's just plain aux, so I think we agree there too.

For the rest, I agree there's a spectrum of more/less copula-y, but basically I would like to make it depend on a formal part of the annotation scheme, like POS. So we could say, if the predicate is a NOUN, then "hino/eino" are always cop (with the necessary Polarity, and by validator constraint, also AUX), but if the predicate is a VERB then it is AUX/aux, with the understanding that it supplies both tense (present) and the polarity. This ensures that eino/hino are treated the same (since hino is quite weird as ADV with a VERB, right?) If this is the guideline then:

I anticipate some discomfort saying that "einam" is different from "lo", since they are paradigmatically interchangeable in examples 2-3, but keep in mind that "lo" is compatible with a tensed copula (lo hayu tsrixim), but "einam" isn't (*einam hayu tsrixim), suggesting that at least for a VERB, which provides its own "nexus" (in Jespersenian terms), this item can provide a tense. For the nominal case, we choose cop because there is no other exponent of the nexus (i.e. the predication).

This is just a suggestion though - if you want to propose a different guideline that's fine, but I'd ideally like it to be expressed using POS or something else already in the system.
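Because the proposal keys the relation to the POS of the predicate, it can be audited mechanically. Below is a minimal sketch, assuming tokens are available as dicts with hypothetical keys and that אינו / הינו are the lemmas in question; only the two cases named above (NOUN and VERB predicates) are covered.

```python
# Proposed mapping from the predicate's POS to the relation of eino/hino.
GUIDELINE = {"NOUN": "cop", "VERB": "aux"}
POLAR_LEMMAS = {"אינו", "הינו"}


def deprel_mismatches(tokens):
    """Return (id, found_deprel, expected_deprel) for eino/hino tokens off-guideline."""
    by_id = {t["id"]: t for t in tokens}
    mismatches = []
    for tok in tokens:
        if tok["lemma"] not in POLAR_LEMMAS:
            continue
        head = by_id.get(tok["head"])
        expected = GUIDELINE.get(head["upos"]) if head else None
        if expected is not None and tok["deprel"] != expected:
            mismatches.append((tok["id"], tok["deprel"], expected))
    return mismatches


# "hu eino ochel" with eino wrongly attached as cop to a VERB predicate.
sentence = [
    {"id": 1, "lemma": "הוא", "upos": "PRON", "head": 3, "deprel": "nsubj"},
    {"id": 2, "lemma": "אינו", "upos": "AUX", "head": 3, "deprel": "cop"},
    {"id": 3, "lemma": "אכל", "upos": "VERB", "head": 0, "deprel": "root"},
]
print(deprel_mismatches(sentence))
```

Predicates with any other POS are skipped rather than guessed at, so the report only contains cases the guideline actually decides.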

strasss commented 2 years ago

Here are the annotations I propose for Nethanel’s examples, which I think are borne out of the general guidelines, namely that an AUX can’t have children unless it’s the head of the clause, so the relevant clause head – whatever its POS - governs any “extra” aux’s in their appropriate deprel:

העובדים (כן) צריכים להגיש את המסמכים הבאים
העובדים לא צריכים להגיש את המסמכים הבאים
aux(lehagish, zrixim)

העובדים אינם צריכים להגיש את המסמכים הבאים
??העובדים הינם צריכים להגיש את המסמכים הבאים
aux(lehagish, zrixim) aux(lehagish, einam)

העובדים היו צריכים להגיש את המסמכים הבאים
העובדים יהיו צריכים להגיש את המסמכים הבאים
aux(lehagish, zrixim) aux(lehagish, hayu/yihyu)

העובדים היו אמורים להיות מוזמנים לוועדה
?העובדים יהיו אמורים להיות מוזמנים לוועדה
cop(muzmanim, lihiyot) aux(muzmanim, amurim) aux(muzmanim, hayu/yihyu)

העובדים היו אמורים להיות במקום העבודה
??העובדים יהיו אמורים להיות במקום העבודה
cop(mekom, lihyot) aux(mekom, amurim) aux(mekom, hayu/yihyu)

העובדים אינם אמורים להיות מוזמנים לוועדה
cop(muzmanim, lihiyot) aux(muzmanim, amurim) aux(muzmanim, einam)

As for the supposedly ungrammatical, yet pretty common structure in Israeli B.A. students’ essay writing, so I hear -

הוא אינו היה בבית

I propose to treat it similarly to the above examples – let’s imagine eino is sort of modifying the copula haya, but haya, childless by UD rules, hands its guardianship over to the clause head – bayit:

cop(bayit, haya) aux(bayit, eino)
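The "hand over to the clause head" step can be sketched as a tiny tree transformation. This assumes a hypothetical starting tree in which eino was naively attached to haya; the function name and token dicts are illustrative only.

```python
def lift_from_function_words(tokens):
    """Reattach any token headed by an aux/cop AUX to that AUX's own head."""
    by_id = {t["id"]: t for t in tokens}
    for tok in tokens:
        head = by_id.get(tok["head"])
        # Climb until the head is no longer a childless function word.
        while (head is not None and head["upos"] == "AUX"
               and head["deprel"] in ("aux", "cop")):
            tok["head"] = head["head"]
            head = by_id.get(tok["head"])
    return tokens


# "hu eino haya babayit": eino starts out as a dependent of the copula haya.
tree = [
    {"id": 1, "form": "hu", "upos": "PRON", "head": 4, "deprel": "nsubj"},
    {"id": 2, "form": "eino", "upos": "AUX", "head": 3, "deprel": "aux"},
    {"id": 3, "form": "haya", "upos": "AUX", "head": 4, "deprel": "cop"},
    {"id": 4, "form": "babayit", "upos": "NOUN", "head": 0, "deprel": "root"},
]
lift_from_function_words(tree)
print([(t["form"], t["head"], t["deprel"]) for t in tree])
```

After the call both function words depend on bayit, which matches the cop/aux pair given above.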

And as for -

דני הוא אינו בבית

Consider the contrast:

דני הוא אינו בבית - a bit awkward
דני אינו הוא בבית - unacceptable

I think that’s a good argument for a dislocated analysis rather than a double copular one, because it seems to show that "hu" was never a copula here to begin with. So the annotation here would be:

דני הוא אינו בבית
dislocated(bayit, dani) nsubj(bayit, hu) cop(bayit, eino)

NathanD38 commented 2 years ago

@strasss Thank you for your detailed analysis! I'm not sure I understand why להיות, tagged AUX, is treated differently than היה/יהיה/אינם/הינם before a VERB. Why can't it receive aux as well before VERB?

@amir-zeldes Thank you! What I've tried to do in my posts is make sure that the various cases are detailed and demonstrated, so it can direct us towards avoiding inconsistencies.

Following your suggestions:

If we have אינו/הינו before nominal predicates (NOUN, ADJ), the upos is AUX, the deprel cop, and the features are as follows:
Gender=Masc, Number=Sing, Person=3, Polarity=Neg, VerbType=Cop, lemma=אינו
Gender=Masc, Number=Sing, Person=3, Polarity=Pos, VerbType=Cop, lemma=הינו

If we have אינו/הינו before VERB and AUX (modals), the upos is AUX, the deprel aux, and the features are as follows:
Gender=Masc, Number=Sing, Person=3, Polarity=Neg, Tense=Pres, lemma=אינו
Gender=Masc, Number=Sing, Person=3, Polarity=Pos, Tense=Pres, lemma=הינו

If we have להיות before nominal predicates (NOUN, ADJ), the upos is AUX, the deprel cop, and the features are as follows:
HebBinyan=PAAL, VerbForm=Inf, VerbType=Cop, lemma=היה

If we have להיות before VERB, we still treat it as a copula, and I would like to understand why it cannot have the deprel aux with the following features:
HebBinyan=PAAL, VerbForm=Inf, lemma=היה

strasss commented 2 years ago

@NathanD38 I’m sorry - My bad! - in my mind I treated muzmanim as ADJ and it probably should be a passive VERB - So yes, of course - lihiyot would be cop when governed by an adj or noun and aux when governed by a verb.
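Putting the feature bundles listed earlier together with this answer, the FEATS for each item could be generated from the head's POS. This is only a sketch: the item keys ("eino", "hino", "lihyot") and the function name are hypothetical, and the agreement values are passed in rather than derived from the form.

```python
POLARITY = {"eino": "Neg", "hino": "Pos"}
LEMMA = {"eino": "אינו", "hino": "הינו", "lihyot": "היה"}


def analyse(item, head_upos, agreement=None):
    """Return (upos, deprel, lemma, FEATS string) for eino, hino or infinitive lihyot."""
    if item not in LEMMA:
        raise ValueError("unknown item: " + item)
    if head_upos in ("NOUN", "ADJ"):
        deprel = "cop"
    elif head_upos in ("VERB", "AUX"):
        deprel = "aux"
    else:
        raise ValueError("no guideline for head POS: " + head_upos)
    feats = dict(agreement or {})
    if item in POLARITY:
        feats["Polarity"] = POLARITY[item]
        if deprel == "aux":
            feats["Tense"] = "Pres"
    else:  # infinitive lihyot
        feats["HebBinyan"] = "PAAL"
        feats["VerbForm"] = "Inf"
    if deprel == "cop":
        feats["VerbType"] = "Cop"
    feats_str = "|".join(k + "=" + feats[k] for k in sorted(feats, key=str.lower))
    return "AUX", deprel, LEMMA[item], feats_str


agr = {"Gender": "Masc", "Number": "Sing", "Person": "3"}
print(analyse("eino", "NOUN", agr))
print(analyse("hino", "VERB", agr))
print(analyse("lihyot", "VERB"))
```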

NathanD38 commented 2 years ago

@strasss Thank you! I've added ADJ to my post. The thing is, I merely describe what we have in the data until now. I don't think I've seen a sentence with an aux-receiving להיות, even when followed by a VERB.

strasss commented 2 years ago

@NathanD38 That's interesting. I would assume that it's a bias the automatic parser has because of some obscure HTB practice and which we never really challenged further along the line. I mean, until you pointed it out I never noticed it, and I realize now even I myself might have had the same bias :) so I just never fixed those things in my annotation and I guess maybe the rest of us too. Unless I'm missing something and all those beinoni forms should always be tagged ADJ in those settings, @amir-zeldes? I mean in things like אמורים להיות מוזמנים, עשויים להיות מורדמים?

amir-zeldes commented 2 years ago

Unless I'm missing something and all those beinoni forms should always be tagged ADJ in those settings, @amir-zeldes? I mean in things like אמורים להיות מוזמנים, עשויים להיות מורדמים?

No, I think a by-agent is possible here, and while the tests are not 100%, if a by-agent is possible, I think we've treated that as a sufficient condition for tagging as VERB (murdamim al yedey ha-rofe ha-mardim)

+1 for this POS tag based guideline, it seems workable and quite close to what intuitively makes sense (even if corner cases will pop up, which always happens)

IsraelLand commented 2 years ago

Yeah, I do think the modality spectrum would pose issues. אמורים צריכים זקוקים זכאים etc., when combined with prog. vs. irrealis vs. regular-copular readings, can force us to "lock in" some POS for them, even if we're unsure of it, to prevent conflicts with the system. But yeah, if we strive for more uniformity that's the best rn, I guess, despite the corner cases.

amir-zeldes commented 2 years ago

That's true - but what I always say about these things is "a consistent but flawed annotation scheme is better than an idealistic but inconsistent one". As long as the limitations are documented and well understood, at least you can be sure that you are finding all instances of something, and the last step of distinguishing two murky subtypes is up to you.