IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

Indefinite expressions #56

Open shirawigi opened 2 years ago

shirawigi commented 2 years ago

Hi @amir-zeldes,

We were wondering how we should annotate the following lemmas: כלשהו, איזשהו, איפשהו, איכשהו, לאנשהו, מתישהו.

What should their UPOS be? Are כלשהו and איזשהו DET? If so, do they get the feature Art=Ind (like איזה when it functions as an indefinite article)? Are the others ADV? Can they be marked as indefinite somehow, maybe with the feature PronType=Ind?

For reference, here's our new list of determiners:

- ה (PronType=Art, Definite=Def)
- מדי (מִדֵי)
- כל/כול
- אף
- שום
- איזה (PronType=Int), איזו, אילו (nonstandard spelling: אלו)
- איזה/איזשהו (PronType=Art, Definite=Ind), איזושהי
- עוד
- יותר
- פחות
- קצת
- המון (when interchangeable with הרבה; in the "crowd" sense it is tagged NOUN)
- הרבה
- די (דֵי)
- מספיק
- כמה

Thanks, Shira

amir-zeldes commented 2 years ago

איפשהו, איכשהו, לאנשהו, מתישהו are ADV, and as I just answered in #55, it is actually possible to give them PronType.

For כלשהו and איזשהו, I agree that DET makes sense, in which case they can get the typical indefinite PronType. However, it seems they are not supposed to get Definite, which surprised me too:

https://github.com/UniversalDependencies/UD_English-EWT/issues/291

You can also enrich כל with PronType=Tot:

https://universaldependencies.org/u/feat/PronType.html#Tot

And שום and אף can get PronType=Neg:

https://universaldependencies.org/u/feat/PronType.html#neg-negative-pronoun-determiner-or-adverb

I think we haven't done this so far, but it should be easy to add.
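As a rough summary of the proposal so far, the tags could be collected in a small lookup table. This is only a sketch: the name `PROPOSED_TAGS` and the helper `feats_string` are hypothetical, and PronType=Ind on the adverbs follows Shira's suggestion rather than an explicit decision in this thread.

```python
# Hypothetical summary of the tags proposed in this thread (not an official list).
# Assumption: the adverbs take PronType=Ind, as suggested in the opening question.
PROPOSED_TAGS = {
    "איפשהו": ("ADV", {"PronType": "Ind"}),
    "איכשהו": ("ADV", {"PronType": "Ind"}),
    "לאנשהו": ("ADV", {"PronType": "Ind"}),
    "מתישהו": ("ADV", {"PronType": "Ind"}),
    "כלשהו": ("DET", {"PronType": "Ind"}),   # no Definite feature
    "איזשהו": ("DET", {"PronType": "Ind"}),  # no Definite feature
    "כל": ("DET", {"PronType": "Tot"}),
    "שום": ("DET", {"PronType": "Neg"}),
    "אף": ("DET", {"PronType": "Neg"}),
}


def feats_string(feats: dict) -> str:
    """Serialize features for the CoNLL-U FEATS column, sorted by feature name."""
    if not feats:
        return "_"
    return "|".join(f"{k}={v}" for k, v in sorted(feats.items(), key=lambda kv: kv[0].lower()))


upos, feats = PROPOSED_TAGS["איזשהו"]
print(upos, feats_string(feats))
```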

Hilla-Merhav commented 2 years ago

> For כלשהו איזשהו, I agree that DET makes sense, in which case they can get the typical indefinite PronType; however they are not supposed to get Definite, it seems, which surprised me too: UniversalDependencies/UD_English-EWT#291

@amir-zeldes Thanks! Just to make sure before we update our list: I understand from the linked discussion that "איזה" (in the sense of "ראיתי איזה ילד") also gets the indefinite PronType, like איזשהו, and not Definite=Ind?

amir-zeldes commented 2 years ago

Yes, that seems to be the reading of the guidelines as endorsed by Dan, so I'm OK with implementing it based on that. It's not really a huge theoretical statement whether we are looking at an 'article' or a 'determiner pronoun' and I guess the desire is to keep the inventory of articles small. I interpret איזה as falling in the same bin as English "some", so PronType=Ind but no Definite, which is reserved for the small class of articles.
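A consistency check along these lines could flag determiners that combine PronType=Ind with a Definite feature. This is a minimal sketch under that reading of the guidelines; the function names are hypothetical and it only inspects the UPOS and FEATS values of a single token.

```python
def parse_feats(feats: str) -> dict:
    """Parse a CoNLL-U FEATS string such as 'Definite=Def|PronType=Art'."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def ind_det_has_definite(upos: str, feats: str) -> bool:
    """True if a DET with PronType=Ind also carries Definite (to be reviewed)."""
    parsed = parse_feats(feats)
    return upos == "DET" and parsed.get("PronType") == "Ind" and "Definite" in parsed


# An 'article-style' annotation of איזה would be flagged; the new one would not.
print(ind_det_has_definite("DET", "Definite=Ind|PronType=Ind"))
print(ind_det_has_definite("DET", "PronType=Ind"))
```

The check deliberately ignores other determiners, so the article ה (PronType=Art, Definite=Def) is not affected.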

NathanD38 commented 2 years ago

@amir-zeldes Should we use the features PronType=Tot for כל and PronType=Neg for שום, אף from now on? For sentences like כולם נסעו לטיול שנתי, do we segment כולם into DET+PRON, with the DET getting PronType=Tot and the PRON getting PronType=Prs?

amir-zeldes commented 2 years ago

Yes, I think so - that would also be consistent with the original HTB:

https://corpling.uis.georgetown.edu/annis/#_q=Iteb15XXnCIgLiAi15Ui&_c=SUFITFRfSFRC&cl=5&cr=5&s=0&l=10&o=random

(except for the part where it used to be tagged NOUN)
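For illustration, the DET+PRON segmentation of כולם could be written out as a CoNLL-U multiword token roughly as follows. This is only a sketch: the helper `mwt_block` is hypothetical, the segment forms כל and הם are placeholder assumptions, and LEMMA, HEAD and DEPREL are left as `_` because the thread does not settle them.

```python
def mwt_block(surface, segments, start=1):
    """Build CoNLL-U lines for one multiword token and its syntactic words.

    segments: list of (form, upos, feats) tuples; all other columns stay '_'.
    """
    end = start + len(segments) - 1
    # Range line: ID and FORM only, remaining eight columns are underscores.
    rows = [[f"{start}-{end}", surface] + ["_"] * 8]
    for offset, (form, upos, feats) in enumerate(segments):
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        rows.append([str(start + offset), form, "_", upos, "_", feats, "_", "_", "_", "_"])
    return ["\t".join(row) for row in rows]


# Assumed segment forms; only UPOS and PronType come from the discussion above.
lines = mwt_block("כולם", [("כל", "DET", "PronType=Tot"), ("הם", "PRON", "PronType=Prs")])
print("\n".join(lines))
```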

NathanD38 commented 2 years ago

@amir-zeldes In some of the sentences from the original HTB, we have כולו following a NOUN, which we have actually been tagging as an unsegmented PRON attached as det.

העניין כולו פשוט יחסית.

Here we have det(inyan, kulo); det(inyan, ha), and kulo is tagged PRON with the following features: Definite=Def, Gender=Masc, Number=Sing, Person=3, PronType=Prs

In the following sentence,

כל העניין פשוט יחסית.

we have det(inyan, kol); det(inyan, ha), and the suggestion is now to have the following features for DET כל: Definite=Cons, PronType=Tot

Do you think that כולו, meaning "the entire/whole [X]", should be segmented as well?

There's another placement of כולו, after certain verbs, and I'm not sure how to treat it or which element it depends on:

המעירים כנראה מצפים שבלוג שמתיימר להיות מוקדש כולו לא סתם ללשון, אלא לתיקון של מילון, יהיה בעצמו תקין לשונית.

There is also a כל which doesn't mean "all" but rather "any" in the following (usually negative) construction, and I wonder whether the feature PronType=Tot fits here:

לממשלה אין כל כוונה להאריך את תוקף התו הירוק.

amir-zeldes commented 2 years ago

There are several questions here:

Hilla-Merhav commented 2 years ago

> the 'floating quantifier' use (with the verb) is known from English and other languages as well, in such cases it is tagged ADV and attached as advmod (as if it means "entirely")

Can we unsegment adverbial כולו and assign it Person, Gender and Number? (Are ADVs permitted to carry these features?) If so, I think it may be better to also unsegment the other inflected adverbs discussed in https://github.com/IAHLT/UD_Hebrew/issues/32: לבדו, לאיטו, עודו, דַּיוֹ. Right now they take nmod:poss, even though ADVs usually take obl. I think treating them as unsegmented inflected adverbs could solve this issue.

amir-zeldes commented 2 years ago

It's not out of the question to treat it as an adverbial, but I think the morphological features wouldn't work then: the documentation says Number and Gender apply to "pronouns, adjectives, determiners, numerals, verbs", and Person only to a subset of those. I suppose the way to make it adverbial and unsegmented while keeping the FEATS is to use obl:npmod...