Open shirawigi opened 2 years ago
איפשהו, איכשהו, לאנשהו, מתישהו are ADV, and as I just answered in #55, it is actually possible to give them PronType
For כלשהו איזשהו, I agree that DET makes sense, in which case they can get the typical indefinite PronType; however they are not supposed to get Definite, it seems, which surprised me too:
https://github.com/UniversalDependencies/UD_English-EWT/issues/291
You can also enrich כל with PronType=Tot:
https://universaldependencies.org/u/feat/PronType.html#Tot
And שום, אף can get PronType=Neg:
https://universaldependencies.org/u/feat/PronType.html#neg-negative-pronoun-determiner-or-adverb
I think we haven't done this so far, but it should be easy to add.
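A minimal sketch of how this enrichment could be scripted, assuming each token is available as a lemma, a UPOS tag and a CoNLL-U FEATS string. The lookup table and the names `PRON_TYPE`, `parse_feats`, `format_feats` and `add_pron_type` are hypothetical and not part of any existing tool. Using the value Ind for the four adverbs is an assumption, since the comment above only says they can get PronType.

```python
# Hypothetical (lemma, UPOS) -> PronType table restating the suggestions above.
PRON_TYPE = {
    ("כל", "DET"): "Tot",
    ("שום", "DET"): "Neg",
    ("אף", "DET"): "Neg",
    ("כלשהו", "DET"): "Ind",
    ("איזשהו", "DET"): "Ind",
    # Assumption: the indefinite adverbs take Ind as well.
    ("איפשהו", "ADV"): "Ind",
    ("איכשהו", "ADV"): "Ind",
    ("לאנשהו", "ADV"): "Ind",
    ("מתישהו", "ADV"): "Ind",
}


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string into a dict ('_' means no features)."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def format_feats(feats):
    """Serialize a feature dict, sorted case-insensitively by feature name."""
    if not feats:
        return "_"
    items = sorted(feats.items(), key=lambda kv: kv[0].lower())
    return "|".join(f"{name}={value}" for name, value in items)


def add_pron_type(lemma, upos, feats):
    """Add PronType for listed lemma/UPOS pairs; leave other tokens unchanged."""
    value = PRON_TYPE.get((lemma, upos))
    if value is None:
        return feats
    parsed = parse_feats(feats)
    # Do not overwrite a PronType that is already annotated.
    parsed.setdefault("PronType", value)
    return format_feats(parsed)
```

Keying the table on the lemma together with the UPOS keeps a lemma such as אף from being enriched when it is tagged as something other than DET.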
@amir-zeldes Thanks! Just to make sure before we update our list - I understand from the linked discussion that "איזה" (in the sense of "ראיתי איזה ילד") also gets the indefinite PronType, like איזשהו, and not Definite=Ind?
Yes, that seems to be the reading of the guidelines as endorsed by Dan, so I'm OK with implementing it based on that. It's not really a huge theoretical statement whether we are looking at an 'article' or a 'determiner pronoun' and I guess the desire is to keep the inventory of articles small. I interpret איזה as falling in the same bin as English "some", so PronType=Ind but no Definite, which is reserved for the small class of articles.
@amir-zeldes
Should we use the features PronType=Tot for כל and PronType=Neg for שום, אף from now on?
For sentences like כולם נסעו לטיול שנתי, do we segment כולם into DET+PRON, with PronType=Tot on the DET and PronType=Prs on the PRON?
Yes, I think so - that would also be consistent with the original HTB:
(except for the part where it used to be tagged NOUN)
@amir-zeldes
In some of the sentences from the original HTB, כולו follows a NOUN, and we have actually been tagging it as an unsegmented PRON that receives det.
העניין כולו פשוט יחסית.
Here we have det(inyan, kulo) and det(inyan, ha), and kulo is tagged PRON with the following features:
Definite=Def, Gender=Masc, Number=Sing, Person=3, PronType=Prs
In the following sentence,
כל העניין פשוט יחסית.
we have det(inyan, kol) and det(inyan, ha), and the suggestion is now to have the following features for the DET כל:
Definite=Cons, PronType=Tot
Do you think that כולו meaning the entire/whole [X] should be segmented as well?
There's another placement of כולו, after certain verbs, where I'm not sure about its treatment or about the element it depends on:
המעירים כנראה מצפים שבלוג שמתיימר להיות מוקדש כולו לא סתם ללשון, אלא לתיקון של מילון, יהיה בעצמו תקין לשונית.
There is also כל which doesn't mean "all" but rather "any" in the following (usually negative) construction, and I wonder whether the feature PronType=Tot is fitting here:
לממשלה אין כל כוונה להאריך את תוקף התו הירוק.
There are several questions here:
1. If כולו after a noun is analyzed as det, then I think it makes sense not to segment it. If we segment it, we need to come up with a different syntactic analysis (in the first example, maybe appos, "the matter, all of it", but it is not reversible like a normal apposition). Personally I'm willing to leave it as is and consider it a determiner; then it behaves like "kol hainyan".
2. The 'floating quantifier' use (with the verb) is known from English and other languages as well; in such cases it is tagged ADV and attached as advmod (as if it means "entirely").
3. For כל in the negative construction, I think Tot is still correct and better than the alternative Neg (which is for items like German kein, Polish żaden 'not-one').
Can we unsegment adverbial כולו and assign it Person, Gender and Number? (Are ADVs permitted to carry these features?) If they are, I think it may be better to also unsegment the other inflected adverbs discussed in https://github.com/IAHLT/UD_Hebrew/issues/32 - לבדו, לאיטו, עודו, דַּיוֹ. Right now they take nmod:poss, even though ADVs usually take obl, and I think treating them as unsegmented inflected adverbs could solve this issue.
It's not out of the question to treat it as an adverbial, but I think then the morphological features wouldn't work - the documentation says Number and Gender apply to "pronouns, adjectives, determiners, numerals, verbs", and Person only to a subset of those. I suppose the way to make it adverbial, unsegmented and keep the FEATS is to use obl:npmod
...
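The constraint mentioned in this reply can be turned into a rough consistency check. The sketch below assumes a token is given as a UPOS tag plus a CoNLL-U FEATS string, and it only reports which of Gender, Number and Person appear on an ADV. It is not the official UD validator, and `agreement_feats_on_adv` is a hypothetical helper name.

```python
# Features the reply above describes as not applying to adverbs.
AGREEMENT_FEATS = ("Gender", "Number", "Person")


def agreement_feats_on_adv(upos, feats):
    """Return the agreement features found on an ADV token, else an empty list."""
    if upos != "ADV" or feats in ("", "_"):
        return []
    # Collect feature names from a 'Name=Value|Name=Value' string.
    names = {pair.split("=", 1)[0] for pair in feats.split("|")}
    return [name for name in AGREEMENT_FEATS if name in names]
```

Running such a check over a treebank before and after unsegmenting the inflected adverbs would show how many tokens end up carrying features on ADV.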
Hi @amir-zeldes,
We were wondering how we should annotate the following lemmas: כלשהו איזשהו איפשהו, איכשהו, לאנשהו, מתישהו
What should their UPOS be? Are כלשהו and איזשהו DET? If so, do they get the feature Art=Ind (like איזה when it functions as an indefinite article)? And are the others ADV? Do they get marked as indefinite somehow, maybe by the feature PronType=Ind?
For reference, here's our new list of determiners:
ה (PronType=Art, Definite=Def)
מדי (מִדֵי)
כל/כול
אף
שום
איזה (PronType=Int), איזו, אילו (nonstandard spelling: אלו)
איזה/איזשהו (PronType=Art, Definite=Ind), איזושהי
עוד
יותר
פחות
קצת
המון (when it replaces הרבה; in the "crowd" sense it is tagged NOUN)
הרבה
די (דֵי)
מספיק
כמה
Thanks, Shira