Open Hilla-Merhav opened 3 years ago
@amir-zeldes I have the same question about בעינה, בעינו, etc., as in:
בעיה אחת נותרה בעינה והיא הפקת פניצילין
Like לבדו, לבדה, etc., it is unsegmented in HTB (7/7 occurrences); the lemma is בעין, but it has no morphological features. Do you think we should assign Gender, Number and Person to inflections of בעין (which would mean that ADV can take these features), or would you prefer that we segment it?
The first one (לבדו etc.) works exactly the same way in UD Arabic and Coptic, and is segmented in both (نفس+suffix in Arabic). With a preposition (بنفسه) it is attached as obl/nmod; without one it is obl/nmod without case in Arabic and obl:npmod in Coptic. I really don't know why this is unsegmented; putting Person on ADV seems a bit strange. Honestly I'd segment both of these, though we'd have to revise it in HTB too.
@amir-zeldes
So בעינה is easy, I guess it should be segmented to three tokens with a NOUN head:
בעיה אחת נותרה בעינה
obl(notra, ein)
case(ein, be)
nmod:poss(ein, a)
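As a rough sketch, the three relations proposed above could be written out as CoNLL-U-style rows. Only obl, case and nmod:poss come from the proposal; the surface forms of the segments, the PRON tag on the suffix, the root attachment of the verb, and the helper names `to_conllu` and `relations` are assumptions for illustration, and the subject of the sentence is left out.

```python
# Hypothetical illustration: rows are (id, form, upos, head, deprel).
# Only obl, case and nmod:poss come from the proposal in the thread;
# the segment forms and the verb's root attachment are assumptions.
ROWS = [
    (1, "נותרה", "VERB", 0, "root"),
    (2, "ב", "ADP", 3, "case"),
    (3, "עינ", "NOUN", 1, "obl"),
    (4, "ה", "PRON", 3, "nmod:poss"),
]


def to_conllu(rows):
    """Render rows as 10-column CoNLL-U lines, leaving unknown fields as '_'."""
    lines = []
    for tok_id, form, upos, head, deprel in rows:
        cols = [str(tok_id), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
        lines.append("\t".join(cols))
    return "\n".join(lines)


def relations(rows):
    """Return (deprel, head_form, dependent_form) triples, skipping the root."""
    forms = {tok_id: form for tok_id, form, _, _, _ in rows}
    return [(deprel, forms[head], form)
            for _, form, _, head, deprel in rows if head != 0]


print(to_conllu(ROWS))
```

The `relations` helper returns the same head-dependent pairs as the list above, which makes it easy to compare a proposed segmentation against what was agreed in the thread.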
But לבד+ו seems a bit trickier to me. Since לבד is an ADV, through which deprel should it govern the PRON?
Yeah, I guess originally even לבד was two things (with a noun בד), right?
Actually this makes me think we might be better off doing both as ADPs... But if you want to emulate the Coptic/Arabic solution it would be obl:npmod with a possessor (nmod:poss), even for לבד+ו. I feel either of those is better than fixed here, since it's totally transparent.
@amir-zeldes
Actually I am very much in favor of the syntactical analysis of ב+עינ+ה.
If we do לבד as ADP, how should we treat לבד when it comes alone? (Asking this in English is a missed opportunity!)
רק אני נשארתי בבית לבד
הלכתי לסרט לבד
I checked Even Shoshan and it turned out the origin of ל+בד is the same בד of בד בבד we discussed on issue #14 – בד = חלק, מנה. In בד ב+בד we decided to analyze compositionally.
Maybe for לבד we can do the same thing you suggest for למחרת: advmod and unsegmented when it comes alone, and segmented as ל+בד+ו when it takes a PRON as an nmod:poss?
We can also segment both of them if that's preferable, and attach ל+בד as obl. What do you think?
@amir-zeldes
Also, I think this is not the only case where we may be forced to segment a token that is otherwise unsegmented. In one of our team meetings we talked about the challenge of ADPs with nominal origins: they hide NOUNs that can sometimes take nmod:poss:
באמצעותה של השירה אנחנו מבטאים שמחה ועצב
הוא מזהה חשש בקרבם של בכירי ארגון הטרור
בטהובן הולך בעקבותיו של מוצרט כשהוא ניגש לסוגיית האנסמבל בפידליו
נתפלל למענו של הנער שנפצע הלילה בפיגוע
In these particular environments I guess the only option we have is segmentation of these ADPs (?), but if we segment every ADP that can theoretically behave like that, we lose a long list of ADPs.
So maybe we can treat לבד the same way we treat (?) these ADPs: segment only when we have no choice?
I think segmenting the ל in לבד even in לבדו will come across as very odd to most contemporary speakers of Hebrew. I would go with either ADV+advmod and still stick a possessive on it (turns out the validator actually allows this), or ADV+obl:npmod, saying something like "this is a lexical adverb, converted and wrapped in a possessed NP, then used adverbially again". I know both of these are convoluted, but segmenting a noun בד here seems so etymologizing and arcane to me, that it's worse than segmenting a+lone (which is at least somewhat transparent, and still not done)
@amir-zeldes
Oh, OK! if we are allowed to analyze לבדו as ADV+PRON, that indeed sounds ideal! :)
Do you prefer obl:npmod or the possessive deprel? If we choose possessive, is it nmod:poss or a new obl:poss?
And we still stick to the syntactical analysis of ב+עינ+ה, right?
Yes, I was relieved the validator allows this, so let's go with that. About nmod:poss - can you check if the validator would tolerate that, with the parent being ADV? If so, I think the possessive can be nmod:poss regardless of the semantics (I think it's a bit like "by his lonesome" would also be nmod:poss, even though one doesn't possess one's lonesomeness). Otherwise I would avoid introducing a new label just for this, so probably we'd have to either use obl:npmod, or tag לבד as a NOUN to be compatible with the possession, and then attach the whole thing as obl:npmod (and the pronoun as nmod:poss)
@amir-zeldes nmod:poss works :)
We decided to annotate לבדי, לבדו as an ADV that governs an nmod:poss (a single phenomenon did not justify introducing a new obl:poss label). Lately @IsraelLand and I encountered quite a few adjectives that govern possessives:
בני יחידי
אבי חורגי
To avoid annotating an ADJ as the parent of an nmod (nmod:poss(yexid, i)), do you think we should reconsider an obl:poss label? This label might also be useful with inflected adverbs:
לבדי
לאיטי
בכל מאודי
האם החוזה עודו תקף?
יעיל דיו
These still seem pretty rare, so I'm not sure I would advocate adding obl:poss (though if we ever changed our minds, we could add it automatically by relabeling every nmod:poss governed by an ADV).
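For what it's worth, the automatic conversion mentioned here could look roughly like the sketch below. It assumes plain 10-column CoNLL-U input; the function name `relabel_adv_possessives`, the sample tokens and their attachments are made up for illustration, and obl:poss is only the hypothetical label discussed in this thread, not an existing UD relation in the treebank.

```python
def relabel_adv_possessives(conllu_text, new_label="obl:poss"):
    """Relabel nmod:poss to new_label when the head token's UPOS is ADV.

    Assumes plain 10-column CoNLL-U. Comment lines, multiword-token ranges
    (IDs like 3-4) and empty nodes (IDs like 3.1) are passed through as-is.
    """
    out = []
    for sentence in conllu_text.strip().split("\n\n"):
        lines = sentence.split("\n")
        # First pass: map token ID -> UPOS for regular tokens.
        upos = {}
        for line in lines:
            cols = line.split("\t")
            if not line.startswith("#") and len(cols) == 10 and cols[0].isdigit():
                upos[cols[0]] = cols[3]
        # Second pass: rewrite DEPREL where the head is an ADV.
        new_lines = []
        for line in lines:
            cols = line.split("\t")
            if (not line.startswith("#") and len(cols) == 10 and cols[0].isdigit()
                    and cols[7] == "nmod:poss" and upos.get(cols[6]) == "ADV"):
                cols[7] = new_label
                line = "\t".join(cols)
            new_lines.append(line)
        out.append("\n".join(new_lines))
    return "\n\n".join(out) + "\n"


# Hypothetical two-sentence sample: a possessed ADV and a possessed NOUN.
SAMPLE = "\n".join([
    "# hypothetical fragment",
    "1\tלבד\t_\tADV\t_\t_\t0\troot\t_\t_",
    "2\tו\t_\tPRON\t_\t_\t1\tnmod:poss\t_\t_",
    "",
    "1\tבנ\t_\tNOUN\t_\t_\t0\troot\t_\t_",
    "2\tי\t_\tPRON\t_\t_\t1\tnmod:poss\t_\t_",
]) + "\n"

print(relabel_adv_possessives(SAMPLE))
```

Only the possessive under the ADV head is relabeled; the one under the NOUN keeps nmod:poss, so ordinary possessed nouns would be untouched by such a pass.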
For some of these though you could also treat these adjectives as nominalizations, in which case possessing an adjective becomes normal under the usual guidelines, so I think יחידי doesn't pose any special problems (though it is a cool example!)
The alternative for the adverbs is to treat them as nouns whenever they are possessed and attach them as obl:npmod to their governing verbs. But especially given that עוד in the sense "still" is a very prototypical adverb, I would keep tagging it ADV (at least that is my gut feeling; if someone wants to argue in favor of NOUN just for some cases, it's not out of the question as another option).
BTW I think מאוד in מאודי is a NOUN which is just a homonym/homograph of the adverb מאוד, at least synchronically for me.
Do we want to tag יחיד as NOUN, or keep it as an ADJ that governs an nmod:poss (and then, if we ever change our minds, we can relocate these adjectives)?
The universal guidelines say:
adjectives that exceptionally head a nominal phrase (as in the sick, the healthy) are still tagged ADJ https://universaldependencies.org/u/pos/ADJ.html
So I think it's meant to stay tagged as ADJ
@amir-zeldes
For the first time I have "לבדו" in my data (הסמכות לביטולו של חוק תיוחד לבית המשפט לבדו), so I checked how it is analyzed in HTB. I found that it is unsegmented even when it inflects for person, gender and number (HTB has the forms לבדו, לבדה, לבדם); the lemma is לבד, but it has no morphological features. Usually we assign only Prefix=Yes or Polarity=Pos/Neg to ADVs. Do you think we should assign Gender, Number and Person to inflections of לבד (לבדי, לבדךָ, לבדֵּךְ, לבדו, לבדה, לבדן...), or would you prefer that we segment it?