IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

The ADV "לבד" and its inflections #32

Open Hilla-Merhav opened 3 years ago

Hilla-Merhav commented 3 years ago

@amir-zeldes

For the first time I have "לבדו" in my data (הסמכות לביטולו של חוק תיוחד לבית המשפט לבדו), so I checked how it is analyzed in HTB. I found that it is left unsegmented even when it inflects for person, gender and number (HTB has the forms לבדו, לבדה, לבדם); the lemma is לבד, but it carries no morphological features. Usually we assign ADVs only Prefix=Yes or Polarity=Pos/Neg. Do you think we should assign Gender, Number and Person to the inflected forms of "לבד" (לבדי, לבדךָ, לבדֵּךְ, לבדו, לבדה, לבדן...), or would you prefer that we segment it?

Hilla-Merhav commented 3 years ago

@amir-zeldes I have the same question about בעינה, בעינו etc.:

בעיה אחת נותרה בעינה והיא הפקת פניצילין

Like לבדו, לבדה etc., it is unsegmented in HTB (7/7 occurrences); the lemma is בעין, but it carries no morphological features. Do you think we should assign Gender, Number and Person to the inflected forms of "בעין" (which would mean that ADVs can take these features), or would you prefer that we segment it?

amir-zeldes commented 3 years ago

The first one (לבדו etc.) has an exact parallel in UD Arabic and Coptic, and it is segmented in both (نفس+suffix in Arabic). With a preposition (بنفسه) it is attached as obl/nmod; otherwise it is attached as obl/nmod without case in Arabic and as obl:npmod in Coptic. I really don't know why this is unsegmented in HTB, and putting Person on ADV seems a bit strange. Honestly I'd segment both of these, though we'd have to revise it in HTB too.

Hilla-Merhav commented 3 years ago

@amir-zeldes So בעינה is easy: I guess it should be segmented into three tokens with a NOUN head.

בעיה אחת נותרה בעינה

obl(notra, ein)
case(ein, be)
nmod:poss(ein, a)

But לבד+ו seems a bit trickier to me. Since לבד is an ADV, which deprel should it use to govern the PRON?
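A minimal sketch of the three-way segmentation of בעינה proposed in this comment, written as plain Python data. The transliterated labels (be, ein, a, notra) and the NOUN tag come from the comment; the `Word` type, the `relations` helper, and the ADP and PRON tags for the other two segments are assumptions made only for illustration.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    form: str    # transliterated label, as used in the comment
    upos: str    # ADP and PRON are assumed tags; NOUN is from the comment
    head: str    # transliterated label of the governing word
    deprel: str


# Proposed segmentation: preposition + noun + possessive suffix.
segments = [
    Word("be", "ADP", "ein", "case"),
    Word("ein", "NOUN", "notra", "obl"),
    Word("a", "PRON", "ein", "nmod:poss"),
]


def relations(words: List[Word]) -> List[str]:
    """Render each word as deprel(head, dependent)."""
    return [f"{w.deprel}({w.head}, {w.form})" for w in words]


for rel in relations(segments):
    print(rel)
```

The noun is the only segment that attaches outside the token, so the verb sees a single obl dependent.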

amir-zeldes commented 3 years ago

Yeah, I guess originally even לבד was two things (with a noun בד), right?

Actually this makes me think we might be better off doing both as ADPs... But if you want to emulate the Coptic/Arabic solution it would be obl:npmod with a possessor (nmod:poss), even for לבד+ו. I feel either of those is better than fixed here, since it's totally transparent.

Hilla-Merhav commented 3 years ago

@amir-zeldes

Actually I am very much in favor of the syntactical analysis of ב+עינ+ה.

If we treat לבד as an ADP, how should we treat לבד when it comes alone? (Asking this in English is a missed opportunity!)

רק אני נשארתי בבית לבד
הלכתי לסרט לבד

I checked Even-Shoshan, and it turns out that ל+בד derives from the same בד as in בד בבד, which we discussed in issue #14 (בד = חלק, מנה). For בד ב+בד we decided on a compositional analysis. Maybe for לבד we can do the same thing you suggested for למחרת: leave it unsegmented as advmod when it comes alone, and segment ל+בד+ו when it takes a PRON as an nmod:poss? We can also segment both of them if that is preferable, and attach ל+בד as obl. What do you think?

Hilla-Merhav commented 3 years ago

@amir-zeldes Also, I think this is not the only case where we may be forced to segment a token that is otherwise unsegmented. In one of our team meetings we talked about the challenge of ADPs of nominal origin: they hide NOUNs that can sometimes take an nmod:poss:

באמצעותה של השירה אנחנו מבטאים שמחה ועצב
הוא מזהה חשש בקרבם של בכירי ארגון הטרור
בטהובן הולך בעקבותיו של מוצרט כשהוא ניגש לסוגיית האנסמבל בפידליו
נתפלל למענו של הנער שנפצע הלילה בפיגוע

In these particular environments I guess the only option we have is segmentation of these ADPs (?), but if we segment every ADP that can theoretically behave like that, we lose a long list of ADPs.

So maybe we can treat לבד the same way we treat (?) these ADPs: segment only when we have no choice?

amir-zeldes commented 3 years ago

I think segmenting the ל in לבד even in לבדו will come across as very odd to most contemporary speakers of Hebrew. I would go with either ADV+advmod and still stick a possessive on it (turns out the validator actually allows this), or ADV+obl:npmod, saying something like "this is a lexical adverb, converted and wrapped in a possessed NP, then used adverbially again". I know both of these are convoluted, but segmenting a noun בד here seems so etymologizing and arcane to me, that it's worse than segmenting a+lone (which is at least somewhat transparent, and still not done)

Hilla-Merhav commented 3 years ago

@amir-zeldes

Oh, OK! If we are allowed to analyze לבדו as ADV+PRON, that indeed sounds ideal! :)

Do you prefer obl:npmod or the possessive deprel? If we choose the possessive, is it nmod:poss or a new obl:poss?

And we still stick to the syntactical analysis of ב+עינ+ה, right?

amir-zeldes commented 3 years ago

Yes, I was relieved the validator allows this, so let's go with that. About nmod:poss - can you check if the validator would tolerate that, with the parent being ADV? If so, I think the possessive can be nmod:poss regardless of the semantics (I think it's a bit like "by his lonesome" would also be nmod:poss, even though one doesn't possess one's lonesomeness). Otherwise I would avoid introducing a new label just for this, so probably we'd have to either use obl:npmod, or tag לבד as a NOUN to be compatible with the possession, and then attach the whole thing as obl:npmod (and the pronoun as nmod:poss)

Hilla-Merhav commented 3 years ago

@amir-zeldes nmod:poss works :)
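A minimal sketch of what the agreed annotation might look like in CoNLL-U, plus a small check that finds nmod:poss dependents whose head is tagged ADV. The word forms, the pronoun lemma, the root attachment, and the `adv_possessors` helper are illustrative assumptions, not the treebank's actual rows, and the parser assumes the text holds a single sentence.

```python
# Hypothetical fragment: לבדו split into an ADV plus a possessive PRON.
SAMPLE = "\n".join([
    "1-2\tלבדו\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tלבד\tלבד\tADV\t_\t_\t0\troot\t_\t_",
    "2\tו\tהוא\tPRON\t_\t_\t1\tnmod:poss\t_\t_",
])


def adv_possessors(conllu: str):
    """Return (head form, dependent form) pairs for nmod:poss under an ADV."""
    rows = {}
    for line in conllu.splitlines():
        cols = line.split("\t")
        # Keep plain word rows; skip comments, ranges (1-2), empty nodes (1.1).
        if len(cols) == 10 and cols[0].isdigit():
            rows[cols[0]] = cols
    return [
        (rows[cols[6]][1], cols[1])
        for cols in rows.values()
        if cols[7] == "nmod:poss"
        and cols[6] in rows
        and rows[cols[6]][3] == "ADV"
    ]


print(adv_possessors(SAMPLE))
```

A check like this only lists candidate rows; whether the combination is accepted is decided by the UD validator itself, as reported in this comment.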

Hilla-Merhav commented 2 years ago

We decided to annotate לבדי, לבדו as an ADV that governs an nmod:poss (a single phenomenon did not justify introducing a new obl:poss label). Lately @IsraelLand and I have encountered quite a few adjectives that govern possessives:

בני יחידי
אבי חורגי

To avoid annotating an ADJ as the parent of an nmod (nmod:poss(yexid, i)), do you think we should reconsider an obl:poss label? This label might also be useful with inflected adverbs:

לבדי
לאיטי
בכל מאודי
האם החוזה עודו תקף?
יעיל דיו

amir-zeldes commented 2 years ago

These still seem pretty rare, so I'm not sure I would advocate adding obl:poss (though if we ever changed our minds, we could add it automatically by relabeling every nmod:poss that is governed by an ADV)
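The automatic conversion mentioned here could be sketched roughly as follows. This is an assumption-laden illustration, not an existing tool: `relabel_adv_poss` is a hypothetical helper, the sample rows are invented, and only the DEPREL column is rewritten (enhanced dependencies in the DEPS column, if present, would need separate handling).

```python
def relabel_adv_poss(conllu: str, new_label: str = "obl:poss") -> str:
    """Relabel nmod:poss as new_label wherever the head word is tagged ADV."""
    out = []
    for sentence in conllu.split("\n\n"):
        lines = sentence.split("\n")
        # First pass: map each word id to its UPOS within this sentence.
        upos = {}
        for line in lines:
            cols = line.split("\t")
            if len(cols) == 10 and cols[0].isdigit():
                upos[cols[0]] = cols[3]
        # Second pass: rewrite DEPREL for possessives headed by an ADV.
        new_lines = []
        for line in lines:
            cols = line.split("\t")
            if (len(cols) == 10 and cols[0].isdigit()
                    and cols[7] == "nmod:poss"
                    and upos.get(cols[6]) == "ADV"):
                cols[7] = new_label
                line = "\t".join(cols)
            new_lines.append(line)
        out.append("\n".join(new_lines))
    return "\n\n".join(out)


# Invented two-sentence sample: one ADV head, one NOUN head.
SAMPLE = (
    "1\tלבד\tלבד\tADV\t_\t_\t0\troot\t_\t_\n"
    "2\tו\tהוא\tPRON\t_\t_\t1\tnmod:poss\t_\t_\n"
    "\n"
    "1\tעין\tעין\tNOUN\t_\t_\t0\troot\t_\t_\n"
    "2\tה\tהיא\tPRON\t_\t_\t1\tnmod:poss\t_\t_\n"
)

converted = relabel_adv_poss(SAMPLE)
print(converted)
```

Because the rule depends only on the head's UPOS, the change would be reversible in the same way, which is part of why postponing the decision costs little.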

For some of these though you could also treat these adjectives as nominalizations, in which case possessing an adjective becomes normal under the usual guidelines, so I think יחידי doesn't pose any special problems (though it is a cool example!)

The alternative for the adverbs is to treat them as nouns whenever they are possessed and attach them as obl:npmod to their governing verbs. But especially given that עוד in the sense "still" is a very prototypical adverb, I would keep tagging it ADV (at least that is my gut feeling; if someone wants to argue in favor of NOUN just for some cases, it's not out of the question as another option)

BTW I think מאוד in מאודי is a NOUN which is just a homonym/homograph of the adverb מאוד, at least synchronically for me.

Hilla-Merhav commented 2 years ago

For some of these though you could also treat these adjectives as nominalizations, in which case possessing an adjective becomes normal under the usual guidelines, so I think יחידי doesn't pose any special problems

Do we want to tag יחיד as NOUN, or keep it as an ADJ that governs an nmod:poss (and then, in case we ever change our minds, we can relocate these adjectives)?

amir-zeldes commented 2 years ago

The universal guidelines say:

adjectives that exceptionally head a nominal phrase (as in the sick, the healthy) are still tagged ADJ https://universaldependencies.org/u/pos/ADJ.html

So I think it's meant to stay tagged as ADJ