IAHLT / UD_Hebrew

Hebrew Universal Dependencies Treebank

Summary of our Meeting – Removal of Vnoun feature + Code Switching analysis #21

Open Hilla-Merhav opened 2 years ago

Hilla-Merhav commented 2 years ago

@amir-zeldes I am summarizing the conclusions we reached in our discussion. Please tell me if I misunderstood any point:

  1. Removal of VerbForm=Vnoun and analysis of every צורת מקור as NOUN

Some of the reasons we are going with NOUN are: smixut pruda (היותה של המדינה מדינה יהודית), צורת מקור in somex position (ימי קום המדינה), and the varied ADPs they govern (לאחר שובה לישראל, לפני רדתי מהמטוס, עד שובו לתפקיד).

I am now trying to analyze different kinds of challenging sentences to make sure we know how to deal with all of them. Please tell me if there is anything we should analyze differently.

In הבר נפתח ברדת הלילה:
    obl (niftax, redet)
    case (redet, be)
    compound (redet, layla)

In ברדתו מן המטוס הוא לחץ את ידי:
    obl (laxats, redt)
    case (redt, be)
    nmod:poss (redt, o)
    nmod (redt, matos)
    case (matos, min)

In מדינה שלמה התרגשה בראותה את ירדן ג'רבי:
    obl (hitragsha, reot)
    case (reot, be)
    nmod:poss (reot, a)
    nmod (reot, yarden)
    case (yarden, et) [Case=Acc]

In הוא הבעיר מדורה באומרו דברים אלה:
    obl (hev'ir, omr)
    case (omr, be)
    nmod:poss (omr, o)
    nmod:npmod (omr, dvarim) (and for now: dep(omr, dvarim))

In הוא הבעיר מדורה באומרו כי נגיף הקורונה אינו טבעי:
    obl (hev'ir, omr)
    case (omr, be)
    nmod:poss (omr, o)
    acl (omr, tiv'i)

In בהינתן שזה ימשיך להיות קצב התחלואה, המצב רע:
    obl (ra, hinaten)
    case (hinaten, be)
    acl (hinaten, yamshix)

  2. Code switching - POS=X and deprel=flat

    עברנו תהליך דיו דילג'ינס מעמיק
    compound(tahalix, due)
    flat(due, diligence)

    due – POS=X
    diligence – POS=X

amir-zeldes commented 2 years ago

👍 Exactly!

Hilla-Merhav commented 2 years ago

@amir-zeldes Great, thank you! :)

I am trying to create a query so we can fix the analysis in HTB (we now have a list of things we intend to fix there), and we also want to fix it in our new data during QA. By that I mean the data we analyzed before the VerbForm=Vnoun feature was introduced; thanks to this feature, the newer occurrences are easy to find and fix. I am not sure how I should build this query: how did you find the occurrences of צורות מקור in HTB? Do you recommend a particular query?
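For the newer data, where VerbForm=Vnoun is still present, a search along these lines might be enough to list the tokens to revisit. This is only a minimal sketch: it assumes the data is in standard 10-column CoNLL-U, the function name `find_vnoun_tokens` is hypothetical, and the sample rows are a made-up toy fragment, not actual treebank annotation.

```python
def row(idx, form, upos, head, deprel, feats="_"):
    """Build one tab-separated CoNLL-U token line (toy helper for the example)."""
    return "\t".join([str(idx), form, "_", upos, "_", feats, str(head), deprel, "_", "_"])


def find_vnoun_tokens(conllu_text):
    """Yield (sent_id, token_id, form) for tokens whose FEATS contain VerbForm=Vnoun."""
    sent_id = None
    for line in conllu_text.splitlines():
        line = line.strip()
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a regular token line
        feats = cols[5].split("|") if cols[5] != "_" else []
        if "VerbForm=Vnoun" in feats:
            yield sent_id, cols[0], cols[1]


# Toy input (assumption: illustrative rows only, not copied from any treebank).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row(1, "ב", "ADP", 2, "case"),
    row(2, "רדת", "VERB", 3, "obl", feats="VerbForm=Vnoun"),
    row(3, "לחץ", "VERB", 0, "root"),
    "",
])

hits = list(find_vnoun_tokens(SAMPLE))
```

Each hit only identifies a token to review; retagging it as NOUN and adjusting its dependents would still be done by hand or in a separate step.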

amir-zeldes commented 2 years ago

That's great - in terms of a query, it might not be exhaustive, but I think you can get most of these by looking for a VERB serving as advcl which has a dependent b-, like this:

tok ->dep[func="advcl"] pos="VERB" ->dep "ב"

https://corpling.uis.georgetown.edu/annis/#_q=dG9rIC0-ZGVwW2Z1bmM9ImFkdmNsIl0gcG9zPSJWRVJCIiAtPmRlcCAi15Ei&_c=SUFITFRfSFRC&cl=5&cr=5&s=0&l=25
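If it is more convenient to run the same search offline on the CoNLL-U files, the query above could be approximated roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the prefix is assumed to appear as its own token with the form ב, and the toy rows are made up to mimic an older-style analysis rather than taken from HTB. Like the ANNIS query, it is not expected to be exhaustive.

```python
def row(idx, form, upos, head, deprel):
    """Build one tab-separated CoNLL-U token line (toy helper for the example)."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


def read_sentences(conllu_text):
    """Split CoNLL-U text into sentences, each a list of 10-column token rows."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        # Keep only regular tokens; skip multiword ranges (3-4) and empty nodes (3.1).
        if len(cols) == 10 and cols[0].isdigit():
            sentence.append(cols)
    if sentence:
        yield sentence


def find_candidates(sentence, prefix_form="ב"):
    """Return forms of VERB tokens attached as advcl that have a dependent `prefix_form`."""
    heads_with_prefix = {cols[6] for cols in sentence if cols[1] == prefix_form}
    return [
        cols[1]
        for cols in sentence
        if cols[3] == "VERB" and cols[7] == "advcl" and cols[0] in heads_with_prefix
    ]


# Toy input (assumption: illustrative old-style rows, not actual HTB annotation).
SAMPLE = "\n".join([
    "# sent_id = toy-1",
    row(1, "ב", "ADP", 2, "mark"),
    row(2, "רדת", "VERB", 3, "advcl"),
    row(3, "לחץ", "VERB", 0, "root"),
    "",
])

results = [find_candidates(sent) for sent in read_sentences(SAMPLE)]
```

The candidates still need manual review, since a VERB serving as advcl with a ב dependent is only a heuristic for spotting צורות מקור.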

Hilla-Merhav commented 2 years ago

Thank you very much!! :)