Closed ftyers closed 6 years ago
UD guidelines are quite clear in that we do not collapse MWEs into single nodes (although some datasets still do that), so the original Russian annotation of в течение as one node is not possible. Then I believe that mwe is the most straightforward solution (actually just a different means of saying that it is one unit).
I think both 1) and 2) are possible; the difference is whether we believe that a particular expression is a frozen multi-word preposition. That is clearly language-dependent, but I agree that we should try to get on the same page at least within groups of related languages.
Agree with @dan-zeman that under current guidelines, в течение as one node is not possible.
This leaves the choice between a multiword case marker and an analytic analysis as in the English example above, where "front" is still the syntactic head. On the one hand, from a more semantic viewpoint, treating "in front of" as a multi-word preposition makes a lot of sense, and Sebastian and I have actually done some work on producing things like that as an "enhanced" representation for relating language and vision (upcoming LREC 2016 paper). I would be okay with that, but with MWEs it is tricky to know where to stop. We tried to avoid making MWEs of things that are basically productive, and you can clearly form productive spatial relations (to the left of, to the side of, on the outside of, on the tip of), but it would be reasonable to regard the English ones without "the" (in front of, in back of) as mwe.
Need some general policy with an eye towards applications ... I'll assign this one to Slav.
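To make the two options concrete, here is a minimal sketch of both analyses for the phrase "in front of the house". The token tuples, the helper names (`head_form`, `mwe_spans`) and the fragment-level `root` label are assumptions made for illustration; neither tree is copied from a treebank.

```python
# Each token is (id, form, head, deprel); head 0 marks the root of the fragment.
# Both trees are illustrative sketches, not taken from any treebank.

# Analytic analysis: "front" is the syntactic head, "house" is its nmod.
ANALYTIC = [
    (1, "in", 2, "case"),
    (2, "front", 0, "root"),
    (3, "of", 5, "case"),
    (4, "the", 5, "det"),
    (5, "house", 2, "nmod"),
]

# Multiword analysis: "in front of" is one case marker attached to "house";
# the first word carries "case", the remaining words attach to it as "mwe".
MWE = [
    (1, "in", 5, "case"),
    (2, "front", 1, "mwe"),
    (3, "of", 1, "mwe"),
    (4, "the", 5, "det"),
    (5, "house", 0, "root"),
]


def head_form(tokens):
    """Return the form of the token whose head is 0."""
    for _tok_id, form, head, _rel in tokens:
        if head == 0:
            return form
    raise ValueError("no root token found")


def mwe_spans(tokens):
    """Return each multiword unit as a string: first word plus its mwe dependents."""
    by_id = {tok[0]: tok for tok in tokens}
    spans = {}
    for _tok_id, form, head, rel in tokens:
        if rel == "mwe":
            # Start the span with the form of the first word (the mwe head).
            spans.setdefault(head, [by_id[head][1]]).append(form)
    return [" ".join(words) for _first_id, words in sorted(spans.items())]


print(head_form(ANALYTIC), mwe_spans(ANALYTIC))
print(head_form(MWE), mwe_spans(MWE))
```

The practical difference for applications is which word ends up as the head of the phrase: the relational noun in the analytic tree, the content noun in the multiword tree.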
Under the current scheme for Russian MWEs, we assign "case"/"advmod"/etc. to the first word and "mwe" to all other words. Where to draw the line between MWEs and non-MWEs is a well-known problem. Different Russian resources provide different lists, as most MWEs are regular from the morphosyntactic viewpoint. At present we consider a merger of two lists (based on the RNC http://ruscorpora.ru/obgrams-PR.html and SynTagRus/ETAP) to be the best option.
However, it would be nice to compare the lists across (at least) Slavic languages and have a more consistent policy.
Olga
(In reply to Christopher Manning's comment above, 03.04.2016, 23:39.)
Olga Lyashevskaya
School of Linguistics, Faculty of Humanities Higher School of Economics, Moscow
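The list-based scheme described in the comment above (the first word of a listed MWE gets "case"/"advmod"/etc., every following word gets "mwe") could be sketched roughly as follows. The two-entry `MWE_PREPOSITIONS` list and the `label_mwes` function are hypothetical stand-ins; a real list would come from merging the RNC and SynTagRus/ETAP inventories, and this sketch only assigns relation labels, not heads.

```python
# Hypothetical, deliberately tiny list of frozen multiword prepositions.
# A real inventory would be merged from existing resources.
MWE_PREPOSITIONS = [
    ("в", "течение"),
    ("в", "отличие", "от"),
]


def label_mwes(forms, mwe_list=MWE_PREPOSITIONS, first_rel="case"):
    """Label tokens that belong to a listed multiword expression.

    Returns one label per token: `first_rel` for the first word of a match,
    "mwe" for the remaining words, and None for tokens outside any match.
    """
    lowered = [form.lower() for form in forms]
    labels = [None] * len(forms)
    i = 0
    while i < len(forms):
        # Collect every listed expression that starts at position i
        # and keep the longest one.
        candidates = [m for m in mwe_list if tuple(lowered[i:i + len(m)]) == m]
        match = max(candidates, key=len, default=None)
        if match is None:
            i += 1
            continue
        labels[i] = first_rel
        for j in range(i + 1, i + len(match)):
            labels[j] = "mwe"
        i += len(match)
    return labels


print(label_mwes(["В", "течение", "года"]))
```

Because the decision is driven entirely by the list, comparing lists across related languages amounts to comparing the contents of `MWE_PREPOSITIONS` for each language.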
In some Russian treebanks, multiword prepositions like "в течение" (during) are tokenised as a single unit:
In different treebanks we find different ways of dealing with this kind of thing:
1) In English, "in front of":
2) In Czech, "v průběhu":
Are there any cross-lingual guidelines on which structure to prefer? In principle these are the same construction: a preposition, followed by some nominal-like element, followed by a genitive (marked with 'of' in English).
Minor note: In the Czech documentation, "in contrast to" is shown marked with mwe, but in the English treebank it is annotated with the same scheme as in (2). Perhaps "because of" could be used (this is actually annotated with mwe in English), but then it wouldn't serve the example of "interruptible" multiword expressions.