UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

Errors in mwe annotations #13

Open sebschu opened 7 years ago

sebschu commented 7 years ago

(Reported by Bruno Guillaume via email.)

Some mwe relations are strange. The link http://talc2.loria.fr/grew/?custom=57f65eebaf73e&corpus=UD_English-1.3 gives 6 occurrences of the following pattern:

nschneid commented 7 years ago

I notice a couple of these involve typos: "do to" should be "due to", "becuse of" should be "because of". Can the lemmas be edited to reflect the normal spelling?

nschneid commented 2 years ago

With the current version of the corpus there is one fixed expression with an intervening word: "due largely to"

GUM has "due in large part to"

amir-zeldes commented 2 years ago

I don't think that fixed has to always be contiguous - especially in languages that have Wackernagel particles or similar items that can interrupt any phrase, I think fixed expressions should retain their analysis even around those interruptions.

I don't see a real syntactic difference between the dependency structure of "in large part due to X" and "due in large part to X" - so I would like them to have the same graph structure, notwithstanding the word order.

nschneid commented 2 years ago

Agreed that we should not have two different analyses for "due" + "to". But in English, it seems that fixed is intended for completely frozen expressions "that behave like a single function word". So maybe this is evidence against fixed for "due to", and similarly for "owing to", "according to", etc.

One test could be interruptibility with "not" and coordination:

Anyway this seems like it calls for further discussion on the docs issue tracker or in a meeting.

amir-zeldes commented 2 years ago

OK, if we're talking in general about whether "due to" should be fixed then that's another discussion of course, and I'm happy to talk about it next time; but in general I suspect many of the current fixed list are not 100% unmodifiable, so if it's a question of quantities, then "due to" is certainly overwhelmingly more common in its uninterrupted form.

I guess I've just come to accept the current English fixed list as a fait accompli in the interest of stability, but it's not something I feel has deep theoretical motivation either.

nschneid commented 11 months ago

This has become more urgent as the validator now prohibits fixed with intervening words.

I wonder if we should establish a policy that simply excludes from the fixed list all ADJ+ADP combinations acting somewhat like adpositions. That would mean removing "due to" and "prior to", which are currently in the list, and would resolve the question of whether "other than" should be added (#431). This is assuming we can come up with an acceptable non-fixed analysis of the fronted ones ("Due largely to the hot weather, people are staying indoors").

nschneid commented 11 months ago

> This has become more urgent as the validator now prohibits fixed with intervening words.

Never mind, I misunderstood the validator output—it is a warning, not an error. There is no firm prohibition on intervening words in a fixed expression.