UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

"you know" and "I mean" - parataxis or discourse? #117

Open amir-zeldes opened 3 years ago

amir-zeldes commented 3 years ago

Filler-uses of "you know" and "I mean" appear with two labels in EWT:

It's 9:0 in favor of parataxis with "you know", and 4:5 in favor of discourse for "I mean". I think we should pick one and consolidate (GUM has parataxis ATM, but I'm fine doing either).
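For anyone who wants to check tallies like these, here is a rough sketch of one way to count them from a CoNLL-U file. It is not necessarily how the numbers above were obtained. The names `parse_conllu`, `count_filler_labels`, `FILLERS` and `SAMPLE` are made up for illustration, and the sample sentence is constructed, not taken from EWT. The heuristic looks for a verb "know" or "mean" with an nsubj "you" or "I" and only counts it when the verb itself is attached as parataxis or discourse, so it will also pick up non-filler uses that happen to carry one of those labels.

```python
from collections import Counter

# Hypothetical filler patterns: (subject form, verb lemma), lowercased.
FILLERS = {("you", "know"), ("i", "mean")}
LABELS = {"parataxis", "discourse"}

def parse_conllu(text):
    """Yield each sentence as a list of token dicts (regular word lines only)."""
    sent = []
    for line in text.splitlines():
        if not line.strip():          # blank line closes a sentence
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):      # sentence-level comment
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (1-2), empty nodes (1.1), malformed rows.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sent.append({"id": int(cols[0]), "form": cols[1], "lemma": cols[2],
                     "head": int(cols[6]), "deprel": cols[7]})
    if sent:
        yield sent

def count_filler_labels(text):
    """Count (expression, deprel) pairs for candidate filler expressions."""
    counts = Counter()
    for sent in parse_conllu(text):
        by_id = {tok["id"]: tok for tok in sent}
        for tok in sent:
            if not tok["deprel"].startswith("nsubj"):
                continue
            verb = by_id.get(tok["head"])
            if verb is None:
                continue
            pair = (tok["form"].lower(), verb["lemma"].lower())
            if pair in FILLERS and verb["deprel"] in LABELS:
                counts[(" ".join(pair), verb["deprel"])] += 1
    return counts

# Constructed example (not from the treebank): "You know , it works ."
SAMPLE = "\n".join([
    "# text = You know, it works.",
    "1\tYou\tyou\tPRON\t_\t_\t2\tnsubj\t_\t_",
    "2\tknow\tknow\tVERB\t_\t_\t5\tparataxis\t_\t_",
    "3\t,\t,\tPUNCT\t_\t_\t2\tpunct\t_\t_",
    "4\tit\tit\tPRON\t_\t_\t5\tnsubj\t_\t_",
    "5\tworks\twork\tVERB\t_\t_\t0\troot\t_\t_",
    "6\t.\t.\tPUNCT\t_\t_\t5\tpunct\t_\t_",
]) + "\n\n"

print(count_filler_labels(SAMPLE))
```

To run it on a real file, read the file's contents as UTF-8 text and pass the string to `count_filler_labels`.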

nschneid commented 3 years ago

Why are your intuitions different between the two?

My gut instinct would be discourse for both, but that may be because of their function rather than their syntax.

nschneid commented 3 years ago

An argument for discourse is that these subject+transitive verb expressions wouldn't really be complete as independent sentences. They are more like modifiers of what comes after.

amir-zeldes commented 3 years ago

I have no different intuitions, I think they should be the same - I just don't feel passionately about which one it should be. And we can also add prefixed "see" to that, in: "See, I don't think that's true" (root should be "think", IMO)

TBH I feel like both parataxis and discourse would be doing double duty if we choose them for the label here: parataxis is already used somewhat confusingly for both 'implicit coordination' (two sentences standing next to each other without "and") and for parentheticals. discourse is used for phatic language, but also non-lexical sounds ("uh"), swearwords, yes/no answers accompanying a sentence, etc.

The advantage of parataxis is that it's more similar to these cases in that the head can bring arguments with it (even if the object is missing), so this is a little less jarring than saying "sometimes 'discourse' can be a whole predication". The advantage of discourse is that functionally these cases are similar, in that they don't contribute at-issue semantic content. So I guess syntactically these things look more like parataxis, but semantically they're more like discourse.

nschneid commented 3 years ago

Is discourse used for other things that have internal structure? If not I see the logic for parataxis.

sylvainkahane commented 3 years ago

In spoken corpora we have a lot of such constructions, which we decided to annotate as discourse even though the guidelines indicate that all verbal constructions without a marker should be parataxis. We distinguish them from inserted clauses (parataxis:insert) and parenthetical clauses (parataxis:parenthetical): http://match.grew.fr/?corpus=UD_French-Spoken@2.7&custom=5fedca5b56659&clustering=e.label

Differences between the three relations are explained here:
https://surfacesyntacticud.github.io/guidelines/u/oral_language/parataxis_insert/
https://surfacesyntacticud.github.io/guidelines/u/oral_language/parataxis_parenth/
https://surfacesyntacticud.github.io/guidelines/u/oral_language/discourse/

amir-zeldes commented 3 years ago

I actually like this a lot, but don't have the personnel to introduce this into GUM, sadly... Let alone the other English corpora :(