UniversalDependencies / UD_English-EWT

English data
Creative Commons Attribution Share Alike 4.0 International

"but" should be CCONJ? #427

Closed: nschneid closed this 10 months ago

nschneid commented 11 months ago

There are 6 such instances of "but", some involving idioms.
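To reproduce a count like this, one could scan the CoNLL-U files for "but" tokens whose UPOS is not CCONJ. The sketch below uses only the standard library and assumes standard 10-column CoNLL-U. The function names `non_cconj_but` and `sentence` and the two toy sentences are hypothetical illustrations, not taken from EWT.

```python
# Sketch: list "but" tokens whose UPOS is not CCONJ in CoNLL-U text.
# Assumes standard 10-column CoNLL-U; the sample sentences are hypothetical toys.

def non_cconj_but(conllu_text):
    """Return (sent_id, token_id, upos, deprel) for each 'but' not tagged CCONJ."""
    hits = []
    sent_id = None
    for line in conllu_text.splitlines():
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue  # blank sentence separator or other comment line
        cols = line.split("\t")
        tok_id, form, upos, deprel = cols[0], cols[1], cols[3], cols[7]
        if "-" in tok_id or "." in tok_id:
            continue  # skip multiword-token ranges and empty nodes
        if form.lower() == "but" and upos != "CCONJ":
            hits.append((sent_id, tok_id, upos, deprel))
    return hits


def sentence(sent_id, text, rows):
    """Build one CoNLL-U sentence block from lists of 10 column values."""
    header = [f"# sent_id = {sent_id}", f"# text = {text}"]
    return "\n".join(header + ["\t".join(r) for r in rows]) + "\n\n"


SAMPLE = sentence("toy-1", "all but impossible", [
    ["1", "all", "all", "DET", "RB", "ExtPos=ADV", "3", "advmod", "_", "_"],
    ["2", "but", "but", "ADP", "RB", "_", "1", "fixed", "_", "_"],
    ["3", "impossible", "impossible", "ADJ", "JJ", "Degree=Pos", "0", "root", "_", "_"],
]) + sentence("toy-2", "tired but happy", [
    ["1", "tired", "tired", "ADJ", "JJ", "Degree=Pos", "0", "root", "_", "_"],
    ["2", "but", "but", "CCONJ", "CC", "_", "3", "cc", "_", "_"],
    ["3", "happy", "happy", "ADJ", "JJ", "Degree=Pos", "1", "conj", "_", "_"],
])

print(non_cconj_but(SAMPLE))  # [('toy-1', '2', 'ADP', 'fixed')]
```

Only the ADP token in the first toy sentence is reported; the CCONJ "but" in the second is filtered out.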

nschneid commented 10 months ago

"I was able to cancel it but/ADV/advmod only after paying a $50 fee": Currently the second clause is advcl.

This seems to fit what CGEL calls the "addition of new element" subtype of an end-attachment coordinate (conjunct). Although it uses a coordinating conjunction, it is not really coordination, because it does not form a constituent with another conjunct.


So I'll leave the structure as is but change "but" to CCONJ/cc. CCONJ is already used for all other "and/but only" instances.
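A minimal sketch of this kind of retagging on a single CoNLL-U token line: only UPOS and DEPREL change, while HEAD stays put, so the existing advcl structure is untouched. The helper name `retag_token` is hypothetical, and the ID (7), HEAD (10) and empty XPOS in the sample line are placeholder values, not the actual EWT annotation.

```python
# Sketch: retag one CoNLL-U token line while keeping its HEAD (the tree shape).
# The sample line's ID, HEAD and XPOS values are hypothetical placeholders.

def retag_token(conllu_line, new_upos, new_deprel):
    """Return the line with UPOS and DEPREL replaced; all other columns unchanged."""
    cols = conllu_line.split("\t")
    if len(cols) != 10:
        raise ValueError("expected a 10-column CoNLL-U token line")
    cols[3] = new_upos    # UPOS column
    cols[7] = new_deprel  # DEPREL column; HEAD (cols[6]) is not touched
    return "\t".join(cols)


old = "\t".join(["7", "but", "but", "ADV", "_", "_", "10", "advmod", "_", "_"])
new = retag_token(old, "CCONJ", "cc")
print(new.split("\t"))
# ['7', 'but', 'but', 'CCONJ', '_', '_', '10', 'cc', '_', '_']
```

Keeping HEAD fixed is the point of the design: the change is purely a relabeling, so the dependency tree is identical before and after.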

nschneid commented 10 months ago

EWT has 2 tokens of "all but". One of them is compositional (meaning 'all except') and is tagged DET+ADP. This is consistent with one of the 2 GUM tokens; the other GUM token has CCONJ/cc, but I think it should be ADP/case.

The noncompositional one is "all but impossible". I think the tags should be DET+ADP as well, even though it is a fixed expression functioning as advmod. I will add ExtPos=ADV and update the fixed guidelines. (Leaving XPOS as RB+RB: it seems OntoNotes sometimes uses that and sometimes uses DT+CC, which I don't understand. DT+IN would make the most sense to me.)
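As a rough illustration of the proposed annotation, here is "all but impossible" as CoNLL-U columns: "all" and "but" tagged DET and ADP with XPOS RB+RB, "but" attached to "all" with fixed, and ExtPos=ADV on the head of the fixed expression, which attaches as advmod. The fragment is a hypothetical toy with other morphological features omitted, and `fixed_heads_missing_extpos` is a hypothetical consistency check, not part of the UD validator.

```python
# Sketch: the proposed "all but impossible" annotation as CoNLL-U columns,
# plus a check that every fixed-expression head carries an ExtPos feature.
# The fragment is a hypothetical toy; other morphological features are omitted.

# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
ALL_BUT = [
    ["1", "all", "all", "DET", "RB", "ExtPos=ADV", "3", "advmod", "_", "_"],
    ["2", "but", "but", "ADP", "RB", "_", "1", "fixed", "_", "_"],
    ["3", "impossible", "impossible", "ADJ", "JJ", "Degree=Pos", "0", "root", "_", "_"],
]


def parse_feats(feats):
    """Turn 'A=B|C=D' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(kv.split("=", 1) for kv in feats.split("|"))


def fixed_heads_missing_extpos(rows):
    """Return IDs of tokens that head a fixed expression but lack ExtPos."""
    by_id = {r[0]: r for r in rows}
    missing = []
    for r in rows:
        if r[7] == "fixed":
            head = by_id[r[6]]
            if "ExtPos" not in parse_feats(head[5]) and head[0] not in missing:
                missing.append(head[0])
    return missing


print(fixed_heads_missing_extpos(ALL_BUT))  # []

# Same fragment without ExtPos on "all": the check flags token 1.
no_extpos = [r[:5] + ["_"] + r[6:] if r[0] == "1" else r for r in ALL_BUT]
print(fixed_heads_missing_extpos(no_extpos))  # ['1']
```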

nschneid commented 10 months ago

but/SCONJ is legitimate, I think, when it functions like 'except' with a clausal complement.

amir-zeldes commented 10 months ago

Agreed; the GUM error is fixed.