UniversalDependencies / docs

Universal Dependencies online documentation
http://universaldependencies.org/
Apache License 2.0

en: "nothing but N" vs. cs: "nic než N" #799

Open nesor opened 3 years ago

nesor commented 3 years ago

I came across the Czech equivalent of English "nothing but ...", as in "He inherited nothing but debts." The single example in Czech PDT has než as deprel=mark & upos=SCONJ, while the following NOUN is deprel=advcl. This is also how UDPipe parses similar Czech sentences.

UD 2.3 - Czech PDT: https://lindat.mff.cuni.cz/services/kontext/view?ctxattrs=word%2Cdeprel&attr_vmode=mixed&pagesize=40&refs=%3Ddata.id&q=~shO32i5rtQLq&viewmode=kwic&attrs=word%2Cdeprel&corpname=ud_23_cs_pdt_a&attr_allpos=all

Is this intended? What bothers me most is the NOUN ending up as deprel=advcl. There's nothing clausal about it, is there? One of the English treebanks has more examples with "nothing but", and they are consistently annotated in a more intuitive way: "but" is deprel=cc & upos=ADP, while the following NOUN is deprel=conj:

UD 2.3 - English EWT: https://lindat.mff.cuni.cz/services/kontext/view?ctxattrs=word&attr_vmode=mouseover&pagesize=40&refs=%3Ddata.id&q=~HPQO0Pl1HrNb&viewmode=kwic&attrs=word&corpname=ud_23_en_ewt_a&attr_allpos=all

I'd be happy just to learn that the Czech structure should be annotated more like its English equivalent.

Thanks!

martinpopel commented 3 years ago

I am not sure what the correct annotation should be in either Czech or English. Just a few comments:

nesor commented 3 years ago

Hi Martin,

Thanks for your feedback! I know it's not one of the most common structures, but the annotation of the noun following než as advcl struck me as inappropriate. It was a colleague of mine who noticed it: she is interested in comparing finite and non-finite structures across several languages and uses UDPipe to parse her texts. It's actually not easy at all to specify rules distinguishing finite and non-finite clauses (or their heads) in UD, and some of the rules must be language-specific. Cases like this one make it even harder.

Best,

Saša (Rosen)

dan-zeman commented 3 years ago

The conversion from PDT to UD seems to be buggy in this respect: if the phrase introduced by než is a simple nominal, it shouldn't be treated as a clause (i.e., než should be attached as case, and the nominal should be obl rather than advcl). If it is not a nominal, it should be treated as an adverbial clause, even if reduced. But in the converted data, it seems to be treated as advcl (or as dep) even for nominals.
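The rule above could be checked on converted data with something like the following minimal sketch. It is only a heuristic, not the actual PDT-to-UD conversion code: the function names, the set of "nominal" UPOS tags, and the extra test for a missing copula are all assumptions made for illustration. The sample sentence is hypothetical too, including the attachment of dluhy to nic, and is not taken from the treebank.

```python
# Heuristic sketch: flag "než" + simple nominal annotated as mark/advcl,
# where case/obl would be expected under the rule described above.

NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}  # assumption: what counts as a simple nominal

# Hypothetical sentence ("He inherited nothing but debts"), built with explicit tabs.
ROWS = [
    ("1", "Nezdědil", "zdědit", "VERB", "_", "_", "0", "root", "_", "_"),
    ("2", "nic", "nic", "PRON", "_", "_", "1", "obj", "_", "_"),
    ("3", "než", "než", "SCONJ", "_", "_", "4", "mark", "_", "_"),
    ("4", "dluhy", "dluh", "NOUN", "_", "_", "2", "advcl", "_", "_"),
    ("5", ".", ".", "PUNCT", "_", "_", "1", "punct", "_", "_"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)


def parse_conllu_sentence(text):
    """Parse one CoNLL-U sentence into {id: token dict}, skipping comments,
    multiword-token ranges and empty nodes."""
    tokens = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        tokens[int(cols[0])] = {
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        }
    return tokens


def find_nominal_nez_advcl(tokens):
    """Return (než_id, nominal_id) pairs where než is 'mark' on a nominal
    that is 'advcl' and has no copula dependent."""
    hits = []
    for tid, tok in tokens.items():
        if tok["lemma"] != "než" or tok["deprel"] != "mark":
            continue
        head_id = tok["head"]
        head = tokens.get(head_id)
        if head is None:
            continue
        has_copula = any(
            t["head"] == head_id and t["deprel"] == "cop" for t in tokens.values()
        )
        if head["upos"] in NOMINAL_UPOS and head["deprel"] == "advcl" and not has_copula:
            hits.append((tid, head_id))
    return hits


if __name__ == "__main__":
    for nez_id, noun_id in find_nominal_nez_advcl(parse_conllu_sentence(SAMPLE)):
        print(f"token {nez_id}: mark -> case; token {noun_id}: advcl -> obl")
```

The copula test is there so that nominal predicates, which are genuinely clausal, are not flagged. A real check would probably need to look at more context, for example other dependents of the nominal.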

The single occurrence of nic než “nothing but” could be regarded as an ellipsis of the more frequent nic jiného než “nothing else than”, which is one of the documented comparative structures. So I do not think we need a coordination analysis here.

dan-zeman commented 1 year ago