Open nesor opened 3 years ago
I am not sure what the correct annotation should be in either Czech or English. Just a few comments:
Hi Martin,
Thanks for your feedback! I know it's not one of the most common structures, but the annotation of the noun following než as advcl struck me as inappropriate. It was a colleague of mine, interested in comparing finite and non-finite structures across several languages and using UDPipe to parse her texts, who noticed that the NOUN following než is an advcl. It's actually not easy at all to specify rules distinguishing finite and non-finite clauses (or their heads) in UD, and some of the rules must be language-specific. Cases like this one make it even harder.
Best,
Saša (Rosen)
The conversion from PDT to UD seems to be buggy in this respect: if the phrase introduced by než is a simple nominal, it shouldn't be treated as a clause (i.e., než should be attached as case, and the nominal should be obl rather than advcl). If it is not a nominal, it should be treated as an adverbial clause, even if reduced. But in the converted data, it seems to be treated as advcl (or as dep) even for nominals.
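The relabeling described above could be sketched roughly as follows. This is only an illustration, not the actual PDT-to-UD converter: the function name `relabel_nez_nominals`, the token dictionaries, the set of UPOS tags counted as nominal, the test for "no clausal dependents", and the example sentence with its attachments are all assumptions made for the sketch.

```python
# Hypothetical sketch: relabel "než" + bare nominal as case/obl instead of mark/advcl.
NOMINAL_UPOS = {"NOUN", "PROPN", "PRON"}             # assumption: what counts as a nominal
CLAUSAL_CHILDREN = {"nsubj", "csubj", "cop", "aux"}  # assumption: signs of a (reduced) clause


def make_tokens(rows):
    """Build token dicts from (id, form, lemma, upos, head, deprel) tuples."""
    keys = ("id", "form", "lemma", "upos", "head", "deprel")
    return [dict(zip(keys, row)) for row in rows]


def relabel_nez_nominals(tokens):
    """Relabel in place; return the number of než phrases that were changed."""
    by_id = {t["id"]: t for t in tokens}
    changed = 0
    for tok in tokens:
        if tok["lemma"] != "než" or tok["deprel"] != "mark":
            continue
        head = by_id.get(tok["head"])
        if head is None or head["upos"] not in NOMINAL_UPOS:
            continue  # not a nominal: leave it as an adverbial clause
        if head["deprel"] not in {"advcl", "dep"}:
            continue
        # Keep the clausal analysis if the nominal has clause-like dependents.
        if any(t["head"] == head["id"]
               and t["deprel"].split(":")[0] in CLAUSAL_CHILDREN
               for t in tokens):
            continue
        tok["deprel"] = "case"
        head["deprel"] = "obl"
        changed += 1
    return changed


# Constructed example (not taken from the treebank): "Nezdědil nic než dluhy."
sentence = make_tokens([
    (1, "Nezdědil", "zdědit", "VERB", 0, "root"),
    (2, "nic", "nic", "PRON", 1, "obj"),
    (3, "než", "než", "SCONJ", 4, "mark"),
    (4, "dluhy", "dluh", "NOUN", 2, "advcl"),
    (5, ".", ".", "PUNCT", 1, "punct"),
])
n_changed = relabel_nez_nominals(sentence)
for t in sentence:
    print(t["id"], t["form"], t["upos"], t["head"], t["deprel"])
```

Non-nominal heads are left untouched, so reduced clauses introduced by než would keep their advcl analysis. The sketch leaves the UPOS of než unchanged.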
The single occurrence of nic než “nothing but” could be regarded as an ellipsis of the more frequent nic jiného než “nothing else than”, which is one of the documented comparative structures. So I do not think we need a coordination analysis here.
I came across the Czech equivalent of English nothing but ... as in He inherited nothing but debts. The single example in Czech PDT has než as deprel=mark & upos=SCONJ, while the following NOUN is deprel=advcl. This is how UDPipe parses similar Czech sentences.
UD 2.3 - Czech PDT: https://lindat.mff.cuni.cz/services/kontext/view?ctxattrs=word%2Cdeprel&attr_vmode=mixed&pagesize=40&refs=%3Ddata.id&q=~shO32i5rtQLq&viewmode=kwic&attrs=word%2Cdeprel&corpname=ud_23_cs_pdt_a&attr_allpos=all
Is this intended? I'm mainly unhappy about the NOUN ending up as deprel=advcl. There's nothing clausal about it, is there? There are more English examples with nothing but in one of the treebanks, and they are consistently annotated in a more intuitive way: but is deprel=cc & upos=ADP while the following NOUN is deprel=conj:
UD 2.3 - English EWT: https://lindat.mff.cuni.cz/services/kontext/view?ctxattrs=word&attr_vmode=mouseover&pagesize=40&refs=%3Ddata.id&q=~HPQO0Pl1HrNb&viewmode=kwic&attrs=word&corpname=ud_23_en_ewt_a&attr_allpos=all
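For readers without access to the KonText queries, the two patterns can be approximated with a small search over parsed tokens. This is a minimal sketch: the helper `find_marked_nominals`, the token dictionaries, and both example sentences (including their attachments) are hypothetical and were constructed to mirror the annotations described here, not copied from the treebanks.

```python
# Hypothetical sketch: find a NOUN with a given deprel introduced by a given function word.
def make_tokens(rows):
    """Build token dicts from (id, form, lemma, upos, head, deprel) tuples."""
    keys = ("id", "form", "lemma", "upos", "head", "deprel")
    return [dict(zip(keys, row)) for row in rows]


def find_marked_nominals(tokens, marker_lemma, marker_deprel, nominal_deprel):
    """Return (marker form, noun form) pairs matching the requested pattern."""
    by_id = {t["id"]: t for t in tokens}
    hits = []
    for tok in tokens:
        if tok["lemma"] != marker_lemma or tok["deprel"] != marker_deprel:
            continue
        head = by_id.get(tok["head"])
        if head and head["upos"] == "NOUN" and head["deprel"] == nominal_deprel:
            hits.append((tok["form"], head["form"]))
    return hits


# Constructed Czech example with the mark/advcl pattern.
czech = make_tokens([
    (1, "Nezdědil", "zdědit", "VERB", 0, "root"),
    (2, "nic", "nic", "PRON", 1, "obj"),
    (3, "než", "než", "SCONJ", 4, "mark"),
    (4, "dluhy", "dluh", "NOUN", 2, "advcl"),
    (5, ".", ".", "PUNCT", 1, "punct"),
])

# Constructed English example with the cc/conj pattern.
english = make_tokens([
    (1, "He", "he", "PRON", 2, "nsubj"),
    (2, "inherited", "inherit", "VERB", 0, "root"),
    (3, "nothing", "nothing", "PRON", 2, "obj"),
    (4, "but", "but", "ADP", 5, "cc"),
    (5, "debts", "debt", "NOUN", 3, "conj"),
    (6, ".", ".", "PUNCT", 2, "punct"),
])

cs_hits = find_marked_nominals(czech, "než", "mark", "advcl")
en_hits = find_marked_nominals(english, "but", "cc", "conj")
print(cs_hits)
print(en_hits)
```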
I'd be happy just to learn that the Czech structure should be annotated more like its English equivalent.
Thanks!