Open kanayamah opened 1 year ago
The UD punctuation guidelines are very clear in this aspect:
A punctuation mark separating coordinated units is attached to the immediately following conjunct.
So the comma must be attached to word 7 (which is the head of the immediately following conjunct), there is no other option.
What do you mean by "learnability of parser"? The UD punctuation guidelines can be converted to an algorithm that fixes the attachment of punctuation. If the training data consistently follows the UD punctuation guidelines (e.g. by fixing it with `udapy ud.FixPunct < in.conllu > fixed.conllu`), modern parsers will learn these rules easily. If the test data follows these guidelines as well, there should be no errors in punctuation in the parser output.
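To illustrate how the coordination rule quoted above can be turned into a mechanical check, here is a minimal sketch in Python. It is not Udapi's `ud.FixPunct` implementation; the `Token` type and both function names are hypothetical, and the check covers only commas between a first conjunct and a following `conj` dependent.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    """Hypothetical, simplified token: a subset of the CoNLL-U columns."""
    id: int
    form: str
    upos: str
    head: int
    deprel: str


def expected_comma_head(comma: Token, tokens: List[Token]) -> Optional[int]:
    """Id of the nearest following conjunct whose first conjunct precedes the comma.

    Returns None when the comma does not separate coordinated units.
    """
    candidates = [t for t in tokens
                  if t.deprel == "conj" and t.head < comma.id < t.id]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.id).id


def misattached_commas(tokens: List[Token]) -> List[Tuple[int, int, int]]:
    """List (comma id, actual head, expected head) for commas violating the rule."""
    bad = []
    for t in tokens:
        if t.upos == "PUNCT" and t.form == ",":
            expected = expected_comma_head(t, tokens)
            if expected is not None and t.head != expected:
                bad.append((t.id, t.head, expected))
    return bad


# Coordination with the comma attached to the preceding conjunct (word 5).
sentence = [
    Token(5, "A고", "VERB", 0, "root"),
    Token(6, ",", "PUNCT", 5, "punct"),
    Token(10, "B", "VERB", 5, "conj"),
]
print(misattached_commas(sentence))  # [(6, 5, 10)]
```

The check says nothing about commas outside coordination, which is why it returns no violations when the second verb is not labelled `conj`.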
I confirm that Udapi's `ud.FixPunct` was used when preparing the commit referred to above by @kanayamah.
@martinpopel @dan-zeman Thank you for the answers. I understand the principles of punctuation regarding coordination.
Particularly in head-final languages, these structures may be counterintuitive, since the comma is regarded as part of the preceding word. I think it is related to the coordination orientation discussed in this issue and our paper.
@martinpopel The relationship `VERB+고 <-> VERB` can be recognized as `advcl` or `conj`, and the distinction between them is subtle. Suppose there are verbs A and B at words 5 and 10, and a comma follows verb A. If A and B are in an `advcl` relationship, the structure is like this:
5 A고 A+고 VERB 10 advcl
6 , , PUNCT 5 punct
...
10 B B VERB 0 root
On the other hand, if they are regarded as coordination, it forms a totally different structure due to the left-head coordination principle:
5 A고 A+고 VERB 0 root
6 , , PUNCT 10 punct
...
10 B B VERB 5 conj
The recent change to the attachment of `punct` (the head was changed from `5` to `10`) makes the comma's attachment harder to predict. That is why I mentioned learnability, which has already been discussed in the paper on coordination in head-final languages.
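To make the contrast concrete, the following Python sketch diffs the two analyses above. It assumes the simplified six-column rows used in this comment (ID, FORM, LEMMA, UPOS, HEAD, DEPREL), and it assumes the comma attaches to word 5 in the `advcl` analysis. The helper names `parse_rows` and `diff_analyses` are hypothetical.

```python
def parse_rows(text):
    """Parse simplified rows: ID FORM LEMMA UPOS HEAD DEPREL (whitespace-separated)."""
    rows = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip elided rows such as "..."
        tid, form, lemma, upos, head, deprel = parts
        rows[int(tid)] = (form, int(head), deprel)
    return rows


# Assumption: in the advcl analysis the comma is attached to word 5.
ADVCL = """
5 A고 A+고 VERB 10 advcl
6 , , PUNCT 5 punct
...
10 B B VERB 0 root
"""

COORD = """
5 A고 A+고 VERB 0 root
6 , , PUNCT 10 punct
...
10 B B VERB 5 conj
"""


def diff_analyses(a, b):
    """Return (id, (head, deprel) in a, (head, deprel) in b) for tokens that differ."""
    return [(tid, a[tid][1:], b[tid][1:])
            for tid in sorted(a)
            if a[tid][1:] != b[tid][1:]]


for tid, old, new in diff_analyses(parse_rows(ADVCL), parse_rows(COORD)):
    print(tid, old, "->", new)
```

All three listed tokens change their head, so a parser that hesitates between `advcl` and `conj` for word 5 also has to get the comma's head right as a consequence of that decision.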
> Relationship `VERB+고 <-> VERB` can be recognized as `advcl` or `conj` and the distinction between them is subtle.
This is interesting. Are there grammatical tests that would decide between `advcl` and `conj`?
The attachment of `,` has been changed in this commit. Some of the changes are good, but I found many confusing cases, and the learnability for parsers is reduced compared to UD v2.10. For example, I think `,` (Word 3) should be attached to Word 2, as originally annotated. Many other similar cases are found in `train-s37`, `train-s38`, etc.