UniversalDependencies / UD_Japanese-GSD

Japanese data from the Google UDT 2.0.

small mistake in CoNLL-U file #7

Closed (keggsmurph21 closed this 6 years ago)

keggsmurph21 commented 6 years ago

in ja_gsd-ud-new_train.conllu sentence sent_id = train-s14, tokens 13 through 16 all had their head as 13, which is impossible (tokens cannot be their own head). it seems like this 13 was meant to be 12 (the root), at least for token 13.

i don't speak japanese, so i can't tell what the other heads should be, but my parser was throwing errors when trying to set a token as its own head
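A minimal sketch of the kind of check that catches this, assuming the standard 10-column tab-separated CoNLL-U layout (ID in column 1, HEAD in column 7). The function name `find_self_heads` and the two-token sample are made up for illustration; they are not taken from the corpus or from the parser mentioned above.

```python
def find_self_heads(conllu_text):
    """Return (sent_id, token_id) pairs where a token's HEAD equals its own ID."""
    problems = []
    sent_id = None
    for line in conllu_text.splitlines():
        line = line.rstrip()
        if not line:
            sent_id = None  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            # Comment line; remember the sentence id if this is the sent_id comment.
            if line.startswith("# sent_id") and "=" in line:
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; ignored in this sketch
        tok_id, head = cols[0], cols[6]
        if not tok_id.isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        if head == tok_id:
            problems.append((sent_id, int(tok_id)))
    return problems


# Made-up two-token sentence in which token 2 wrongly points at itself.
sample = (
    "# sent_id = demo-1\n"
    "1\ta\ta\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tb\tb\tVERB\t_\t_\t2\troot\t_\t_\n"
    "\n"
)
print(find_self_heads(sample))  # prints [('demo-1', 2)]
```

To run it over a real file, read the file as UTF-8 text and pass the contents to the function. The sketch only reports self-heads; it does not say what the correct head should be.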

kanayamah commented 6 years ago

@keggsmurph21, thank you so much for testing and fixing the corpus. Indeed it was a big mistake! Separately, I have been working on a corpus update, and this issue has already been fixed.

kanayamah commented 6 years ago

ja_gsd-ud-new_train.conllu has been renamed to ja_gsd-ud-train.conllu as a v2.3 candidate. test and dev as well.