Closed: keggsmurph21 closed this issue 6 years ago
@keggsmurph21, thank you so much for testing and fixing the corpus. Indeed, it was a big mistake! Separately, I have been working on a corpus update, and this issue has already been fixed there.
`ja_gsd-ud-new_train.conllu` has been renamed to `ja_gsd-ud-train.conllu` as a v2.3 candidate. The same goes for `test` and `dev`.
In `ja_gsd-ud-new_train.conllu`, sentence `sent_id = train-s14`, tokens `13` through `16` all had their `head` as `13`, which is impossible for token `13` (tokens cannot be their own head). It seems like this `13` was meant to be `12` (the `root`), at least for token `13`.

I don't speak Japanese, so I can't tell what the other heads should be, but my parser was throwing errors when trying to set a token as its own head.
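
For anyone who wants to scan a treebank for this kind of problem, here is a minimal sketch of a self-head check. It is not the parser mentioned above; it only assumes the standard 10-column, tab-separated CoNLL-U layout (ID in column 1, HEAD in column 7) and `# sent_id = ...` comment lines. The function name `find_self_heads` and the sample sentence are made up for illustration and are not taken from the corpus.

```python
def find_self_heads(lines):
    """Yield (sent_id, token_id) for every token whose HEAD equals its own ID."""
    sent_id = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            sent_id = None
            continue
        if line.startswith("#"):
            # Sentence-level comment; remember the sentence id if present.
            if line.startswith("# sent_id") and "=" in line:
                sent_id = line.split("=", 1)[1].strip()
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; skip it
        tok_id, head = cols[0], cols[6]
        if not tok_id.isdigit():
            continue  # multiword ranges (1-2) and empty nodes (1.1) have no HEAD
        if head == tok_id:
            yield sent_id, tok_id


# Hypothetical two-token sentence in which token 2 points at itself.
SAMPLE = [
    "# sent_id = demo-s1",
    "\t".join(["1", "A", "a", "NOUN", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "B", "b", "NOUN", "_", "_", "2", "dep", "_", "_"]),
    "",
]

print(list(find_self_heads(SAMPLE)))  # [('demo-s1', '2')]
```

To check a real file, pass an open file handle instead of `SAMPLE`, e.g. `find_self_heads(open(path, encoding="utf-8"))`. The check only reports self-heads; it does not try to guess what the correct head should be.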