UniversalDependencies / UD_Japanese-GSD

Japanese data from the Google UDT 2.0.
Other

repeated sentences in `ja_gsd-ud-test.conllu` #4

Closed shijieyao closed 6 years ago

shijieyao commented 6 years ago

Hi, I'm not sure whether repeated sentences in the test set are a trivial issue or not, but for future reference I'd like to point out the ones I've encountered while using the test set.

test-s205 & test-s208
test-s206 & test-s209
test-s207 & test-s210
test-s224 & test-s225
test-s247 & test-s248
test-s281 & test-s283
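For anyone who wants to reproduce this kind of check, here is a minimal sketch of one way to find repeated sentences in a CoNLL-U file. It is not the script used in this thread. It assumes each sentence carries the usual `# sent_id = ...` and `# text = ...` comment lines, with `sent_id` appearing before `text`, and it treats two sentences as duplicates only when their `# text` values match exactly. The function name `find_duplicate_sentences` is hypothetical.

```python
from collections import defaultdict


def find_duplicate_sentences(conllu_text):
    """Return lists of sent_id values whose '# text = ' lines are identical.

    Assumption: every sentence block has a '# sent_id = ' comment that
    precedes its '# text = ' comment, as in the UD release files.
    """
    ids_by_text = defaultdict(list)
    sent_id = None
    for line in conllu_text.splitlines():
        line = line.strip()
        if line.startswith("# sent_id = "):
            sent_id = line[len("# sent_id = "):]
        elif line.startswith("# text = "):
            # Group sentence ids under the raw sentence text.
            ids_by_text[line[len("# text = "):]].append(sent_id)
        elif not line:
            # A blank line ends the current sentence block.
            sent_id = None
    # Keep only texts that occur under more than one sent_id.
    return [ids for ids in ids_by_text.values() if len(ids) > 1]
```

To run it on the test file, read the file as UTF-8 and pass the contents in, for example `find_duplicate_sentences(open("ja_gsd-ud-test.conllu", encoding="utf-8").read())`. Comparing the `# text` line rather than the token columns keeps the check simple, but it will miss sentences that differ only in whitespace or normalization.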

kanayamah commented 6 years ago

@shijieyao, thank you very much. We removed many duplicated sentences before the release of v2.0, but some of them still remain, as you pointed out. We will fix this in the next release.

kanayamah commented 6 years ago

other than 6 duplicated pairs in test, I found

They will be removed from the train portion.

kanayamah commented 6 years ago

Fixed in the v2.3 candidate.