Open ijindal opened 2 years ago
The list of affected repositories/files:
UD_Ukrainian-IU/uk_iu-up-test.conllup UD_Ukrainian-IU/uk_iu-up-train.conllup UD_Ukrainian-IU/uk_iu-up-dev.conllup
UD_Dutch-LassySmall/nl_lassysmall-up-dev.conllup UD_Dutch-LassySmall/nl_lassysmall-up-test.conllup UD_Dutch-LassySmall/nl_lassysmall-up-train.conllup
UD_Russian-SynTagRus/ru_syntagrus-up-test.conllup UD_Russian-SynTagRus/ru_syntagrus-up-train.conllup UD_Russian-SynTagRus/ru_syntagrus-up-dev.conllup
UD_Czech-PDT/cs_pdt-up-train.conllup UD_Czech-PDT/cs_pdt-up-dev.conllup UD_Czech-PDT/cs_pdt-up-test.conllup
UD_Spanish-AnCora/es_ancora-up-dev.conllup UD_Spanish-AnCora/es_ancora-up-train.conllup UD_Spanish-AnCora/es_ancora-up-test.conllup
UD_Finnish-TDT/fi_tdt-up-train.conllup UD_Finnish-TDT/fi_tdt-up-dev.conllup UD_Finnish-TDT/fi_tdt-up-test.conllup
UD_Czech-FicTree/cs_fictree-up-train.conllup UD_Czech-FicTree/cs_fictree-up-dev.conllup UD_Czech-FicTree/cs_fictree-up-test.conllup
UD_Czech-CAC/cs_cac-up-train.conllup UD_Czech-CAC/cs_cac-up-test.conllup
UD_Dutch-Alpino/nl_alpino-up-train.conllup UD_Dutch-Alpino/nl_alpino-up-dev.conllup UD_Dutch-Alpino/nl_alpino-up-test.conllup
Issue: For some of the languages/corpora, a problem with token counting has been observed. It is related to empty nodes in UD enhanced dependencies: arguments in the UP data count an empty node as one of the token IDs.
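To check whether a sentence is affected, something like the following could be used. It is only a rough sketch, not the actual UP tooling: it assumes the standard CoNLL-U convention that empty nodes have decimal IDs such as `2.1`, and it assumes that the shifted indices come from a running count that includes empty nodes. The function names `has_empty_nodes` and `position_to_id` are made up for illustration.

```python
import re

# CoNLL-U ID column forms: "7" (word), "7-8" (multiword token range), "7.1" (empty node)
WORD_ID = re.compile(r"^\d+$")
EMPTY_NODE_ID = re.compile(r"^\d+\.\d+$")


def _ids(sentence_lines):
    """Yield the ID column of every non-comment, non-blank line."""
    for line in sentence_lines:
        if line.strip() and not line.startswith("#"):
            yield line.split("\t", 1)[0]


def has_empty_nodes(sentence_lines):
    """True if the sentence contains at least one empty node (decimal ID)."""
    return any(EMPTY_NODE_ID.match(node_id) for node_id in _ids(sentence_lines))


def position_to_id(sentence_lines):
    """Map 1-based positions that count empty nodes to the real ID values.

    Assumption: the affected indices are a running count over words and
    empty nodes; multiword token ranges such as "7-8" are skipped.
    """
    mapping = {}
    position = 0
    for node_id in _ids(sentence_lines):
        if WORD_ID.match(node_id) or EMPTY_NODE_ID.match(node_id):
            position += 1
            mapping[position] = node_id
    return mapping


# Toy sentence with only the first two columns filled in
sentence = ["# sent_id = toy", "1\tA", "2\tB", "2.1\t_", "3\tC"]
print(has_empty_nodes(sentence))
print(position_to_id(sentence))
```

In this toy sentence, every index after the empty node `2.1` is off by one relative to the ID column, which matches the kind of shift described above.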
Impact: So far, we know that some of the repositories are affected (see the list above).
Fix: