UniversalPropositions / docs

To discuss overall UP related stuff.

BUG: Empty nodes are not handled properly #3

Open ijindal opened 2 years ago

ijindal commented 2 years ago

Issue: For some languages/corpora, a problem with token counting has been observed. It is related to empty nodes in UD enhanced dependencies: arguments in the UP data count empty nodes as token IDs.
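To illustrate the mismatch, here is a minimal Python sketch. It relies on the CoNLL-U convention that empty nodes carry decimal IDs such as `2.1`, while multiword tokens use ranges such as `2-3`. It assumes the faulty count simply treats each empty-node row as one more token. The function name `position_to_word_id` and the sample sentence are hypothetical and not taken from the UP tooling.

```python
import re

# CoNLL-U ID shapes: "7" is a word, "7-8" is a multiword token range,
# "7.1" is an empty node (used only in enhanced dependencies).
EMPTY_NODE_ID = re.compile(r"^\d+\.\d+$")
WORD_ID = re.compile(r"^\d+$")


def position_to_word_id(sentence_lines):
    """Map a 1-based row position (empty nodes counted) to the real word ID.

    Positions that land on an empty node map to None. Comment lines and
    multiword-token range rows are skipped and do not consume a position.
    This mirrors the *assumed* faulty counting, not a confirmed algorithm.
    """
    mapping = {}
    position = 0
    for line in sentence_lines:
        if not line.strip() or line.startswith("#"):
            continue
        token_id = line.split("\t", 1)[0]
        if WORD_ID.match(token_id):
            position += 1
            mapping[position] = int(token_id)
        elif EMPTY_NODE_ID.match(token_id):
            position += 1
            mapping[position] = None
    return mapping


# Hypothetical sentence with one empty node after word 2.
sample = [
    "# text = hypothetical sentence",
    "1\tA",
    "2\tB",
    "2.1\t_",
    "3\tC",
]
print(position_to_word_id(sample))
# {1: 1, 2: 2, 3: None, 4: 3}
```

In this sample, every position after the empty node is off by one from the real word ID, which is the kind of shift the argument indices would show if empty nodes are counted.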

Impact: So far, we know some of the repositories are affected.

Fix:

michalu commented 2 years ago

The list of affected repositories/files:

UD_Ukrainian-IU/uk_iu-up-test.conllup
UD_Ukrainian-IU/uk_iu-up-train.conllup
UD_Ukrainian-IU/uk_iu-up-dev.conllup

UD_Dutch-LassySmall/nl_lassysmall-up-dev.conllup
UD_Dutch-LassySmall/nl_lassysmall-up-test.conllup
UD_Dutch-LassySmall/nl_lassysmall-up-train.conllup

UD_Russian-SynTagRus/ru_syntagrus-up-test.conllup
UD_Russian-SynTagRus/ru_syntagrus-up-train.conllup
UD_Russian-SynTagRus/ru_syntagrus-up-dev.conllup

UD_Czech-PDT/cs_pdt-up-train.conllup
UD_Czech-PDT/cs_pdt-up-dev.conllup
UD_Czech-PDT/cs_pdt-up-test.conllup

UD_Spanish-AnCora/es_ancora-up-dev.conllup
UD_Spanish-AnCora/es_ancora-up-train.conllup
UD_Spanish-AnCora/es_ancora-up-test.conllup

UD_Finnish-TDT/fi_tdt-up-train.conllup
UD_Finnish-TDT/fi_tdt-up-dev.conllup
UD_Finnish-TDT/fi_tdt-up-test.conllup

UD_Czech-FicTree/cs_fictree-up-train.conllup
UD_Czech-FicTree/cs_fictree-up-dev.conllup
UD_Czech-FicTree/cs_fictree-up-test.conllup

UD_Czech-CAC/cs_cac-up-train.conllup
UD_Czech-CAC/cs_cac-up-test.conllup

UD_Dutch-Alpino/nl_alpino-up-train.conllup
UD_Dutch-Alpino/nl_alpino-up-dev.conllup
UD_Dutch-Alpino/nl_alpino-up-test.conllup