eval.py doesn't work on Arabic dev set

AngledLuffa commented 1 month ago

I got the following error from the version of eval.py that ships with UD 2.14 when reading ar_padt.dev.gold.conllu by calling load_conllu_file directly:

Traceback (most recent call last):
  File "/home/john/stanza/stanza/models/common/utils.py", line 144, in ud_scores
    gold_ud = ud_eval.load_conllu_file(gold_conllu_file)
  File "/home/john/stanza/stanza/utils/conll18_ud_eval.py", line 688, in load_conllu_file
    return load_conllu(_file, path, treebank_type)
  File "/home/john/stanza/stanza/utils/conll18_ud_eval.py", line 419, in load_conllu
    raise UDError("Incorrect word ID '{}' for word '{}', expected '{}' at line {}".format(
stanza.utils.conll18_ud_eval.UDError: Incorrect word ID '49' for word 'الإسرائيلية', expected '50' at line 2502

(The path is weird, but that's because we copy it into our codebase and then import it. Incidentally, releasing it as a Python package would be very helpful.)

The issue is that there's a sentence with an empty node in the middle of an MWT:

# newpar id = afp.20000715.0008:p3
# sent_id = afp.20000715.0008:p3u1
47      ،       ،       PUNCT   G---------      _       46      punct   46:punct        Vform=،|Translit=,
48-49   والاسرائيلية     _       _       _       _       _       _       _       _
48      و       وَ       CCONJ   C---------      _       34      cc      48.1:cc Gloss=and|LTranslit=wa|Root=wa|Translit=wa|Vform=وَ
48.1    _       _       _       _       _       _       _       1:parataxis|2:conj      _
49      الإسرائيلية      إِسرَائِيلِيَّة       NOUN    N------S1D      Case=Nom|Definite=Def|Number=Sing       34      orphan  48.1:dep        Gloss=Israeli|LTranslit=ʾisrāʾīlīyat|Root='isrA'Il|Translit=al-ʾisrāʾīlīyatu|Vform=اَلإِسرَائِيلِيَّةُ
50      آنا     آنَا     X       X---------      Foreign=Yes     51      nmod    51:nmod Vform=آنَا|Gloss=Anna|Root='AnA|Translit=ʾānā|LTranslit=ʾānā
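
For illustration, here is a minimal sketch of an ID sequence check that tolerates this layout. It is not the code in conll18_ud_eval.py; `check_word_ids` is a hypothetical helper, and the only assumption is the CoNLL-U ID convention (plain integers for words, `a-b` for MWT ranges, `a.b` for empty nodes). The error message suggests the empty node line was counted as one of the words inside the 48-49 range, so the sketch skips empty nodes before doing anything else:

```python
def check_word_ids(id_column):
    """Return the syntactic word IDs of one sentence, in order.

    id_column: the ID fields of the sentence's token lines, in file order.
    Raises ValueError if the word IDs are not 1, 2, 3, ... without gaps.
    """
    expected = 1   # next word ID we expect to see
    mwt_end = 0    # last word ID covered by the most recent MWT range
    words = []
    for raw in id_column:
        if "." in raw:
            # Empty node such as "48.1": not a syntactic word, so it must
            # not advance the counter, even inside an MWT range.
            continue
        if "-" in raw:
            # Multi-word token range such as "48-49".
            start, end = (int(part) for part in raw.split("-"))
            if start != expected or end < start:
                raise ValueError("Bad MWT range '{}'".format(raw))
            mwt_end = end
            continue
        word_id = int(raw)
        if word_id != expected:
            raise ValueError(
                "Incorrect word ID '{}', expected '{}'".format(raw, expected))
        words.append(word_id)
        expected += 1
    if mwt_end >= expected:
        raise ValueError("MWT range ends after the last word of the sentence")
    return words
```

With IDs shaped like the sentence above (renumbered from 1 for brevity), `check_word_ids(["1", "2-3", "2", "2.1", "3", "4"])` accepts the empty node between the two words of the MWT.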

AngledLuffa commented 1 month ago

Can I bump this? It's in the way of using that script to measure the quality of new models built from the latest UD release.

dan-zeman commented 2 weeks ago

Hmm. This is a copy of the evaluation script that was used in the UD parsing shared tasks. I have not investigated the details but I suspect it was never applied to data containing empty nodes. Even in the Enhanced UD parsing shared tasks, paths with empty nodes were first collapsed and empty nodes removed, then this script was applied.

This is not to say the script should not be updated to digest any valid UD data. It definitely should.
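
Until the script is updated, the empty-node removal mentioned above can be approximated as a stopgap. The sketch below is only an assumption about what such preprocessing might look like: `strip_empty_nodes` is a hypothetical helper, it does not collapse enhanced paths, and DEPS values that still reference a removed node (such as `48.1:cc`) are left untouched.

```python
def strip_empty_nodes(lines):
    """Yield CoNLL-U lines with empty-node rows (IDs like '48.1') dropped.

    Comment lines and the blank lines that separate sentences are passed
    through unchanged.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # The ID is the first whitespace-delimited field and never
        # contains whitespace itself.
        node_id = line.split(maxsplit=1)[0]
        if "." in node_id:
            continue  # empty node: skip the whole row
        yield line
```

A possible use is to write the filtered lines to a temporary file and pass that file to load_conllu_file instead of the original.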

AngledLuffa commented 1 week ago

I've found it works fine on other datasets with empty nodes. This one is unique in that the empty node occurs in the middle of an MWT.

Do you need me to update it? I'd honestly prefer it if someone else took it on, but either way is fine.

UniversalDependencies / tools

eval.py doesn't work on Arabic dev set #102