UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

eval.py doesn't work on current UD treebanks #88

Closed: AngledLuffa closed this issue 2 years ago.

AngledLuffa commented 2 years ago

If I use eval.py to compare a gold treebank against itself, it fails. For example, running it on the training file from the current git repository of English EWT produces the following error:

Actually, it's unclear to me whether this is a problem with the eval script or with the treebank. Is it okay for a copy node to have no basic dependency information (HEAD and DEPREL are both `_`) like this?

```
[john@localhost udtools]$ python3 eval.py ~/extern_data/ud2/git/UD_English-EWT/en_ewt-ud-train.conllu ~/extern_data/ud2/git/UD_English-EWT/en_ewt-ud-train.conllu
Traceback (most recent call last):
  File "eval.py", line 705, in <module>
    main()
  File "eval.py", line 670, in main
    evaluation = evaluate_wrapper(args)
  File "eval.py", line 650, in evaluate_wrapper
    gold_ud = load_conllu_file(args.gold_file,treebank_type)
  File "eval.py", line 637, in load_conllu_file
    return load_conllu(_file,treebank_type)
  File "eval.py", line 357, in load_conllu
    raise UDError("The collapsed CoNLL-U line still contains empty nodes: {}".format(_encode(line)))
__main__.UDError: The collapsed CoNLL-U line still contains empty nodes: 8.1    reported        report  VERB    VBN     Tense=Past|VerbForm=Part|Voice=Pass     _       _       5:conj:and      CopyOf=5
```
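For reference, the rejected line is an empty node: in CoNLL-U its ID has the form `integer.integer` (here `8.1`), and its HEAD and DEPREL columns are `_`, so it only takes part in the enhanced graph through its DEPS value (`5:conj:and`). The traceback renders the column separators as spaces; in the file itself they are tabs. The sketch below lists such lines. `find_empty_nodes` is a hypothetical helper for illustration, not code from eval.py.

```python
import re

# An empty-node ID is "integer.integer", e.g. "8.1" (CoNLL-U format).
EMPTY_NODE_ID = re.compile(r"^\d+\.\d+$")


def find_empty_nodes(conllu_text):
    """Return (ID, HEAD, DEPREL, DEPS) for every empty-node line in the text.

    Hypothetical helper for illustration only; it is not part of eval.py.
    """
    nodes = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line; a real parser would raise
        if EMPTY_NODE_ID.match(cols[0]):
            # 0-based columns 6, 7, 8 are HEAD, DEPREL and DEPS.
            nodes.append((cols[0], cols[6], cols[7], cols[8]))
    return nodes


# The line from the traceback, with its real tab separators restored.
sample = "\t".join(
    ["8.1", "reported", "report", "VERB", "VBN",
     "Tense=Past|VerbForm=Part|Voice=Pass", "_", "_", "5:conj:and", "CopyOf=5"]
)
print(find_empty_nodes(sample))  # [('8.1', '_', '_', '5:conj:and')]
```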
dan-zeman commented 2 years ago

Hmm, good point. We need to do something about it. The evaluation of enhanced UD parsing assumed that the files had been preprocessed and their empty nodes dissolved; that's where this error message comes from. But it should be possible to run the script on files that have not been preprocessed (and are thus valid CoNLL-U :-)), especially if the user is only interested in basic UD.
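For basic UD only, one possible workaround, sketched here under the assumption that the enhanced graph is not being evaluated, is to drop empty-node lines before scoring: their HEAD and DEPREL are `_`, so they do not belong to the basic tree. `strip_empty_nodes` is a hypothetical helper, not part of eval.py. It does not handle enhanced UD: DEPS values on other tokens may still point at the removed nodes (e.g. `8.1:nsubj`).

```python
import re

# Matches token lines whose ID field is an empty-node ID such as "8.1".
EMPTY_NODE_LINE = re.compile(r"^\d+\.\d+\t")


def strip_empty_nodes(conllu_text):
    """Remove empty-node lines, keeping comments, tokens and blank separators.

    Hypothetical helper, suitable for basic-UD evaluation only.
    """
    kept = [line for line in conllu_text.split("\n") if not EMPTY_NODE_LINE.match(line)]
    return "\n".join(kept)


sentence = "\n".join([
    "# text = I left",
    "1\tI\tI\tPRON\tPRP\t_\t2\tnsubj\t2:nsubj\t_",
    "2\tleft\tleave\tVERB\tVBD\t_\t0\troot\t0:root\t_",
    "2.1\tleft\tleave\tVERB\tVBD\t_\t_\t_\t2:conj\tCopyOf=2",
    "",
])
print(strip_empty_nodes(sentence))
```

Multiword-token lines (IDs like `1-2`) and ordinary token lines are left untouched, because only IDs containing a dot match the pattern.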

AngledLuffa commented 2 years ago

Excellent, thank you!