UniversalDependencies / tools

Various utilities for processing the data.
GNU General Public License v2.0

eval.py reports higher than 100 aligned accuracy on enhanced dependencies #90

Closed: AngledLuffa closed this issue 2 years ago

AngledLuffa commented 2 years ago

ELAS and EULAS scores are higher than 100:

python3 eval.py UD_English-EWT/en_ewt-ud-train.conllu UD_English-EWT/en_ewt-ud-train.conllu -v
Metric     | Precision |    Recall |  F1 Score | AligndAcc
-----------+-----------+-----------+-----------+-----------
Tokens     |    100.00 |    100.00 |    100.00 |
Sentences  |    100.00 |    100.00 |    100.00 |
Words      |    100.00 |    100.00 |    100.00 |
UPOS       |    100.00 |    100.00 |    100.00 |    100.00
XPOS       |    100.00 |    100.00 |    100.00 |    100.00
UFeats     |    100.00 |    100.00 |    100.00 |    100.00
AllTags    |    100.00 |    100.00 |    100.00 |    100.00
Lemmas     |    100.00 |    100.00 |    100.00 |    100.00
UAS        |    100.00 |    100.00 |    100.00 |    100.00
LAS        |    100.00 |    100.00 |    100.00 |    100.00
ELAS       |    100.00 |    100.00 |    100.00 |    105.02    <---
EULAS      |    100.00 |    100.00 |    100.00 |    105.02   <---
CLAS       |    100.00 |    100.00 |    100.00 |    100.00
MLAS       |    100.00 |    100.00 |    100.00 |    100.00
BLEX       |    100.00 |    100.00 |    100.00 |    100.00

If I had to guess without actually looking at the code, maybe it's getting extra credit for lines where there is more than one enhanced dependency to count?

Also, this happens if I do git checkout 799292f54c699fd2ccf90b0b890a0533ccf35fd4 in order to go earlier than my recent changes, so definitely not my fault :P

AngledLuffa commented 2 years ago

My intuition is 100% correct:

count of aligned lines, ignoring multiplicity: https://github.com/UniversalDependencies/tools/blob/77500d70683162f01cbcda4179ae488865d2ffc3/eval.py#L506

possibility of multiple +1s for a single line: https://github.com/UniversalDependencies/tools/blob/77500d70683162f01cbcda4179ae488865d2ffc3/eval.py#L513
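
A toy illustration of what I mean (made-up data and names, not the actual eval.py internals): the denominator counts each aligned word once, while the numerator gets +1 per matching enhanced dependency, so any word with more than one enhanced parent pushes the ratio over 100%.

```python
# Hypothetical reproduction of the counting mismatch, not eval.py code.
# Each aligned word carries a set of enhanced dependencies (head, deprel);
# in the gold-vs-gold case every one of them "matches".
aligned_words = [
    {("2", "nsubj")},                       # one enhanced parent
    {("0", "root")},                        # one enhanced parent
    {("2", "obj"), ("4", "nsubj:xsubj")},   # TWO enhanced parents
]

# Denominator: one per aligned word, ignoring multiplicity.
aligned = len(aligned_words)                          # 3

# Numerator: +1 per matching enhanced dependency.
correct = sum(len(deps) for deps in aligned_words)    # 4

print(f"AlignedAcc = {100 * correct / aligned:.2f}")  # 133.33 > 100
```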

I'd fix it, but I don't know what we should make "aligned accuracy" represent in this case, if anything. Perhaps an empty column is the most appropriate?
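Something like this, purely as a sketch of the empty-column idea (hypothetical helper, not how eval.py actually formats its table):

```python
def format_aligned_acc(metric, correct, aligned):
    # Hypothetical formatting helper: for the enhanced metrics a per-word
    # aligned accuracy is ill-defined, so emit a blank cell instead of a
    # number that can exceed 100.
    if metric in ("ELAS", "EULAS") or aligned == 0:
        return "{:>10}".format("")
    return "{:10.2f}".format(100 * correct / aligned)

print(format_aligned_acc("ELAS", 4, 3))  # blank cell
print(format_aligned_acc("LAS", 3, 3))   # '    100.00'
```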

dan-zeman commented 2 years ago

Thanks for reporting. I agree that aligned accuracy does not make sense here. Fixed.