Bug Description
When I run python evaluate.py preds@dbpedia-spotlight-wrapper@aapb-collaboration-21 golds, I get the same counts of gold and system entities as the report, but not the same precision, accuracy, and recall; those scores all come out as 0 or near zero. (A sketch after the reproduction steps illustrates why the counts can match while the scores do not.)
Reproduction steps
cd to nel_eval
remove the guid cpb-aacip-507-nk3610wp6s from both the preds and the golds; its gold data is defunct and an error will occur otherwise
run python evaluate.py preds@dbpedia-spotlight-wrapper@aapb-collaboration-21 golds
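For context on why matching counts do not imply matching scores: the entity counts depend only on how many links are on each side, while precision and recall depend on the overlap between the two sets. The following is a simplified sketch of that idea, not the actual evaluate.py implementation; the tuple layout, GUID, and URIs are made up for illustration.

```python
# Simplified sketch of overlap-based scoring; NOT the actual evaluate.py code.
# Links are modeled here as made-up (guid, start, end, uri) tuples.

def precision_recall(preds: set, golds: set) -> tuple[float, float]:
    hits = len(preds & golds)                      # links counted as correct
    precision = hits / len(preds) if preds else 0.0
    recall = hits / len(golds) if golds else 0.0
    return precision, recall

preds = {("cpb-aacip-507-example", 10, 16, "http://dbpedia.org/resource/Boston")}
golds = {("cpb-aacip-507-example", 10, 16, "http://dbpedia.org/resource/Boston,_Massachusetts")}

print(len(preds), len(golds))          # counts still look right: 1 1
print(precision_recall(preds, golds))  # but the overlap is empty: (0.0, 0.0)
```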
Expected behavior
See the report for the expected behavior.
Log output
No response
Screenshots
No response
Additional context
I have tried different ways of comparing the golds and preds (hashing, strings, manually checking), and as far as I can tell the criteria for turning the preds and golds into NamedEntityLink objects, and for comparing them, must have changed in the current iteration of evaluate.py since the report was written.
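To make that hypothesis concrete, here is a hypothetical stand-in for the class (the real NamedEntityLink in evaluate.py may be defined quite differently): whichever fields take part in equality and hashing decide whether a pred and a gold count as the same link, so a change to those criteria alone would zero out precision and recall while leaving the entity counts untouched.

```python
# Hypothetical stand-in, not the actual NamedEntityLink from evaluate.py.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Link:
    guid: str
    start: int
    end: int
    uri: str
    surface: str = field(default="", compare=False)  # ignored by __eq__/__hash__

gold = Link("cpb-aacip-507-example", 42, 48, "http://dbpedia.org/resource/Boston", "Boston")
pred = Link("cpb-aacip-507-example", 42, 48, "http://dbpedia.org/resource/Boston", "boston")

print(gold == pred)  # True: only guid, start, end, and uri are compared

# If a later revision added another field (or stricter URI normalization) to the
# comparison, previously matching pairs would stop intersecting, and the scores
# would drop to zero even though the counts of preds and golds are unchanged.
```

If that is what happened, diffing how NamedEntityLink defines equality (or how the gold and system links are normalized) between the revision used for the report and the current evaluate.py should confirm it.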