getalp / disambiguate

Disambiguate is a tool for training and using state of the art neural WSD models
https://arxiv.org/abs/1905.05677
MIT License

Evaluation method #8

Closed. ahmadizzan closed this issue 4 years ago

ahmadizzan commented 4 years ago

Hi,

Thank you so much for making the code available for people to see and experiment with!

I'm currently trying to understand how the evaluation part of the code works. Specifically, I'm looking at the WSDEvaluator class (located in disambiguate/java/src/main/java/getalp/wsd/evaluation/), focusing on the method public DisambiguationResult computeDisambiguationResult(List<Word> wordList, String referenceSenseTag, String candidateSenseTag, String confidenceValueTag, double confidenceThreshold, WordnetHelper wn).

I see that the bad property in DisambiguationResult is only incremented if the word has at least one referenceSenseKey and at least one candidateSenseKey.

Why is it done that way, rather than incrementing it whenever there is a candidateSenseKey, regardless of whether there is a referenceSenseKey?

My understanding is that the bad property of DisambiguationResult corresponds to the number of senses incorrectly annotated by the WSD system, regardless of the ground truth data.

Thank you.

loic-vial commented 4 years ago

Hi,

Thank you for your interest in our work!

That's correct: we compute the Precision / Recall / F1 scores from the good/bad/missed/total variables, which only take into account words that have a reference sense key.
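As a rough illustration of how such counters usually turn into scores, here is a minimal sketch. It assumes the common WSD convention (precision over attempted words, recall over all words with a reference sense) and that total = good + bad + missed. The class WsdScoreSketch is hypothetical and is not the repository's DisambiguationResult.

```java
// Hypothetical helper for illustration only; not the repository's DisambiguationResult.
// All three counters are assumed to cover only words that have a reference sense key.
final class WsdScoreSketch {
    final int good;   // answered, and the answer matches a reference sense
    final int bad;    // answered, but the answer matches no reference sense
    final int missed; // has a reference sense, but the system gave no answer

    WsdScoreSketch(int good, int bad, int missed) {
        this.good = good;
        this.bad = bad;
        this.missed = missed;
    }

    // Assumption: total is the number of words with a reference sense.
    int total() {
        return good + bad + missed;
    }

    // Precision: correct answers among the answers actually given.
    double precision() {
        int attempted = good + bad;
        return attempted == 0 ? 0.0 : (double) good / attempted;
    }

    // Recall: correct answers among all words that have a reference sense.
    double recall() {
        return total() == 0 ? 0.0 : (double) good / total();
    }

    // F1: harmonic mean of precision and recall.
    double f1() {
        double p = precision();
        double r = recall();
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        WsdScoreSketch s = new WsdScoreSketch(6, 2, 2);
        System.out.println("P=" + s.precision() + " R=" + s.recall() + " F1=" + s.f1());
    }
}
```

Under these assumptions, a candidate sense on a word with no reference sense changes none of the counters, so it affects neither precision nor recall.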

We do this because it is how WSD systems are evaluated in evaluation campaigns (SemEval 2007 task 7, SemEval 2013 task 12, etc.).

For instance, if you download the evaluation data of SemEval 2015 task 13 (here) and open the file /scorer/Scorer.java, which is the code of the official scorer used in the campaign, you'll see that it iterates over the system's hypotheses and skips any hypothesis that is not in the reference file. Otherwise, it counts the hypothesis as "good" if it is contained in the reference, and as "bad" if it is not.
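The loop described above can be sketched roughly as follows. This is not the official Scorer.java: the class name ScorerSketch, the map-based data layout, and the instance ids and sense keys in demo() are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the skip / good / bad logic; not the official SemEval scorer.
final class ScorerSketch {

    // reference:  instance id -> set of acceptable sense keys
    // hypotheses: instance id -> the single sense key proposed by the system
    // Returns {good, bad, skipped}.
    static int[] score(Map<String, Set<String>> reference, Map<String, String> hypotheses) {
        int good = 0;
        int bad = 0;
        int skipped = 0;
        for (Map.Entry<String, String> hypothesis : hypotheses.entrySet()) {
            Set<String> gold = reference.get(hypothesis.getKey());
            if (gold == null) {
                // No reference annotation for this instance: ignored, not counted as bad.
                skipped++;
            } else if (gold.contains(hypothesis.getValue())) {
                good++;
            } else {
                bad++;
            }
        }
        return new int[] {good, bad, skipped};
    }

    // Illustrative data only: the ids and sense keys below are made up for the example.
    static int[] demo() {
        Map<String, Set<String>> reference = new HashMap<>();
        reference.put("t1", Collections.singleton("bank%1:14:00::"));
        reference.put("t2", Collections.singleton("bank%1:17:01::"));

        Map<String, String> hypotheses = new HashMap<>();
        hypotheses.put("t1", "bank%1:14:00::"); // in the reference and matching: good
        hypotheses.put("t2", "bank%1:14:00::"); // in the reference, not matching: bad
        hypotheses.put("t3", "bank%1:14:00::"); // not in the reference: skipped
        return score(reference, hypotheses);
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("good=" + counts[0] + " bad=" + counts[1] + " skipped=" + counts[2]);
    }
}
```

In this sketch the hypothesis for t3 is neither good nor bad, which mirrors why bad is only incremented when both a reference and a candidate sense key are present.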

I hope this helps :)

ahmadizzan commented 4 years ago

Thanks for the reply and reference to the WSD evaluation method!

Closing this issue.