Closed ahmadizzan closed 4 years ago
Hi,
Thank you for your interest in our work!
That is correct: we compute the precision, recall, and F1 scores from the good/bad/missed/total variables, which only take into account words that have a reference sense key.
We do this because it is how WSD systems are evaluated in evaluation campaigns (SemEval 2007 task 7, SemEval 2013 task 12, etc.).
For instance, if you download the evaluation data of SemEval 2015 task 13 (here) and open the file /scorer/Scorer.java, which is the code of the official scorer used in the campaign, you'll see that it iterates over the system's hypotheses and skips a hypothesis if it is not in the reference file. Otherwise, it counts the hypothesis as "good" if it is contained in the reference, and "bad" otherwise.
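As a rough illustration of the counting scheme described above, here is a minimal sketch. It is not the actual Scorer.java or WSDEvaluator code: the class name `WsdScoringSketch`, the `Counts` holder, and the `score` method are hypothetical, and the precision/recall formulas (good / (good + bad) and good / total) are the usual convention, assumed here rather than taken from the repository. It also assumes every reference entry has at least one sense key.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of campaign-style WSD scoring; not the official scorer.
final class WsdScoringSketch {

    /** Counters named after the good/bad/missed/total variables. */
    static final class Counts {
        int good;    // hypothesis matches one of the reference sense keys
        int bad;     // hypothesis given, reference exists, but no match
        int missed;  // reference exists, but the system gave no hypothesis
        int total;   // number of words that have a reference sense key

        // Assumed convention: precision over attempted reference words.
        double precision() {
            int attempted = good + bad;
            return attempted == 0 ? 0.0 : (double) good / attempted;
        }

        // Assumed convention: recall over all reference words.
        double recall() {
            return total == 0 ? 0.0 : (double) good / total;
        }

        double f1() {
            double p = precision();
            double r = recall();
            return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
        }
    }

    /**
     * @param reference  word id -> set of acceptable sense keys
     * @param hypotheses word id -> sense key chosen by the system
     */
    static Counts score(Map<String, Set<String>> reference,
                        Map<String, String> hypotheses) {
        Counts c = new Counts();
        c.total = reference.size();
        int answered = 0;
        for (Map.Entry<String, String> h : hypotheses.entrySet()) {
            Set<String> gold = reference.get(h.getKey());
            if (gold == null) {
                continue; // no reference sense key: skipped, neither good nor bad
            }
            answered++;
            if (gold.contains(h.getValue())) {
                c.good++;
            } else {
                c.bad++;
            }
        }
        c.missed = c.total - answered;
        return c;
    }
}
```

With a reference of three words and a system that answers one correctly, one incorrectly, and additionally tags a word absent from the reference, the extra tag changes none of the counters, which is the behaviour asked about in this issue.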
I hope that it helps you :)
Thanks for the reply and reference to the WSD evaluation method!
Closing this issue.
Hi,
Thank you so much for making the code available for people to see and experiment with!
I'm currently trying to understand the evaluation part of the code. Specifically, I'm looking at the `WSDEvaluator` class (located in `disambiguate/java/src/main/java/getalp/wsd/evaluation/`), focusing on the `public DisambiguationResult computeDisambiguationResult(List<Word> wordList, String referenceSenseTag, String candidateSenseTag, String confidenceValueTag, double confidenceThreshold, WordnetHelper wn)` method.
I see that the `bad` property in `DisambiguationResult` will only be incremented if there exists at least one referenceSenseKey and candidateSenseKey. Why is it that way, rather than incrementing it whenever there is a candidateSenseKey, regardless of whether there is a referenceSenseKey?
My understanding is that the `bad` property of `DisambiguationResult` corresponds to the number of senses incorrectly annotated by the WSD system, regardless of the ground truth data.
Thank you.