asappresearch / spoken-ner


How are duplicate entities handled in the f1-scores? #1

Open nishthajain1611 opened 1 year ago

nishthajain1611 commented 1 year ago

@sshon-asapp, @felixgwu, @apasad-asapp The paper "End-to-end named entity recognition from English speech" by Yadav et al. specifies that they do not consider duplicate (tag, phrase) pairs when computing their precision and recall scores.

Your paper "On the Use of External Data for Spoken Named Entity Recognition" says that it uses the F1 measures from Yadav et al., but the slue-toolkit evaluation code used to produce your results does not remove duplicates and effectively compares (tag, phrase, identifier) triplets when computing the F1 score.

Could you please clarify which metric you used for the results published in your paper?

ankitapasad commented 1 year ago

Hi @nishthajain1611

Thank you for your interest in our work.

Your understanding is correct. Like Yadav et al., we use the micro-averaged F1 score, so the metric itself is the same. The difference lies in how the ground truth and predictions are represented: we do not post-process them to remove duplicates, which retains the real setting of the task. The two papers also use different datasets, so the scores are not directly comparable as is.
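
To make the distinction concrete, here is a minimal sketch of micro-averaged F1 computed both ways. It is not the slue-toolkit code or the Yadav et al. implementation: the function name `micro_f1` and the `dedupe` flag are hypothetical, duplicates are assumed to be removed per utterance, and a multiset (`Counter`) stands in for the (tag, phrase, identifier) triplets mentioned above.

```python
from collections import Counter


def micro_f1(gold, pred, dedupe):
    """Micro-averaged F1 over utterances (illustrative sketch only).

    gold, pred: one list per utterance, each holding (tag, phrase) pairs.
    dedupe=True  collapses repeated pairs within an utterance (assumed
                 reading of the Yadav et al. description).
    dedupe=False counts every occurrence, using a multiset as a stand-in
                 for distinct (tag, phrase, identifier) triplets.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if dedupe:
            g, p = set(g), set(p)
        g_cnt, p_cnt = Counter(g), Counter(p)
        # Multiset intersection: matched occurrences of each pair.
        overlap = sum((g_cnt & p_cnt).values())
        tp += overlap
        fp += sum(p_cnt.values()) - overlap
        fn += sum(g_cnt.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up example: the gold utterance mentions "john" twice,
# while the prediction finds it only once.
gold = [[("PER", "john"), ("PER", "john"), ("LOC", "paris")]]
pred = [[("PER", "john"), ("LOC", "paris")]]

print(micro_f1(gold, pred, dedupe=True))   # 1.0
print(micro_f1(gold, pred, dedupe=False))  # about 0.8
```

In this toy case the missed second mention costs recall only when duplicates are kept, which is why the two representations can give different scores under the same metric.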