Nealcly / templateNER

Source code for template-based NER

Evaluation Metrics #8

Closed: skyWalker1997 closed this issue 2 years ago

skyWalker1997 commented 2 years ago

The accuracy calculated in the project is only the absolute accuracy of the generated sequence, which includes template tokens such as "is", "a", "an", and "entity".
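
For illustration, here is a minimal, hypothetical sketch of the kind of sequence accuracy being criticized (not the repo's code); the strings and function name are made up:

```python
# A hypothetical sketch of the sequence accuracy being criticized (not the
# repo's code). Because generated templates share fixed words such as "is",
# "a", "an", and "entity", the score stays high even when the entity span
# or class is wrong.

def token_accuracy(pred: str, gold: str) -> float:
    """Fraction of aligned positions where generated and gold tokens match."""
    p, g = pred.split(), gold.split()
    matches = sum(a == b for a, b in zip(p, g))
    return matches / max(len(g), 1)

pred = "Obama is a location entity"
gold = "Obama is a person entity"
print(token_accuracy(pred, gold))  # 0.8: four of five tokens match,
                                   # yet the entity class is wrong
```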

I calculated the P, R, and F1 of the entity token content and entity class for the generated sequences (trained on CoNLL03). The results are inconsistent with the paper: the 'organization' entity obtains only 0.58 F1. Could you please publish the dataset used in the paper and the complete evaluation methods?

Heihaierr commented 2 years ago

I have the same concern. Could you release the test.txt and the checkpoint-3060 referenced in your inference.py?

Nealcly commented 2 years ago

Hi, we calculate the absolute accuracy during training, since we do not decode at that stage. For inference, we calculate the entity-level F1, as shown in utils_metrics.py.
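
For reference, a hedged sketch of entity-level precision/recall/F1 under the usual convention (an entity counts as correct only if both its span and its class match); this is an illustration, not the actual code in utils_metrics.py:

```python
# A hedged sketch of entity-level precision/recall/F1, not the code from
# utils_metrics.py. An entity counts as correct only if both its span and
# its class match the gold annotation.

def entity_prf(pred_entities, gold_entities):
    """Each argument: per-sentence sets of (start, end, type) tuples."""
    n_pred = n_gold = n_correct = 0
    for pred, gold in zip(pred_entities, gold_entities):
        n_pred += len(pred)
        n_gold += len(gold)
        n_correct += len(pred & gold)  # exact span + class matches
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one correct entity and one entity with the wrong class.
pred = [{(0, 1, "ORG"), (3, 4, "PER")}]
gold = [{(0, 1, "ORG"), (3, 4, "LOC")}]
print(entity_prf(pred, gold))  # (0.5, 0.5, 0.5)
```

Under this convention an entity with the correct span but the wrong class counts against both precision and recall, which may explain why a per-class F1 such as the 0.58 reported above can sit far below the raw sequence accuracy seen during training.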