Closed: skyWalker1997 closed this issue 2 years ago
I have the same concern. Could you release the test.txt and checkpoint-3060 used in your inference.py?
Hi, during training we compute absolute accuracy on the generated sequence, since we do not decode at that stage. For inference, we compute entity-level F1, as shown in utils_metrics.py.
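For reference, here is a minimal sketch of how entity-level F1 is typically computed, assuming gold and predicted entities are represented as (sentence id, span, type) tuples; the actual code in utils_metrics.py may differ in its details.

```python
from collections import Counter

def entity_f1(gold_entities, pred_entities):
    """Micro precision, recall, and F1 over (sentence_id, span, type) tuples."""
    gold = Counter(gold_entities)
    pred = Counter(pred_entities)
    # An entity counts as correct only if both its span and its type match.
    correct = sum((gold & pred).values())
    precision = correct / sum(pred.values()) if pred else 0.0
    recall = correct / sum(gold.values()) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example with hypothetical entities:
gold = [(0, "EU", "ORG"), (0, "German", "MISC")]
pred = [(0, "EU", "ORG"), (0, "German", "PER")]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```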
The accuracy computed in the project only measures the absolute token-level accuracy of the generated sequence, which includes template tokens such as "is", "a", "an", and "entity".
I calculated the precision, recall, and F1 of the entity span content and entity class of the generated sequences (trained on CoNLL03). The results are inconsistent with the paper; the "organization" entity obtains only 0.58 F1. Could you please publish the dataset used in the paper and the complete evaluation method?
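In case it helps pin down the discrepancy, this is roughly how I parsed the generated template sentences into (span, type) pairs and counted per-class matches before scoring. The template wording ("X is a/an TYPE entity") and the regex are my assumptions and may not match the repo's exact decoding format.

```python
import re
from collections import defaultdict

# Assumed template of a generated sentence, e.g. "EU is an organization entity".
TEMPLATE = re.compile(r"^(?P<span>.+) is an? (?P<type>\w+) entity\s*\.?$")

def parse_template(sentence):
    """Return (span, type) from a generated sentence, or None if it does not match."""
    m = TEMPLATE.match(sentence.strip())
    return (m.group("span"), m.group("type")) if m else None

def per_class_f1(gold, pred):
    """Per-class P/R/F1 over (sentence_id, span, type) tuples, keyed by type."""
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    gold_set, pred_set = set(gold), set(pred)
    for _, span, typ in pred_set:
        stats[typ]["tp" if (_, span, typ) in gold_set else "fp"] += 1
    for _, span, typ in gold_set - pred_set:
        stats[typ]["fn"] += 1
    results = {}
    for typ, s in stats.items():
        p = s["tp"] / (s["tp"] + s["fp"]) if s["tp"] + s["fp"] else 0.0
        r = s["tp"] / (s["tp"] + s["fn"]) if s["tp"] + s["fn"] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        results[typ] = (p, r, f)
    return results
```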