Thanks for the report! This looks like a bug in the scorer starting in v3.0 and an incorrect description in the docs, which should have "predicted tokens" instead of "gold tokens". If you count all correct tokens as true positives and all incorrect predicted tokens as false positives, the intended `token_acc` score is the precision (and this was correct in v2), but v3 is reporting the f-score instead of the precision. In general, I'd recommend using `token_p`/`token_r`/`token_f` instead.
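Concretely, here is a minimal sketch with spaCy's `PRFScore` of how the two numbers diverge (this assumes, per the description above, that the counter behind `token_acc` only ever records true and false positives, never false negatives):

```python
from spacy.scorer import PRFScore

# Counter as described above: each correct predicted token is a true
# positive, each incorrect predicted token a false positive; false
# negatives are never recorded, so recall is pinned at 1.0.
acc = PRFScore()
acc.tp = 1  # e.g. one predicted token matches a gold token
acc.fp = 2  # e.g. two predicted tokens match no gold token

print(acc.precision)  # 0.333... -- the intended token_acc
print(acc.fscore)     # 0.5      -- what v3 reports instead
```

With recall pinned at 1, the f-score reduces to 2p / (p + 1), which is why the reported number looks much better than the actual precision.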
Cool. Like you suggested, I am using `token_p`/`token_r`/`token_f`, which gives me deeper insights than the accuracy number. There is no urgency to merge if config versioning is a problem.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
The doc describes `token_acc` as the number of correct tokens out of the number of gold tokens, but it's actually computing an f-score over the true and false positive counts. I don't fully understand what this formula is doing.
How to reproduce the behaviour
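A minimal sketch along these lines reproduces it, assuming a blank pipeline and hand-built `Doc`s over the text "abcde", with five gold tokens and three predicted tokens of which only "a" lines up:

```python
import spacy
from spacy.scorer import Scorer
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")

# Both docs cover the same text "abcde"; only the predicted token "a"
# has the same character span as a gold token.
pred = Doc(nlp.vocab, words=["a", "bc", "de"], spaces=[False, False, False])
gold = Doc(nlp.vocab, words=["a", "b", "c", "d", "e"], spaces=[False] * 5)

print(Scorer.score_tokenization([Example(pred, gold)]))
```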
The result is:
{'token_acc': 0.5, 'token_p': 0.3333333333333333, 'token_r': 0.2, 'token_f': 0.25}
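Checking the three PRF numbers against that example (1 correct token out of 3 predicted, against 5 gold tokens), plain arithmetic gives

$$p = \frac{1}{3}, \qquad r = \frac{1}{5}, \qquad F = \frac{2pr}{p + r} = \frac{2 \cdot \frac{1}{3} \cdot \frac{1}{5}}{\frac{1}{3} + \frac{1}{5}} = 0.25.$$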
The _p, _r and _f are all correct. It is a bit strange to have an accuracy of 0.5 when only 1 predicted token 'a' is correct out of 5 gold tokens.

Your Environment