Sorry, I am not getting the issue. Can you show an example?
Sure, and btw, I can't assign myself, but that's something I'd be happy to do.
Here is some output of a confusion matrix:

| Expected | Total Errors | Predictions | Predicted times |
|---|---|---|---|
| qui | 243 | quod | 88 |
| | | quis | 82 |
| | | quam | 31 |
| | | quo | 27 |
| | | qua | 14 |
| | | antequam | 1 |
What I'd like is something more like this:

| Expected | Total Errors | Support | Predictions | Predicted times | Support |
|---|---|---|---|---|---|
| qui | 243 | 500 | quod | 88 | 300 |
| | | | quis | 82 | 450 |
| | | | quam | 31 | 50 |
| | | | quo | 27 | 90 |
| | | | qua | 14 | 200 |
| | | | antequam | 1 | 500 |
That would help me see that qui is actually only ~50% accurate (243 errors against a support of 500), and that the biggest issue might be quam, which is confused 31 times against a support of only 50.
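
To make the idea concrete, here is a minimal sketch of how the support column could be computed alongside the confusion counts. This is not the project's actual code: the function name `confusion_with_support` and the choice of counting support as the gold-label frequency (for both the expected and the predicted label) are my own assumptions.

```python
from collections import Counter, defaultdict

def confusion_with_support(gold, pred):
    """Return rows of (expected, total_errors, support, predicted, times, support)."""
    support = Counter(gold)          # how often each label occurs in the gold data
    errors = defaultdict(Counter)    # expected label -> Counter of wrong predictions
    for g, p in zip(gold, pred):
        if g != p:
            errors[g][p] += 1

    rows = []
    # Most-confused expected labels first, then their wrong predictions by frequency.
    for expected, confusions in sorted(errors.items(),
                                       key=lambda kv: -sum(kv[1].values())):
        total_errors = sum(confusions.values())
        for predicted, times in confusions.most_common():
            rows.append((expected, total_errors, support[expected],
                         predicted, times, support[predicted]))
    return rows
```

Each row then maps directly onto the table above, e.g. `('qui', 243, 500, 'quod', 88, 300)`, so the ~50% error rate on qui is readable at a glance.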
Sounds fine by me. Anyway, the confusion tables for lemmatization are usually quite unwieldy, but it makes sense to have the marginal counts already there.
To be fair, for me, this is more important for the other tasks (e.g. Gender or Tense) :)
In the current state of the table, it's really hard to make sense of it, since the support is unknown. For example, NOMpro occurring 120 times and being mislabeled as NOMcom 90 times is not the same situation as it being mislabeled 30 times.
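With the support column there, the difference is obvious at a glance: 90/120 is a 75% error rate on NOMpro, while 30/120 is only 25%, and those are very different situations.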