Calamari-OCR / calamari

Line based ATR Engine based on OCRopy
GNU General Public License v3.0

Hidden error, average sentence confidence, and confidence voting #294

Closed Tailor2019 closed 1 month ago

Tailor2019 commented 2 years ago

Hello! @andbue @ChWick

After fine-tuning Calamari with a pretrained model, the results report three values: hidden error, average sentence confidence, and confidence voting. I don't understand what they mean or how they are calculated. Could you please explain their meaning and the expressions used to compute them?

Thanks a lot for your continued help.

andbue commented 2 years ago

-hidden error

https://github.com/Calamari-OCR/calamari/blob/15afa2988f9709a7f99b2ab5b3f7259b48cee2cf/calamari_ocr/scripts/eval.py#L24-L34

("Hidden" are the ones that are not listed in the table)

-Average sentence confidence

https://github.com/Calamari-OCR/calamari/blob/f1cdbb419204dd8cab79fddd1d5a7ea1090804bc/calamari_ocr/scripts/predict.py#L140 (The average of the line confidences over all lines, where the confidence of a line is the average of the confidences of the characters in that line.)
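As a rough illustration of that two-level average (not the code in `predict.py`), here is a minimal sketch. It assumes each predicted line is available as a list of per-character confidences; the function names are hypothetical, and skipping empty lines is an assumption of this sketch.

```python
def line_confidence(char_confidences):
    """Mean confidence of the characters in one line."""
    return sum(char_confidences) / len(char_confidences)


def average_sentence_confidence(lines):
    """Mean of the per-line confidences over all non-empty lines.

    `lines` is a list of lines, each a list of per-character confidences.
    Empty lines are skipped here (an assumption of this sketch).
    """
    per_line = [line_confidence(chars) for chars in lines if chars]
    if not per_line:
        return 0.0
    return sum(per_line) / len(per_line)


# Two lines: one with two characters, one with a single character.
lines = [[0.9, 0.7], [1.0]]
print(average_sentence_confidence(lines))
```

Note that this averages per line first, so a short line counts as much as a long one. Averaging over all characters at once would generally give a different number.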

-confidence voting

https://arxiv.org/abs/1711.09670

Tailor2019 commented 2 years ago

Thanks! @andbue Is the hidden error the percentage of characters on which Calamari makes an error?

andbue commented 2 years ago

No. The percentage there is just to give you an idea of the share of errors in the lines that are not listed in the table. If it's low, then most of your errors are already in the table (i.e. a few kinds of errors are frequent). If it's high, the errors are similar in frequency.
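To make this concrete, here is a minimal sketch of the idea (not the code from `eval.py`). It assumes the evaluation yields a mapping from each confusion (ground truth, prediction) to its count, and that the table lists only the `n_shown` most frequent rows. The function name and parameters are hypothetical, and the sketch weights each confusion by its count only, which may differ from the linked script.

```python
def hidden_error_share(confusions, n_shown):
    """Share of errors whose rows are NOT among the n_shown most frequent.

    `confusions` maps (ground_truth, prediction) pairs to error counts.
    """
    total = sum(confusions.values())
    if total == 0:
        return 0.0
    # Sort rows by descending count, as a confusion table would list them.
    ranked = sorted(confusions.items(), key=lambda item: -item[1])
    shown = sum(count for _, count in ranked[:n_shown])
    return (total - shown) / total


# A few frequent confusions dominate: the hidden share is low.
skewed = {("e", "c"): 50, ("l", "1"): 30, ("a", "o"): 10, ("n", "m"): 10}
print(f"{hidden_error_share(skewed, n_shown=2):.2%}")

# All confusions are equally frequent: the hidden share is high.
flat = {("e", "c"): 5, ("l", "1"): 5, ("a", "o"): 5, ("n", "m"): 5}
print(f"{hidden_error_share(flat, n_shown=1):.2%}")
```

So the value says nothing about the overall error rate. It only tells you how much of the total error is left out of the printed table.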