Calamari-OCR / calamari

Line based ATR Engine based on OCRopy
GNU General Public License v3.0

Encoding of Arabic characters in the confusion file is wrong #307

Open Tailor2019 opened 2 years ago

Tailor2019 commented 2 years ago

Hello! I'm using version 2.1.1 of calamari, which I trained on my Arabic database. For validation I ran:

`!calamari-eval --gt.texts .gt.txt --pred File --pred.texts 'dirto/.pred.txt' --n_confusions=-1 --xlsx_output dirto/XLSX_OUTPUT`

As a result, the confusion file contains: (screenshot of the confusion file)

The "GT" and "PRED" values in this screenshot of the confusion file do not match the true text of the corresponding image. In fact, this line of the confusion file corresponds to this image (image 998, ground truth file 998.gt.txt).

Please, how can I obtain a correct confusion file where the GT and PRED fields have the same structure as the text in the image? Thanks a lot in advance!

andbue commented 2 years ago

Hi, thanks for your report! Could you be a bit more specific about what happens? Is the GT text content somehow put in a different line in the table, are GT and Pred swapped or is there just a problem with right to left ordering of characters?
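One way to narrow this down, as a rough sketch that is not part of calamari: compare the text from the `.gt.txt` file with the text copied out of the table cell, code point by code point. The helper names `describe` and `compare` and the sample strings are hypothetical and only use the Python standard library. If the two strings turn out to be reversed or reordered copies of each other, the issue is probably character ordering or display direction rather than wrong content.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return (code point, Unicode name) pairs in logical (stored) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


def compare(expected, observed):
    """Classify how `observed` relates to `expected`."""
    if expected == observed:
        return "identical"
    if expected == observed[::-1]:
        return "reversed"
    if Counter(expected) == Counter(observed):
        return "same characters, different order"
    return "different characters"


# Hypothetical sample data: replace with the content of a real .gt.txt file
# and the matching cell copied from the confusion spreadsheet.
gt_line = "\u0633\u0644\u0627\u0645"
cell = gt_line[::-1]

print(compare(gt_line, cell))
for code_point, name in describe(cell):
    print(code_point, name)
```

Printing the code points avoids relying on how a terminal or spreadsheet renders right-to-left text, which can make identical strings look different.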

Tailor2019 commented 2 years ago

Thanks for your reply! There is no relation between the GT in the confusion matrix and the real GT. The same goes for the prediction, even though it has a very low error rate. Thanks for helping me!