Model Consistently Mispredicting Specific Character in Invoice Number

Hello,

I’ve encountered an issue with the model's predictions on invoice numbers. My dataset consists of 800 training images, 100 testing images, and 100 validation images. The model has been trained successfully, yielding the following accuracy scores:

Total number of samples: 100
Tree Edit Distance (TED) based accuracy score: 0.9476799242424243
F1 accuracy score: 0.5213032581453634

Despite the promising TED accuracy of about 94.8%, a closer look at the predictions revealed a persistent error. The model is meant to parse documents containing a 15-character alphanumeric invoice number. However, it consistently mispredicts the third character from the end of the invoice number, reading a '2' as a '1'. This error occurred in 94 of the 100 test images.
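To illustrate why an edit-distance score can stay high while the field is still wrong, here is a minimal sketch. It assumes a plain character-level Levenshtein distance as a simplified stand-in for the tree edit distance used in the evaluation, and the invoice numbers are made up for illustration (they are not from my dataset).

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(pred: str, truth: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    denom = max(len(pred), len(truth))
    return 1.0 if denom == 0 else 1.0 - levenshtein(pred, truth) / denom


# Hypothetical 15-character invoice numbers; only the third character
# from the end differs ('2' in the ground truth, '1' in the prediction).
truth = "INV2023A0004271"
pred = "INV2023A0004171"

print(similarity(pred, truth))  # 14/15: one wrong character out of 15
print(pred == truth)            # exact match fails
```

With one wrong character out of 15, the similarity is 14/15, so a score in the mid-90s is compatible with almost every invoice number being wrong as a whole.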

This misprediction is critical because a single incorrect character makes the whole invoice number unusable, which calls into question whether the model is suitable for this task.

I am seeking guidance or recommendations to improve the model's precision in predicting this specific character within the invoice number. Any assistance or suggestions would be immensely appreciated.
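In case it helps with diagnosis, the error pattern can be tallied with a small script along these lines. This is only a sketch: the function name `position_confusions` and the sample pairs are hypothetical, and it assumes the ground-truth and predicted invoice numbers have already been extracted as plain strings.

```python
from collections import Counter


def position_confusions(pairs):
    """Count (position_from_end, true_char, predicted_char) over all mismatches.

    Pairs whose strings differ in length are skipped, since positions
    cannot be aligned one-to-one in that case.
    """
    counts = Counter()
    for truth, pred in pairs:
        if len(truth) != len(pred):
            continue
        n = len(truth)
        for i, (t, p) in enumerate(zip(truth, pred)):
            if t != p:
                counts[(i - n, t, p)] += 1  # i - n is the negative index from the end
    return counts


# Hypothetical (ground truth, prediction) pairs for illustration only.
pairs = [
    ("INV2023A0004271", "INV2023A0004171"),
    ("INV2023B0019263", "INV2023B0019163"),
    ("INV2023C0000255", "INV2023C0000255"),
]

for (pos, true_char, pred_char), count in position_confusions(pairs).most_common():
    print(f"position {pos}: '{true_char}' read as '{pred_char}' in {count} sample(s)")
```

A tally like this makes it easy to check whether the '2' to '1' confusion is tied to that one position or also shows up elsewhere in the invoice number.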

Thank you.

clovaai / donut #260