JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

Training Model Struggling with Character Recognition in Custom Sports Fonts #1253

Open Suriyapongmax opened 3 months ago

Suriyapongmax commented 3 months ago

I am working on an OCR project aimed at accurately reading player numbers and names from sports images. These images feature 10 different custom fonts, predominantly thick and bold, which cater to a sports aesthetic. The primary challenge is the model's ability to distinguish between similar characters, particularly under the constraints of these stylized fonts.

Fonts: 10 custom sports fonts (English A-Z, a-z, 0-9). Training data: a generated dataset of ~200K images, covering mixed-case strings of 3-10 characters (with and without stroke outlines) and numbers (00-99) for each font.
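For reference, here is a minimal sketch of how such a dataset can be rendered with Pillow; the font directory, output layout, and label format are illustrative assumptions, not the exact pipeline used here:

```python
import random
import string
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont

FONT_DIR = Path("fonts")        # assumption: the 10 custom .ttf files live here
OUT_DIR = Path("train_data/en") # assumption: adjust to your trainer's layout
OUT_DIR.mkdir(parents=True, exist_ok=True)
CHARS = string.ascii_letters + string.digits

def render_sample(font_path: Path, text: str, stroke: int) -> Image.Image:
    """Render one text sample, optionally with a stroke outline."""
    font = ImageFont.truetype(str(font_path), size=48)
    # measure the string so the canvas fits it with some padding
    left, top, right, bottom = font.getbbox(text, stroke_width=stroke)
    img = Image.new("L", (right - left + 20, bottom - top + 20), color=255)
    ImageDraw.Draw(img).text((10 - left, 10 - top), text, font=font, fill=0,
                             stroke_width=stroke, stroke_fill=128)
    return img

labels = []
for i in range(1000):  # scale up toward ~200K for a real run
    font_path = random.choice(list(FONT_DIR.glob("*.ttf")))
    text = "".join(random.choices(CHARS, k=random.randint(3, 10)))
    stroke = random.choice([0, 2])  # non-stroke and stroke variants
    render_sample(font_path, text, stroke).save(OUT_DIR / f"{i:06d}.png")
    labels.append(f"{i:06d}.png\t{text}")

# simple TSV for illustration; adapt to the label format your trainer expects
(OUT_DIR / "labels.txt").write_text("\n".join(labels))
```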

After training with num_iter: 750000, I get loss: 0.00126 and valid loss: 0.15052; the large gap between the two suggests the model is overfitting the synthetic data.

Problems Encountered: despite the low training loss, the trained model still confuses similar-looking characters in these stylized fonts.

Request for Help: I am seeking advice on improving my model's performance in differentiating similar-looking characters. Any suggestions on training strategies, network adjustments, or data preprocessing techniques would be greatly appreciated.
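One way to make "similar-looking characters" concrete is to count which character pairs the model actually swaps on a validation set. A minimal sketch, assuming you have collected (ground truth, prediction) string pairs from your validation runs:

```python
from collections import Counter
from difflib import SequenceMatcher

def char_confusions(pairs):
    """Count one-for-one character substitutions between truth and prediction."""
    counts = Counter()
    for truth, pred in pairs:
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, truth, pred).get_opcodes():
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                # aligned substitutions are the "similar character" errors
                for t, p in zip(truth[i1:i2], pred[j1:j2]):
                    counts[(t, p)] += 1
    return counts

# hypothetical pairs; in practice, gather these from the validation set
pairs = [("PLAYER 08", "PLAYER O8"), ("SMITH 11", "SM1TH 11")]
for (t, p), n in char_confusions(pairs).most_common(10):
    print(f"{t!r} -> {p!r}: {n}")
```

Knowing the top confusions lets you oversample training strings that place those characters side by side, which is a common first step before touching the architecture.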

Here are actual images I want to predict:

[attached examples: Adihaus Bold_nostroke (7), Adihaus Bold_nostroke123 (12), Adihaus Bold_stroke123 (10), Alexandria_nostroke123 (29)]

Here are generated images I used as the training set:

[attached examples: nbv5231665978, oc210030322606, oc220080106966, oc220080146883, oc556143399804]

Here is the config I use:

batch_size: 32
FT: False
optim: False
lr: 1
beta1: 0.9
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: 0.0
sensitive: True
PAD: True
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: greedy
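For context, a config like this is normally fed to EasyOCR's trainer. A minimal launch sketch, run from the repo's trainer/ directory and mirroring its trainer.ipynb; the config file name is an assumption, and the real get_config() there also derives the character set from labels.csv and creates the saved_models directory:

```python
import string
import yaml
from train import train      # trainer entry point (trainer/train.py)
from utils import AttrDict   # dict with attribute access (trainer/utils.py)

# assumed file name for the YAML shown above
with open("config_files/en_custom_config.yaml", encoding="utf8") as f:
    opt = AttrDict(yaml.safe_load(f))

# the character set must cover everything the model should emit
opt.character = string.digits + string.ascii_letters  # 0-9, A-Z, a-z
train(opt, amp=False)
```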

THANK YOU IN ADVANCE !!

alikhalil98771 commented 3 months ago

Hi, I want to ask how long it took you to fine-tune the model with this number of iterations and dataset size. I am only using 3,000 iterations and 20K images, and it is taking a very long time.

Suriyapongmax commented 3 months ago

> Hi, I want to ask how long it took you to fine-tune the model with this number of iterations and dataset size. I am only using 3,000 iterations and 20K images, and it is taking a very long time.

200K images with 40,000 iterations took about 1 hour.
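To put those numbers in perspective, iterations times batch size gives the number of training samples seen; a quick back-of-the-envelope using the batch_size: 32 from the config above:

```python
dataset = 200_000   # training images
batch_size = 32     # from the config above
iters = 40_000      # iterations for the ~1 hour run

samples_seen = iters * batch_size   # 1,280,000
print(f"{samples_seen:,} samples = {samples_seen / dataset:.1f} epochs")
```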

alikhalil98771 commented 3 months ago

> 200K images with 40,000 iterations took about 1 hour.

Thanks for the reply. Could you also tell me your GPU specification? I have a 3050 12G and it has been running for more than 4 hours.

Suriyapongmax commented 3 months ago

> Thanks for the reply. Could you also tell me your GPU specification? I have a 3050 12G and it has been running for more than 4 hours.

I use a 4060 Ti 16GB. What learning rate and batch size did you choose?

alikhalil98771 commented 3 months ago

> I use a 4060 Ti 16GB. What learning rate and batch size did you choose?

I am using the following config:

manualSeed: 1111
workers: 4
batch_size: 128 #32
num_iter: 3000
valInterval: 200
FT: False
optim: False # default is Adadelta
lr: 1.
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5
#Data processing
select_data: 'e' # this is dataset folder in train_data
batch_ratio: '1' 
total_data_usage_ratio: 1.0
batch_max_length: 35 
imgH: 64
imgW: 600
rgb: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False
# Model Architecture
Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
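Beyond the GPU difference, this config is also doing much more work per iteration than the one above: ResNet with output_channel/hidden_size 512 at batch_size 128, versus VGG with 256 at batch_size 32. Comparing the two runs' totals makes the gap concrete; a rough illustration using only the numbers quoted in this thread:

```python
# (iterations, batch size, reported hours) from the two runs discussed above
runs = {
    "4060 Ti, VGG/256, batch 32":  (40_000, 32, 1),   # ~1 hour reported
    "3050, ResNet/512, batch 128": (3_000, 128, 4),   # >4 hours reported
}
for name, (iters, batch, hours) in runs.items():
    samples = iters * batch
    print(f"{name}: {samples:,} samples, ~{samples / hours:,.0f} samples/hour")
# ~1,280,000 vs ~96,000 samples/hour: the heavier ResNet/512 stack and larger
# batches make each sample far more expensive, on top of the slower GPU.
```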