JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0

Training Model Struggling with Character Recognition in Custom Sports Fonts #1253

Open Suriyapongmax opened 3 months ago

Suriyapongmax commented 3 months ago

I am working on an OCR project aimed at accurately reading player numbers and names from sports images. These images feature 10 different custom fonts, predominantly thick and bold, which cater to a sports aesthetic. The primary challenge is the model's ability to distinguish between similar characters, particularly under the constraints of these stylized fonts.

Fonts: 10 custom sports fonts (English A-Z, a-z, 0-9). Training data: a generated dataset of ~200K images, covering mixed-case strings of 3-10 characters (with and without stroke outlines) and numbers (00-99) for each font.
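For reference, here is a minimal sketch of how such a dataset can be rendered with Pillow; the font directory, output layout, and label format are illustrative assumptions, not the exact pipeline used here:

```python
import random
import string
from pathlib import Path
from PIL import Image, ImageDraw, ImageFont

FONT_DIR = Path("fonts")        # assumption: the 10 custom .ttf files live here
OUT_DIR = Path("train_data/en") # assumption: adjust to your trainer's layout
OUT_DIR.mkdir(parents=True, exist_ok=True)
CHARS = string.ascii_letters + string.digits

def render_sample(font_path: Path, text: str, stroke: int) -> Image.Image:
    """Render one text sample, optionally with a stroke outline."""
    font = ImageFont.truetype(str(font_path), size=48)
    # measure the string so the canvas fits it with some padding
    left, top, right, bottom = font.getbbox(text, stroke_width=stroke)
    img = Image.new("L", (right - left + 20, bottom - top + 20), color=255)
    ImageDraw.Draw(img).text((10 - left, 10 - top), text, font=font, fill=0,
                             stroke_width=stroke, stroke_fill=128)
    return img

labels = []
for i in range(1000):  # scale up toward ~200K for a real run
    font_path = random.choice(list(FONT_DIR.glob("*.ttf")))
    text = "".join(random.choices(CHARS, k=random.randint(3, 10)))
    stroke = random.choice([0, 2])  # non-stroke and stroke variants
    render_sample(font_path, text, stroke).save(OUT_DIR / f"{i:06d}.png")
    labels.append(f"{i:06d}.png\t{text}")

# simple TSV for illustration; adapt to the label format your trainer expects
(OUT_DIR / "labels.txt").write_text("\n".join(labels))
```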

After training with num_iter: 750000, I get loss: 0.00126 and valid loss: 0.15052; the large gap between the two suggests the model is overfitting the synthetic data.

Problems Encountered: despite the low training loss, the trained model still confuses similar-looking characters in these stylized fonts.

Request for Help: I am seeking advice on improving my model's performance in differentiating similar-looking characters. Any suggestions on training strategies, network adjustments, or data preprocessing techniques would be greatly appreciated.
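One way to make "similar-looking characters" concrete is to count which character pairs the model actually swaps on a validation set. A minimal sketch, assuming you have collected (ground truth, prediction) string pairs from your validation runs:

```python
from collections import Counter
from difflib import SequenceMatcher

def char_confusions(pairs):
    """Count one-for-one character substitutions between truth and prediction."""
    counts = Counter()
    for truth, pred in pairs:
        for tag, i1, i2, j1, j2 in SequenceMatcher(None, truth, pred).get_opcodes():
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                # aligned substitutions are the "similar character" errors
                for t, p in zip(truth[i1:i2], pred[j1:j2]):
                    counts[(t, p)] += 1
    return counts

# hypothetical pairs; in practice, gather these from the validation set
pairs = [("PLAYER 08", "PLAYER O8"), ("SMITH 11", "SM1TH 11")]
for (t, p), n in char_confusions(pairs).most_common(10):
    print(f"{t!r} -> {p!r}: {n}")
```

Knowing the top confusions lets you oversample training strings that place those characters side by side, which is a common first step before touching the architecture.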

Here are actual images I want to predict:

[attached examples: Adihaus Bold_nostroke (7), Adihaus Bold_nostroke123 (12), Adihaus Bold_stroke123 (10), Alexandria_nostroke123 (29)]

Here are generated images I used as the training set:

[attached examples: nbv5231665978, oc210030322606, oc220080106966, oc220080146883, oc556143399804]

Here is the config I use:

batch_size: 32
FT: False
optim: False
lr: 1
beta1: 0.9
total_data_usage_ratio: 1.0
batch_max_length: 34
imgH: 64
imgW: 600
rgb: False
contrast_adjust: 0.0
sensitive: True
PAD: True
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 256
hidden_size: 256
decode: greedy
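For context, a config like this is normally fed to EasyOCR's trainer. A minimal launch sketch, run from the repo's trainer/ directory and mirroring its trainer.ipynb; the config file name is an assumption, and the real get_config() there also derives the character set from labels.csv and creates the saved_models directory:

```python
import string
import yaml
from train import train      # trainer entry point (trainer/train.py)
from utils import AttrDict   # dict with attribute access (trainer/utils.py)

# assumed file name for the YAML shown above
with open("config_files/en_custom_config.yaml", encoding="utf8") as f:
    opt = AttrDict(yaml.safe_load(f))

# the character set must cover everything the model should emit
opt.character = string.digits + string.ascii_letters  # 0-9, A-Z, a-z
train(opt, amp=False)
```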

THANK YOU IN ADVANCE !!

alikhalil98771 commented 3 months ago

Hi, I want to ask how long it took you to fine-tune the model with this number of iterations and dataset size. I am only using 3,000 iterations and 20K images, and it is taking a very long time.

Suriyapongmax commented 3 months ago

> Hi, I want to ask how long it took you to fine-tune the model with this number of iterations and dataset size. I am only using 3,000 iterations and 20K images, and it is taking a very long time.

200K images with 40,000 iterations took about 1 hour.
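To put those numbers in perspective, iterations times batch size gives the number of training samples seen; a quick back-of-the-envelope using the batch_size: 32 from the config above:

```python
dataset = 200_000   # training images
batch_size = 32     # from the config above
iters = 40_000      # iterations for the ~1 hour run

samples_seen = iters * batch_size   # 1,280,000
print(f"{samples_seen:,} samples = {samples_seen / dataset:.1f} epochs")
```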

alikhalil98771 commented 3 months ago

> 200K images with 40,000 iterations took about 1 hour.

Thanks for the reply. Could you also tell me your GPU specification? I have a 3050 12G and it has been running for more than 4 hours.

Suriyapongmax commented 3 months ago

> Thanks for the reply. Could you also tell me your GPU specification? I have a 3050 12G and it has been running for more than 4 hours.

I use a 4060 Ti 16GB. What learning rate and batch size did you choose?

alikhalil98771 commented 3 months ago

> I use a 4060 Ti 16GB. What learning rate and batch size did you choose?

I am using the following config:

manualSeed: 1111
workers: 4
batch_size: 128 #32
num_iter: 3000
valInterval: 200
FT: False
optim: False # default is Adadelta
lr: 1.
beta1: 0.9
rho: 0.95
eps: 0.00000001
grad_clip: 5
#Data processing
select_data: 'e' # this is dataset folder in train_data
batch_ratio: '1' 
total_data_usage_ratio: 1.0
batch_max_length: 35 
imgH: 64
imgW: 600
rgb: False
sensitive: True
PAD: True
contrast_adjust: 0.0
data_filtering_off: False
# Model Architecture
Transformation: 'None'
FeatureExtraction: 'ResNet'
SequenceModeling: 'BiLSTM'
Prediction: 'CTC'
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 512
decode: 'greedy'
new_prediction: False
freeze_FeatureFxtraction: False
freeze_SequenceModeling: False
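Beyond the GPU difference, this config is also doing much more work per iteration than the one above: ResNet with output_channel/hidden_size 512 at batch_size 128, versus VGG with 256 at batch_size 32. Comparing the two runs' totals makes the gap concrete; a rough illustration using only the numbers quoted in this thread:

```python
# (iterations, batch size, reported hours) from the two runs discussed above
runs = {
    "4060 Ti, VGG/256, batch 32":  (40_000, 32, 1),   # ~1 hour reported
    "3050, ResNet/512, batch 128": (3_000, 128, 4),   # >4 hours reported
}
for name, (iters, batch, hours) in runs.items():
    samples = iters * batch
    print(f"{name}: {samples:,} samples, ~{samples / hours:,.0f} samples/hour")
# ~1,280,000 vs ~96,000 samples/hour: the heavier ResNet/512 stack and larger
# batches make each sample far more expensive, on top of the slower GPU.
```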