clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

Fine-Tuning with Additional Characters -- Possible? #297

Closed JyK-Furiosa closed 2 years ago

JyK-Furiosa commented 3 years ago

Hello,

I am trying to add additional characters (the --sensitive option) by fine-tuning an existing non-sensitive model.

I am using the None-VGG-BiLSTM-CTC (a.k.a. CRNN) model, which had an accuracy of 62.795% on the IC15_2077 test set.

I removed the final layer from the pre-trained state_dict() to resolve the size mismatch. I also froze all layers except the final fully connected layer, so that only the classifier layer is trained. All other options are left at their default settings.
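
For readers following along, the two steps described above (dropping the mismatched classifier weights, then freezing everything else) can be sketched roughly as follows. This is a minimal illustration, not the code used in this issue: `TinyCRNN`, `load_without_head`, and `freeze_all_but_head` are hypothetical names, the `Prediction.` key prefix is an assumption about how the classifier is named (a checkpoint saved from a `DataParallel` model would carry an extra `module.` prefix), and 37 classes for the non-sensitive model assumes 36 characters plus one CTC blank.

```python
import torch
import torch.nn as nn


class TinyCRNN(nn.Module):
    """Hypothetical stand-in for the real model: one feature layer plus a classifier."""

    def __init__(self, num_class: int, hidden_size: int = 256):
        super().__init__()
        self.SequenceModeling = nn.Linear(512, hidden_size)
        self.Prediction = nn.Linear(hidden_size, num_class)

    def forward(self, x):
        return self.Prediction(torch.relu(self.SequenceModeling(x)))


def load_without_head(model, state_dict, head_prefix="Prediction."):
    """Load every pretrained tensor except the classifier head; return the keys left untouched."""
    kept = {k: v for k, v in state_dict.items() if not k.startswith(head_prefix)}
    # strict=False tolerates the missing head keys; the head keeps its fresh initialization.
    result = model.load_state_dict(kept, strict=False)
    return result.missing_keys


def freeze_all_but_head(model, head_prefix="Prediction."):
    """Leave requires_grad=True only on the classifier head."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)


old = TinyCRNN(num_class=37)  # assumed: 36 characters + 1 CTC blank
new = TinyCRNN(num_class=95)  # 94 characters + 1 CTC blank, as in opt.txt
missing = load_without_head(new, old.state_dict())
freeze_all_but_head(new)

trainable = [n for n, p in new.named_parameters() if p.requires_grad]
print(missing)    # keys that were not loaded from the checkpoint
print(trainable)  # parameters that will still receive gradients

# Hand only the trainable parameters to the optimizer.
optimizer = torch.optim.Adadelta(
    [p for p in new.parameters() if p.requires_grad], lr=1.0, rho=0.95, eps=1e-8
)
```

The optimizer values mirror `lr`, `rho`, and `eps` from the opt.txt in this post; that the non-Adam optimizer is Adadelta is an assumption. Printing `trainable` is a cheap way to confirm that the freeze applied to the layers intended and that the new head was not accidentally frozen as well.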

I fine-tuned on the ST_spe dataset (the one with special characters), but the validation loss (on IC15_2077), the train loss, and the accuracy do not improve. (I have tried training for up to 100000 steps.)

What might be the main problem here? The dataset size (it has 1.5M samples, though)? Freezing the layers?

The following is my opt.txt:

exp_name: None-VGG-BiLSTM-CTC-Seed1111
train_data: ../RecData/training
valid_data: ../RecData/evaluation/IC15_2077
manualSeed: 1111
workers: 4
batch_size: 192
num_iter: 300000
valInterval: 2000
saved_model: saved_models/CRNN_new/best_accuracy.pth
FT: True
adam: False
lr: 1
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
baiduCTC: False
select_data: ['ST_spe']
batch_ratio: ['1']
total_data_usage_ratio: 1.0
batch_max_length: 25
imgH: 32
imgW: 100
rgb: False
character: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
sensitive: True
PAD: False
data_filtering_off: False
Transformation: None
FeatureExtraction: VGG
SequenceModeling: BiLSTM
Prediction: CTC
num_fiducial: 20
input_channel: 1
output_channel: 512
hidden_size: 256
num_gpu: 1
num_class: 95
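
As a sanity check on the configuration above, `num_class: 95` can be cross-checked against the `character` string. The sketch below rebuilds that string from Python's `string` module and assumes that CTC prediction reserves exactly one extra index for the blank token; the variable names are only illustrative.

```python
import string

# Same ordering as the `character` entry above: digits, lowercase, uppercase, punctuation.
character = (
    string.digits
    + string.ascii_lowercase
    + string.ascii_uppercase
    + string.punctuation
)

num_chars = len(character)
num_class = num_chars + 1  # assumed: one additional class for the CTC blank

print(num_chars)  # 94
print(num_class)  # 95
```

The count agrees with the configuration, so the size of the new classifier layer is consistent with the character set.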

The following is one validation log from opt_train.txt:

[140000/300000] Train loss: 0.70766, Valid loss: 2.29187, Elapsed_time: 3637.79319
Current_accuracy : 11.587, Current_norm_ED  : 0.46
Best_accuracy    : 12.561, Best_norm_ED     : 0.48
--------------------------------------------------------------------------------
Ground Truth              | Prediction                | Confidence Score & T/F
--------------------------------------------------------------------------------
9am-                      | Sumem                     | 0.0011  False
SKIN!                     | SKin!                     | 0.2561  False
OUTLET                    | outlLET                   | 0.0396  False
CARE                      | Sintiue:                  | 0.0000  False
SALE                      | SAle                      | 0.0096  False
--------------------------------------------------------------------------------
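
Several predictions in the log above differ from the ground truth only in letter case. One way to quantify this is to compare exact-match accuracy with case-insensitive accuracy. The sketch below does so for the five samples shown; the helper name `accuracy` is hypothetical, and five samples are far too few to draw conclusions from.

```python
# (ground truth, prediction) pairs copied from the validation log above.
samples = [
    ("9am-", "Sumem"),
    ("SKIN!", "SKin!"),
    ("OUTLET", "outlLET"),
    ("CARE", "Sintiue:"),
    ("SALE", "SAle"),
]


def accuracy(pairs, case_sensitive=True):
    """Fraction of pairs whose prediction equals the ground truth."""
    hits = 0
    for gt, pred in pairs:
        if not case_sensitive:
            gt, pred = gt.lower(), pred.lower()
        hits += gt == pred
    return hits / len(pairs)


print(accuracy(samples, case_sensitive=True))   # 0.0
print(accuracy(samples, case_sensitive=False))  # 0.4
```

If the same gap showed up across the whole validation set, it might suggest that the frozen layers, trained on case-insensitive labels, do not separate upper- and lowercase letters well. That reading is a guess, not something established in this thread.
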

nikkhilAvira commented 2 years ago

Hi, I'm trying to retrain for special characters as well. Did you end up finding a solution?

rafaelagrc commented 2 years ago

Hello @JyK-Furiosa, were you able to find a solution?