alwc closed this issue 5 years ago
Hello,
With TPS-ResNet-BiLSTM-CTC, the validation accuracy reached about 70% at 10,000 iterations (elapsed time: 4,887 seconds on a P40 GPU). I expect that a validation accuracy of 70% would be close to the 70% total accuracy in Table 8.
I have not tried MobileNet-V2. How about trying the Adam optimizer with a different learning rate?
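For reference, switching to Adam could look something like the sketch below. The `--adam` and `--lr` flags correspond to the `adam` and `lr` entries in the repo's options file; the learning rate value here is illustrative, not a recommendation from the authors.

```shell
# Sketch: train TPS-ResNet-BiLSTM-CTC with Adam instead of the default Adadelta.
# lr=0.001 is an illustrative starting point for Adam, not a tuned value.
python train.py \
  --train_data data_lmdb_release/training \
  --valid_data data_lmdb_release/validation \
  --Transformation TPS --FeatureExtraction ResNet \
  --SequenceModeling BiLSTM --Prediction CTC \
  --adam --lr 0.001
```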
Best
Hi @ku21fan, I'm talking about the default settings with "None-ResNet-None-CTC", not "TPS-ResNet-BiLSTM-CTC". So far it has taken 72,000 seconds on a single Titan X GPU to reach the following statistics:
[104000/300000] valid loss: 2.66333 accuracy: 52.231, norm_ED: 2238.44
In the paper (Table 8), "None-ResNet-None-CTC" reaches ~80%, so I'm wondering how much longer I need to train the model to get there.
I am sorry, I misunderstood. That accuracy is low compared to our experiment.
This is our log of None-ResNet-None-CTC.txt
[1/100][104000/300000] Test loss: 2.244991, accuracy: 75.286, norm_ED: 764.73
(The output format is from an older version, so it differs slightly from the current one.)
I will re-train this model and check it.
I have a question: could you share your opt.txt file? I would like to check it to figure out the reason for the low accuracy.
Best.
@alwc I re-trained this model and it gives a similar result to our previous model:
[104000/300000] valid loss: 0.44846 accuracy: 74.914, norm_ED: 759.90
This is our log of None_ResNet_None_CTC.txt
I have a question: did you use the case-sensitive option (--sensitive)?
If you did, the accuracy will be lower (and I guess it would be close to your result), because some of the validation datasets (the training sets of IIIT and SVT) have case-insensitive labels.
In general, STR research uses case-insensitive evaluation, and our paper uses case-insensitive mode (which reaches ~80% accuracy) as well.
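Case-insensitive evaluation simply lowercases both the prediction and the ground truth before comparing. A minimal sketch of word-level accuracy under both modes (the helper name is hypothetical, not the repo's actual function):

```python
def word_accuracy(preds, labels, sensitive=False):
    """Percentage of exact word matches; lowercases both sides when insensitive."""
    correct = 0
    for pred, gt in zip(preds, labels):
        if not sensitive:
            pred, gt = pred.lower(), gt.lower()
        if pred == gt:
            correct += 1
    return 100.0 * correct / len(preds)

# "Hello" vs "hello" only counts as correct in case-insensitive mode
print(word_accuracy(["Hello", "world"], ["hello", "word"]))  # 50.0
```

This is why a model evaluated in case-sensitive mode against case-insensitive labels will score noticeably lower.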
Best.
@ku21fan
I trained the model with batch size 192.
Here is my opt.txt:
------------ Options -------------
experiment_name: ResNet-Seed1111
train_data: /data_SSD/datasets/deep-text-recognition-benchmark/data_lmdb_release/training
valid_data: /data_SSD/datasets/deep-text-recognition-benchmark/data_lmdb_release/validation
saved_models_path: /data_SSD/pretrained_models/clovaai.deep-text-recognition-benchmark
manualSeed: 1111
workers: 4
batch_size: 192
num_iter: 300000
valInterval: 2000
continue_model:
adam: False
lr: 1
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
select_data: ['MJ', 'ST']
batch_ratio: ['0.5', '0.5']
total_data_usage_ratio: 1.0
batch_max_length: 25
imgH: 32
imgW: 100
rgb: True
character: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
sensitive: True
PAD: False
Transformation: None
FeatureExtraction: ResNet
SequenceModeling: None
Prediction: CTC
num_fiducial: 20
input_channel: 3
output_channel: 512
hidden_size: 256
num_gpu: 1
num_class: 95
---------------------------------------
As you said, I think the key difference is that your model was trained with "insensitive" data while my model was trained with "sensitive" data. Let me create a validation set with case-sensitive labels and see how the model performs. Thanks!
Hi folks,
You did a great job comparing and contrasting the effects of different modules on text recognition! I'm currently trying to train a fast model, so I'm training ResNet + CTC and VGG + CTC on my own. Using the default settings from your training script, I'm just wondering how long the training takes to reach the ~70% accuracy shown in Table 8?
By the way, have you tried MobileNet-V2 as the backbone? With the provided settings, I can't get MobileNet-V2 + CTC past ~40% accuracy.