alwc closed this issue 5 years ago
Hello,
With TPS-ResNet-BiLSTM-CTC, the validation accuracy reached about 70% at 10,000 iterations (elapsed time: 4,887 seconds on a P40 GPU). I expect that a validation accuracy of 70% would be close to the 70% total accuracy in Table 8.
I have not tried MobileNet-V2. How about trying the Adam optimizer with a different learning rate?
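For reference, switching to Adam could look something like the sketch below. The `--adam` and `--lr` flags correspond to the `adam` and `lr` entries in the repo's options file; the learning rate value here is illustrative, not a recommendation from the authors.

```shell
# Sketch: train TPS-ResNet-BiLSTM-CTC with Adam instead of the default Adadelta.
# lr=0.001 is an illustrative starting point for Adam, not a tuned value.
python train.py \
  --train_data data_lmdb_release/training \
  --valid_data data_lmdb_release/validation \
  --Transformation TPS --FeatureExtraction ResNet \
  --SequenceModeling BiLSTM --Prediction CTC \
  --adam --lr 0.001
```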
Best
Hi @ku21fan, I'm talking about the default settings with "None-ResNet-None-CTC", not "TPS-ResNet-BiLSTM-CTC". So far it has taken 72,000 seconds on a single Titan X GPU to reach the following statistics:
[104000/300000] valid loss: 2.66333 accuracy: 52.231, norm_ED: 2238.44
In the paper (Table 8), "None-ResNet-None-CTC" reaches ~80%, so I'm wondering how much longer I need to train the model to get there.
I am sorry, I misunderstood. That accuracy is low compared to our experiment.
This is our log of None-ResNet-None-CTC.txt
[1/100][104000/300000] Test loss: 2.244991, accuracy: 75.286, norm_ED: 764.73
(The output format is from an older version, so it differs slightly from the current one.)
I will re-train this model and check it.
I have a question: could you share your opt.txt file? I would like to check it to figure out the reason for the low accuracy.
Best.
@alwc I re-trained this model and it gives a similar result to our previous model:
[104000/300000] valid loss: 0.44846 accuracy: 74.914, norm_ED: 759.90
This is our log of None_ResNet_None_CTC.txt
I have a question: did you use the case-sensitive option (--sensitive)?
If you did, the accuracy will be lower (and I guess it would be close to your result), because some of the validation datasets (the training sets of IIIT and SVT) have case-insensitive labels.
In general, STR research uses case-insensitive evaluation, and our paper uses case-insensitive mode (which reaches ~80% accuracy) as well.
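Case-insensitive evaluation simply lowercases both the prediction and the ground truth before comparing. A minimal sketch of word-level accuracy under both modes (the helper name is hypothetical, not the repo's actual function):

```python
def word_accuracy(preds, labels, sensitive=False):
    """Percentage of exact word matches; lowercases both sides when insensitive."""
    correct = 0
    for pred, gt in zip(preds, labels):
        if not sensitive:
            pred, gt = pred.lower(), gt.lower()
        if pred == gt:
            correct += 1
    return 100.0 * correct / len(preds)

# "Hello" vs "hello" only counts as correct in case-insensitive mode
print(word_accuracy(["Hello", "world"], ["hello", "word"]))  # 50.0
```

This is why a model evaluated in case-sensitive mode against case-insensitive labels will score noticeably lower.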
Best.
@ku21fan
I trained the model with batch size 192.
Here is my opt.txt:
------------ Options -------------
experiment_name: ResNet-Seed1111
train_data: /data_SSD/datasets/deep-text-recognition-benchmark/data_lmdb_release/training
valid_data: /data_SSD/datasets/deep-text-recognition-benchmark/data_lmdb_release/validation
saved_models_path: /data_SSD/pretrained_models/clovaai.deep-text-recognition-benchmark
manualSeed: 1111
workers: 4
batch_size: 192
num_iter: 300000
valInterval: 2000
continue_model:
adam: False
lr: 1
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
select_data: ['MJ', 'ST']
batch_ratio: ['0.5', '0.5']
total_data_usage_ratio: 1.0
batch_max_length: 25
imgH: 32
imgW: 100
rgb: True
character: 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
sensitive: True
PAD: False
Transformation: None
FeatureExtraction: ResNet
SequenceModeling: None
Prediction: CTC
num_fiducial: 20
input_channel: 3
output_channel: 512
hidden_size: 256
num_gpu: 1
num_class: 95
---------------------------------------
As you said, I think the key difference is that your model was trained with "insensitive" data while my model was trained with "sensitive" data. Let me create a validation set with case-sensitive labels and see how the model performs. Thanks!
Hi folks,
You did a great job comparing and contrasting the effects of different modules on text recognition! I'm currently trying to train a fast model, so I'm training ResNet + CTC and VGG + CTC on my own. Using the default settings from your training script, I'm just wondering how long the training takes to reach the ~70% accuracy shown in Table 8?
By the way, have you tried MobileNet-V2 as the backbone? With the provided settings, I can't get MobileNet-V2 + CTC past ~40% accuracy.