clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0

Training stuck after first iteration #395

Open Xzoky174 opened 1 year ago


After I run train.py, the first iteration finishes within a few seconds, and then the program just hangs while CPU usage spikes. I left it running for several hours, but nothing happened.
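One way to tell a genuine deadlock from a loop that is busy but silent is Python's standard-library faulthandler module, which can periodically print the stack of every thread. The sketch below is a minimal, hypothetical helper (arm_stall_dump and disarm_stall_dump are illustrative names, not part of this repository); where to call it in train.py is left open.

```python
import faulthandler
import sys


def arm_stall_dump(timeout_s: float, out=sys.stderr) -> None:
    """Dump all threads' stacks to `out` every `timeout_s` seconds until disarmed.

    `out` must be a real file with a fileno(), e.g. sys.stderr or an open log file.
    """
    # repeat=True: dump again after every interval; exit=False: keep the process alive.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out, exit=False)


def disarm_stall_dump() -> None:
    """Cancel any pending dump scheduled by arm_stall_dump."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Example: dump stacks every 10 minutes while a (hypothetical) training loop runs.
    arm_stall_dump(600)
    try:
        pass  # ... training would run here ...
    finally:
        disarm_stall_dump()
```

If successive dumps show the main thread at different lines, the loop is still making progress; if every dump shows the same blocking call, the process is more likely stuck.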

Output:

opt.select_data: ['/']
opt.batch_ratio: ['1']
--------------------------------------------------------------------------------
dataset_root:    result  dataset: /
sub-directory:  /.       num samples: 5
num total samples of /: 5 x 1.0 (total_data_usage_ratio) = 5
num samples of / per batch: 20 x 1.0 (batch_ratio) = 20
--------------------------------------------------------------------------------
Total_batch_size: 20 = 20
--------------------------------------------------------------------------------
dataset_root:    result  dataset: /
sub-directory:  /.       num samples: 5
--------------------------------------------------------------------------------
model input parameters 50 180 20 1 512 256 38 25 TPS ResNet BiLSTM Attn
Skip Transformation.LocalizationNetwork.localization_fc2.weight as it is already initialized
Skip Transformation.LocalizationNetwork.localization_fc2.bias as it is already initialized
Model:
DataParallel(
...

Trainable params num :  49555182
------------ Options -------------
exp_name: TPS-ResNet-BiLSTM-Attn-Seed1111
train_data: result
valid_data: result
manualSeed: 1111
workers: 4
batch_size: 20
num_iter: 300000
valInterval: 2000
saved_model:
FT: False
adam: False
lr: 1
beta1: 0.9
rho: 0.95
eps: 1e-08
grad_clip: 5
baiduCTC: False
select_data: ['/']
batch_ratio: ['1']
total_data_usage_ratio: 1.0
batch_max_length: 25
imgH: 50
imgW: 180
rgb: False
character: 0123456789abcdefghijklmnopqrstuvwxyz
sensitive: False
PAD: False
data_filtering_off: False
Transformation: TPS
FeatureExtraction: ResNet
SequenceModeling: BiLSTM
Prediction: Attn
num_fiducial: 20
output_channel: 512
hidden_size: 256
num_gpu: 0
num_class: 38
---------------------------------------

[1/300000] Train loss: 3.52311, Valid loss: 3.49874, Elapsed_time: 4.66224
Current_accuracy : 0.000, Current_norm_ED  : 0.00
Best_accuracy    : 0.000, Best_norm_ED     : 0.00
--------------------------------------------------------------------------------
Ground Truth              | Prediction                | Confidence Score & T/F
--------------------------------------------------------------------------------
2                         | aaaaaaaaaaaaaaaaaaaaaaaaa | 0.0000  False
1                         |                           | 0.0000  False
5                         | aaaaaaaaggggggggggggggggg | 0.0000  False
4                         |                           | 0.0000  False
3                         | aaaaaaaaaaaaaaaaaaaaaaaaa | 0.0000  False
--------------------------------------------------------------------------------

*The program hangs here and never exits.*
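The [1/300000] block is the only progress output above. If train.py prints a loss/accuracy block only when it runs validation (the log format suggests this, but it is an assumption here), the next block would not appear until counter 2000, given valInterval: 2000. A minimal sketch of that schedule; the condition in is_log_step is assumed to mirror the training loop's validation check, the helper names are hypothetical, and the seconds-per-iteration figure is a placeholder, not a measurement.

```python
from typing import List


def is_log_step(iteration: int, val_interval: int) -> bool:
    """True if a validation/log block is printed after 0-based `iteration`.

    Assumed to mirror the training loop's check; the printed counter is iteration + 1.
    """
    return iteration == 0 or (iteration + 1) % val_interval == 0


def printed_counters(num_iter: int, val_interval: int, limit: int) -> List[int]:
    """First `limit` values of k in the "[k/num_iter]" lines, under the assumption above."""
    counters: List[int] = []
    for iteration in range(num_iter):
        if is_log_step(iteration, val_interval):
            counters.append(iteration + 1)
            if len(counters) == limit:
                break
    return counters


def seconds_until_next_log(sec_per_iter: float, val_interval: int) -> float:
    """Silent time between the [1/...] block and the [val_interval/...] block."""
    # Iterations 1 .. val_interval - 1 (0-based) must finish before the next block prints.
    return (val_interval - 1) * sec_per_iter


if __name__ == "__main__":
    print(printed_counters(300000, 2000, 3))  # [1, 2000, 4000]
    # 5.0 s/iteration is a hypothetical CPU speed, not a measured value.
    print(seconds_until_next_log(5.0, 2000) / 3600)
```

Under that assumption, roughly 2,000 iterations run without any output after the first block; at a hypothetical 5 s per iteration on the CPU that is about 2.8 hours of silence.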

I'm not using an NVIDIA GPU, so training runs on the CPU (num_gpu: 0 in the options above).
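Because this run is CPU-only with a 5-sample dataset, batch_size: 20 and workers: 4, some rough arithmetic may be worth checking. This is a sketch under two assumptions that are not verified here: the loader behaves like a PyTorch DataLoader with drop_last=False, and the training loop creates a fresh iterator whenever a pass over the data ends, which (without persistent_workers=True) starts a new set of worker processes. The helper names are hypothetical.

```python
import math


def batches_per_pass(num_samples: int, batch_size: int) -> int:
    """len() of a PyTorch-style DataLoader with drop_last=False: ceil(n / batch_size)."""
    return math.ceil(num_samples / batch_size)


def worker_startups(num_iterations: int, num_samples: int, batch_size: int, workers: int) -> int:
    """Worker processes started over `num_iterations` training steps.

    ASSUMPTION: a new iterator, and therefore a new set of `workers` processes,
    is created for every full pass over the dataset.
    """
    passes = math.ceil(num_iterations / batches_per_pass(num_samples, batch_size))
    return passes * workers


if __name__ == "__main__":
    # Values from the log above: 5 samples, batch_size 20, workers 4, valInterval 2000.
    print(batches_per_pass(5, 20))          # 1
    print(worker_startups(2000, 5, 20, 4))  # 8000
```

Under those assumptions each batch would hold all 5 samples rather than 20, and the first 2,000 iterations would start about 8,000 worker processes, which could be slow on a CPU-only machine. Re-running with workers set to 0 would be a quick way to test that hypothesis.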