PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

IndexError: list index out of range #5792

Closed monkeycc closed 2 years ago

monkeycc commented 2 years ago
python tools/train.py -c configs/rec/rec_icdar15_train.yml

[2022/03/26 17:09:54] root INFO: Architecture :
[2022/03/26 17:09:54] root INFO:     Backbone :
[2022/03/26 17:09:54] root INFO:         model_name : large
[2022/03/26 17:09:54] root INFO:         name : MobileNetV3
[2022/03/26 17:09:54] root INFO:         scale : 0.5
[2022/03/26 17:09:54] root INFO:     Head :
[2022/03/26 17:09:54] root INFO:         fc_decay : 0
[2022/03/26 17:09:54] root INFO:         name : CTCHead
[2022/03/26 17:09:54] root INFO:     Neck :
[2022/03/26 17:09:54] root INFO:         encoder_type : rnn
[2022/03/26 17:09:54] root INFO:         hidden_size : 96
[2022/03/26 17:09:54] root INFO:         name : SequenceEncoder
[2022/03/26 17:09:54] root INFO:     Transform : None
[2022/03/26 17:09:54] root INFO:     algorithm : CRNN
[2022/03/26 17:09:54] root INFO:     model_type : rec
[2022/03/26 17:09:54] root INFO: Eval :
[2022/03/26 17:09:54] root INFO:     dataset :
[2022/03/26 17:09:54] root INFO:         data_dir : ./train_data/rec
[2022/03/26 17:09:54] root INFO:         label_file_list : ['./train_data/rec/rec_gt_test.txt']
[2022/03/26 17:09:54] root INFO:         name : SimpleDataSet
[2022/03/26 17:09:54] root INFO:         transforms :
[2022/03/26 17:09:54] root INFO:             DecodeImage :
[2022/03/26 17:09:54] root INFO:                 channel_first : False
[2022/03/26 17:09:54] root INFO:                 img_mode : BGR
[2022/03/26 17:09:54] root INFO:             CTCLabelEncode : None
[2022/03/26 17:09:54] root INFO:             RecResizeImg :
[2022/03/26 17:09:54] root INFO:                 image_shape : [3, 32, 100]
[2022/03/26 17:09:54] root INFO:             KeepKeys :
[2022/03/26 17:09:54] root INFO:                 keep_keys : ['image', 'label', 'length']
[2022/03/26 17:09:54] root INFO:     loader :
[2022/03/26 17:09:54] root INFO:         batch_size_per_card : 256
[2022/03/26 17:09:54] root INFO:         drop_last : False
[2022/03/26 17:09:54] root INFO:         num_workers : 4
[2022/03/26 17:09:54] root INFO:         shuffle : False
[2022/03/26 17:09:54] root INFO:         use_shared_memory : False
[2022/03/26 17:09:54] root INFO: Global :
[2022/03/26 17:09:54] root INFO:     cal_metric_during_train : True
[2022/03/26 17:09:54] root INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2022/03/26 17:09:54] root INFO:     checkpoints : None
[2022/03/26 17:09:54] root INFO:     debug : False
[2022/03/26 17:09:54] root INFO:     distributed : False
[2022/03/26 17:09:54] root INFO:     epoch_num : 72
[2022/03/26 17:09:54] root INFO:     eval_batch_step : [0, 2000]
[2022/03/26 17:09:54] root INFO:     infer_img : doc/imgs_words_en/word_10.png
[2022/03/26 17:09:54] root INFO:     infer_mode : False
[2022/03/26 17:09:54] root INFO:     log_smooth_window : 20
[2022/03/26 17:09:54] root INFO:     max_text_length : 25
[2022/03/26 17:09:54] root INFO:     pretrained_model : None
[2022/03/26 17:09:54] root INFO:     print_batch_step : 10
[2022/03/26 17:09:54] root INFO:     save_epoch_step : 3
[2022/03/26 17:09:54] root INFO:     save_inference_dir : ./
[2022/03/26 17:09:54] root INFO:     save_model_dir : ./output/rec/ic15/
[2022/03/26 17:09:54] root INFO:     save_res_path : ./output/rec/predicts_ic15.txt
[2022/03/26 17:09:54] root INFO:     use_gpu : True
[2022/03/26 17:09:54] root INFO:     use_space_char : False
[2022/03/26 17:09:54] root INFO:     use_visualdl : False
[2022/03/26 17:09:54] root INFO: Loss :
[2022/03/26 17:09:54] root INFO:     name : CTCLoss
[2022/03/26 17:09:54] root INFO: Metric :
[2022/03/26 17:09:54] root INFO:     main_indicator : acc
[2022/03/26 17:09:54] root INFO:     name : RecMetric
[2022/03/26 17:09:54] root INFO: Optimizer :
[2022/03/26 17:09:54] root INFO:     beta1 : 0.9
[2022/03/26 17:09:54] root INFO:     beta2 : 0.999
[2022/03/26 17:09:54] root INFO:     lr :
[2022/03/26 17:09:54] root INFO:         learning_rate : 0.0005
[2022/03/26 17:09:54] root INFO:     name : Adam
[2022/03/26 17:09:54] root INFO:     regularizer :
[2022/03/26 17:09:54] root INFO:         factor : 0
[2022/03/26 17:09:54] root INFO:         name : L2
[2022/03/26 17:09:54] root INFO: PostProcess :
[2022/03/26 17:09:54] root INFO:     name : CTCLabelDecode
[2022/03/26 17:09:54] root INFO: Train :
[2022/03/26 17:09:54] root INFO:     dataset :
[2022/03/26 17:09:54] root INFO:         data_dir : ./train_data/rec/
[2022/03/26 17:09:54] root INFO:         label_file_list : ['./train_data/rec/rec_gt_train.txt']
[2022/03/26 17:09:54] root INFO:         name : SimpleDataSet
[2022/03/26 17:09:54] root INFO:         transforms :
[2022/03/26 17:09:54] root INFO:             DecodeImage :
[2022/03/26 17:09:54] root INFO:                 channel_first : False
[2022/03/26 17:09:54] root INFO:                 img_mode : BGR
[2022/03/26 17:09:54] root INFO:             CTCLabelEncode : None
[2022/03/26 17:09:54] root INFO:             RecResizeImg :
[2022/03/26 17:09:54] root INFO:                 image_shape : [3, 32, 100]
[2022/03/26 17:09:54] root INFO:             KeepKeys :
[2022/03/26 17:09:54] root INFO:                 keep_keys : ['image', 'label', 'length']
[2022/03/26 17:09:54] root INFO:     loader :
[2022/03/26 17:09:54] root INFO:         batch_size_per_card : 256
[2022/03/26 17:09:54] root INFO:         drop_last : True
[2022/03/26 17:09:54] root INFO:         num_workers : 8
[2022/03/26 17:09:54] root INFO:         shuffle : True
[2022/03/26 17:09:54] root INFO:         use_shared_memory : False
[2022/03/26 17:09:54] root INFO: profiler_options : None
[2022/03/26 17:09:54] root INFO: train with paddle 2.2.2 and device CUDAPlace(0)
[2022/03/26 17:09:54] root INFO: Initialize indexs of datasets:['./train_data/rec/rec_gt_train.txt']
[2022/03/26 17:09:54] root INFO: Initialize indexs of datasets:['./train_data/rec/rec_gt_test.txt']
W0326 17:09:54.520195 16696 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.3, Runtime API Version: 11.2
W0326 17:09:54.897189 16696 device_context.cc:465] device: 0, cuDNN Version: 8.2.
[2022/03/26 17:10:00] root INFO: train from scratch
[2022/03/26 17:10:00] root INFO: train dataloader has 3 iters
[2022/03/26 17:10:00] root INFO: valid dataloader has 4 iters
[2022/03/26 17:10:00] root INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
[2022/03/26 17:10:00] root INFO: Initialize indexs of datasets:['./train_data/rec/rec_gt_train.txt']
[2022/03/26 17:10:00] root ERROR: When parsing line train_data/rec/train/830.jpg RG91718970-916-8005
, error happened with msg: Traceback (most recent call last):
  File "E:\PaddleOCR\ppocr\data\simple_dataset.py", line 110, in __getitem__
    label = substr[1]
IndexError: list index out of range

[2022/03/26 17:10:00] root ERROR: When parsing line train_data/rec/train/128.jpg AA49473340-963-80411
, error happened with msg: Traceback (most recent call last):
  File "E:\PaddleOCR\ppocr\data\simple_dataset.py", line 110, in __getitem__
    label = substr[1]
IndexError: list index out of range

./train_data/rec/rec_gt_train.txt:

train_data/rec/train/0.jpg KQ36482939-2711-83921
train_data/rec/train/1.jpg ID34987499-5411-90731
tink2123 commented 2 years ago

It looks like the img_path and the gt in your training labels are not separated by \t; the separator appears to be a single space.
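
A minimal sketch of why a space-separated line would trigger this error, assuming the dataset loader splits each label line on a tab character. The `substr[1]` in the traceback is consistent with that, but the loader code itself is not shown in this thread. `split_label_line` is a hypothetical helper written for illustration, not a PaddleOCR function.

```python
def split_label_line(line, delimiter="\t"):
    """Split one label line into fields on the given delimiter (assumed to be a tab)."""
    return line.rstrip("\n").split(delimiter)


# Same sample line from the label file, once with a space and once with a tab.
space_line = "train_data/rec/train/0.jpg KQ36482939-2711-83921\n"
tab_line = "train_data/rec/train/0.jpg\tKQ36482939-2711-83921\n"

space_fields = split_label_line(space_line)
tab_fields = split_label_line(tab_line)

print(len(space_fields))  # 1: no tab found, so the whole line is a single field
print(len(tab_fields))    # 2: image path and label

# Indexing space_fields[1] would raise IndexError: list index out of range.
label = tab_fields[1]
```

With only one field in the list, reading index 1 fails in the same way as the logged traceback.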

monkeycc commented 2 years ago

Solved. The fields do indeed need to be separated by \t.
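
For anyone hitting the same problem, one possible way to convert an existing space-separated label file is sketched below. It assumes that image paths contain no spaces, so the first space on each line marks the boundary between path and label. The function names and the output path in the usage note are hypothetical, not part of PaddleOCR.

```python
def to_tab_separated(line):
    """Return the line with its first space replaced by a tab.

    Lines that already contain a tab, or that contain no space at all,
    are returned unchanged (minus the trailing newline).
    """
    stripped = line.rstrip("\r\n")
    if not stripped or "\t" in stripped:
        return stripped
    # Assumption: the image path has no spaces, so the first space is the separator.
    path, sep, label = stripped.partition(" ")
    if not sep:
        return stripped
    return f"{path}\t{label}"


def convert_label_file(src, dst):
    """Write a tab-separated copy of the label file at src to dst, skipping blank lines."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fixed = to_tab_separated(line)
            if fixed:
                fout.write(fixed + "\n")
```

It could be called as `convert_label_file("./train_data/rec/rec_gt_train.txt", "./train_data/rec/rec_gt_train_tab.txt")`, after which `label_file_list` in the config would need to point at the new file. Splitting on the first space only keeps any spaces inside the label text intact.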