PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
40.31k stars 7.46k forks

IndexError: list index out of range #8879

Closed ftmasadi closed 1 year ago

ftmasadi commented 1 year ago

Hello. When I fine-tune the model on Persian using my own dataset, training starts, but I get this error on every run. What is wrong with my setup? Thank you for your guidance.

ppocr ERROR: When parsing line Evaluation Only. Created with Aspose.Cells for Python via Java.Copyright 2003 - 2023 Aspose Pty Ltd., error happened with msg: Traceback (most recent call last):
  File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 128, in __getitem__
    label = substr[1]
IndexError: list index out of range
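The traceback points at the step that splits each label-file line into a file name and a label. The following is a minimal sketch of that step, assuming the line is split on a tab character, as the `substr[1]` access suggests. `parse_label_line` is a hypothetical helper written for illustration, not PaddleOCR's actual function.

```python
def parse_label_line(line, delimiter="\t"):
    """Split one label-file line into (file_name, label).

    Hypothetical simplification of the parsing step seen in the traceback.
    """
    substr = line.strip("\n").split(delimiter)
    file_name = substr[0]
    label = substr[1]  # raises IndexError when the line has no delimiter
    return file_name, label


# A well-formed line: image path, a tab, then the transcription.
print(parse_label_line("10056.tif\tsome text\n"))

# The watermark line from the error message (shortened here) has no tab.
watermark = "Evaluation Only. Created with Aspose.Cells for Python via Java."
try:
    parse_label_line(watermark)
except IndexError as exc:
    print("IndexError:", exc)
```

Under this assumption, any line in the label file that lacks a tab, such as the watermark text quoted in the error message, would trigger the same exception.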

ftmasadi commented 1 year ago

`Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: /content/drive/MyDrive/output/v3_arabic_mobile
  save_epoch_step: 5
  eval_batch_step: [0, 100]
  cal_metric_during_train: true
  pretrained_model: /content/PaddleOCR/pretrained_model/arabic_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: ./doc/imgs_words/arabic/persain.tif
  character_dict_path: ppocr/utils/dict/arabic_dict.txt
  max_text_length: &max_text_length 150
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: /content/drive/MyDrive/output/rec/predicts_ppocrv3_arabic.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:

Loss:
  name: MultiLoss
  loss_config_list:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /content/PaddleOCR/train_data/en_train_filtered
    ext_op_transform_idx: 1
    label_file_list:

ftmasadi commented 1 year ago

I use this Paddle version in Colab:

!python -m pip install paddlepaddle-gpu==2.4.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

I use this Arabic model for fine-tuning:

!wget https://paddleocr.bj.bcebos.com/PP-OCRv3/multilingual/arabic_PP-OCRv3_rec_train.tar

This label is one example from my dataset:

به‌منظور افزایش ظرفیتشان برای استفاده از اندازه‌گیری می‌کنند. همچنین میزان بهره‌برداری آن‌ها از در فعالیت‌های

ftmasadi commented 1 year ago

`[2023/01/19 11:22:41] ppocr INFO: Architecture : [2023/01/19 11:22:41] ppocr INFO: Backbone : [2023/01/19 11:22:41] ppocr INFO: last_conv_stride : [1, 2] [2023/01/19 11:22:41] ppocr INFO: last_pool_type : avg [2023/01/19 11:22:41] ppocr INFO: name : MobileNetV1Enhance [2023/01/19 11:22:41] ppocr INFO: scale : 0.5 [2023/01/19 11:22:41] ppocr INFO: Head : [2023/01/19 11:22:41] ppocr INFO: head_list : [2023/01/19 11:22:41] ppocr INFO: CTCHead : [2023/01/19 11:22:41] ppocr INFO: Head : [2023/01/19 11:22:41] ppocr INFO: fc_decay : 1e-05 [2023/01/19 11:22:41] ppocr INFO: Neck : [2023/01/19 11:22:41] ppocr INFO: depth : 2 [2023/01/19 11:22:41] ppocr INFO: dims : 64 [2023/01/19 11:22:41] ppocr INFO: hidden_dims : 120 [2023/01/19 11:22:41] ppocr INFO: name : svtr [2023/01/19 11:22:41] ppocr INFO: use_guide : True [2023/01/19 11:22:41] ppocr INFO: SARHead : [2023/01/19 11:22:41] ppocr INFO: enc_dim : 512 [2023/01/19 11:22:41] ppocr INFO: max_text_length : 150 [2023/01/19 11:22:41] ppocr INFO: name : MultiHead [2023/01/19 11:22:41] ppocr INFO: Transform : None [2023/01/19 11:22:41] ppocr INFO: algorithm : SVTR [2023/01/19 11:22:41] ppocr INFO: model_type : rec [2023/01/19 11:22:41] ppocr INFO: Eval : [2023/01/19 11:22:41] ppocr INFO: dataset : [2023/01/19 11:22:41] ppocr INFO: data_dir : /content/PaddleOCR/train_data/en_val [2023/01/19 11:22:41] ppocr INFO: label_file_list : ['/content/PaddleOCR/train_data/val.txt'] [2023/01/19 11:22:41] ppocr INFO: name : SimpleDataSet [2023/01/19 11:22:41] ppocr INFO: transforms : [2023/01/19 11:22:41] ppocr INFO: DecodeImage : [2023/01/19 11:22:41] ppocr INFO: channel_first : False [2023/01/19 11:22:41] ppocr INFO: img_mode : BGR [2023/01/19 11:22:41] ppocr INFO: MultiLabelEncode : None [2023/01/19 11:22:41] ppocr INFO: RecResizeImg : [2023/01/19 11:22:41] ppocr INFO: image_shape : [3, 32, 640] [2023/01/19 11:22:41] ppocr INFO: KeepKeys : [2023/01/19 11:22:41] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 
'valid_ratio'] [2023/01/19 11:22:41] ppocr INFO: loader : [2023/01/19 11:22:41] ppocr INFO: batch_size_per_card : 32 [2023/01/19 11:22:41] ppocr INFO: drop_last : False [2023/01/19 11:22:41] ppocr INFO: num_workers : 8 [2023/01/19 11:22:41] ppocr INFO: shuffle : False [2023/01/19 11:22:41] ppocr INFO: Global : [2023/01/19 11:22:41] ppocr INFO: cal_metric_during_train : True [2023/01/19 11:22:41] ppocr INFO: character_dict_path : ppocr/utils/dict/arabic_dict.txt [2023/01/19 11:22:41] ppocr INFO: checkpoints : None [2023/01/19 11:22:41] ppocr INFO: debug : False [2023/01/19 11:22:41] ppocr INFO: distributed : False [2023/01/19 11:22:41] ppocr INFO: epoch_num : 500 [2023/01/19 11:22:41] ppocr INFO: eval_batch_step : [0, 100] [2023/01/19 11:22:41] ppocr INFO: infer_img : ./doc/imgs_words/arabic/persain.tif [2023/01/19 11:22:41] ppocr INFO: infer_mode : False [2023/01/19 11:22:41] ppocr INFO: log_smooth_window : 20 [2023/01/19 11:22:41] ppocr INFO: max_text_length : 150 [2023/01/19 11:22:41] ppocr INFO: pretrained_model : /content/PaddleOCR/pretrained_model/arabic_PP-OCRv3_rec_train/best_accuracy [2023/01/19 11:22:41] ppocr INFO: print_batch_step : 10 [2023/01/19 11:22:41] ppocr INFO: save_epoch_step : 5 [2023/01/19 11:22:41] ppocr INFO: save_inference_dir : None [2023/01/19 11:22:41] ppocr INFO: save_model_dir : /content/drive/MyDrive/output/v3_arabic_mobile [2023/01/19 11:22:41] ppocr INFO: save_res_path : /content/drive/MyDrive/output/rec/predicts_ppocrv3_arabic.txt [2023/01/19 11:22:41] ppocr INFO: use_gpu : True [2023/01/19 11:22:41] ppocr INFO: use_space_char : True [2023/01/19 11:22:41] ppocr INFO: use_visualdl : False [2023/01/19 11:22:41] ppocr INFO: Loss : [2023/01/19 11:22:41] ppocr INFO: loss_config_list : [2023/01/19 11:22:41] ppocr INFO: CTCLoss : None [2023/01/19 11:22:41] ppocr INFO: SARLoss : None [2023/01/19 11:22:41] ppocr INFO: name : MultiLoss [2023/01/19 11:22:41] ppocr INFO: Metric : [2023/01/19 11:22:41] ppocr INFO: ignore_space : False 
[2023/01/19 11:22:41] ppocr INFO: main_indicator : acc [2023/01/19 11:22:41] ppocr INFO: name : RecMetric [2023/01/19 11:22:41] ppocr INFO: Optimizer : [2023/01/19 11:22:41] ppocr INFO: beta1 : 0.9 [2023/01/19 11:22:41] ppocr INFO: beta2 : 0.999 [2023/01/19 11:22:41] ppocr INFO: lr : [2023/01/19 11:22:41] ppocr INFO: learning_rate : 0.001 [2023/01/19 11:22:41] ppocr INFO: name : Cosine [2023/01/19 11:22:41] ppocr INFO: warmup_epoch : 5 [2023/01/19 11:22:41] ppocr INFO: name : Adam [2023/01/19 11:22:41] ppocr INFO: regularizer : [2023/01/19 11:22:41] ppocr INFO: factor : 3e-05 [2023/01/19 11:22:41] ppocr INFO: name : L2 [2023/01/19 11:22:41] ppocr INFO: PostProcess : [2023/01/19 11:22:41] ppocr INFO: name : CTCLabelDecode [2023/01/19 11:22:41] ppocr INFO: Train : [2023/01/19 11:22:41] ppocr INFO: dataset : [2023/01/19 11:22:41] ppocr INFO: data_dir : /content/PaddleOCR/train_data/en_train_filtered [2023/01/19 11:22:41] ppocr INFO: ext_op_transform_idx : 1 [2023/01/19 11:22:41] ppocr INFO: label_file_list : ['/content/PaddleOCR/train_data/train.txt'] [2023/01/19 11:22:41] ppocr INFO: name : SimpleDataSet [2023/01/19 11:22:41] ppocr INFO: transforms : [2023/01/19 11:22:41] ppocr INFO: DecodeImage : [2023/01/19 11:22:41] ppocr INFO: channel_first : False [2023/01/19 11:22:41] ppocr INFO: img_mode : BGR [2023/01/19 11:22:41] ppocr INFO: RecConAug : [2023/01/19 11:22:41] ppocr INFO: ext_data_num : 2 [2023/01/19 11:22:41] ppocr INFO: image_shape : [32, 640, 3] [2023/01/19 11:22:41] ppocr INFO: prob : 0.5 [2023/01/19 11:22:41] ppocr INFO: RecAug : None [2023/01/19 11:22:41] ppocr INFO: MultiLabelEncode : None [2023/01/19 11:22:41] ppocr INFO: RecResizeImg : [2023/01/19 11:22:41] ppocr INFO: image_shape : [3, 32, 640] [2023/01/19 11:22:41] ppocr INFO: KeepKeys : [2023/01/19 11:22:41] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio'] [2023/01/19 11:22:41] ppocr INFO: loader : [2023/01/19 11:22:41] ppocr INFO: batch_size_per_card : 32 
[2023/01/19 11:22:41] ppocr INFO: drop_last : True [2023/01/19 11:22:41] ppocr INFO: num_workers : 8 [2023/01/19 11:22:41] ppocr INFO: shuffle : True [2023/01/19 11:22:41] ppocr INFO: profiler_options : None [2023/01/19 11:22:41] ppocr INFO: train with paddle 2.4.1 and device Place(gpu:0) [2023/01/19 11:22:41] ppocr INFO: Initialize indexs of datasets:['/content/PaddleOCR/train_data/train.txt'] [2023/01/19 11:22:41] ppocr INFO: Initialize indexs of datasets:['/content/PaddleOCR/train_data/val.txt'] W0119 11:22:41.624083 3129 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.2, Runtime API Version: 11.2 W0119 11:22:41.758491 3129 gpu_resources.cc:91] device: 0, cuDNN Version: 8.1. [2023/01/19 11:22:43] ppocr INFO: train dataloader has 659 iters [2023/01/19 11:22:43] ppocr INFO: valid dataloader has 142 iters [2023/01/19 11:22:43] ppocr INFO: load pretrain successful from /content/PaddleOCR/pretrained_model/arabic_PP-OCRv3_rec_train/best_accuracy [2023/01/19 11:22:43] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 100 iterations W0119 11:22:49.537029 3129 gpu_resources.cc:217] WARNING: device: 0. The installed Paddle is compiled with CUDNN 8.2, but CUDNN version in your machine is 8.1, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version. 
[2023/01/19 11:23:07] ppocr INFO: epoch: [1/500], global_step: 10, lr: 0.000001, acc: 0.000000, norm_edit_dis: 0.112137, CTCLoss: 781.074524, SARLoss: 5.399431, loss: 786.574036, avg_reader_cost: 0.53406 s, avg_batch_cost: 2.35679 s, avg_samples: 32.0, ips: 13.57778 samples/s, eta: 8 days, 23:42:19 [2023/01/19 11:23:13] ppocr INFO: epoch: [1/500], global_step: 20, lr: 0.000003, acc: 0.000000, norm_edit_dis: 0.112137, CTCLoss: 769.405762, SARLoss: 5.288942, loss: 774.606628, avg_reader_cost: 0.00943 s, avg_batch_cost: 0.67481 s, avg_samples: 32.0, ips: 47.42061 samples/s, eta: 5 days, 18:43:46 [2023/01/19 11:23:20] ppocr INFO: epoch: [1/500], global_step: 30, lr: 0.000006, acc: 0.000000, norm_edit_dis: 0.109294, CTCLoss: 770.254456, SARLoss: 5.029316, loss: 775.096802, avg_reader_cost: 0.00576 s, avg_batch_cost: 0.66767 s, avg_samples: 32.0, ips: 47.92755 samples/s, eta: 4 days, 16:51:07 [2023/01/19 11:23:27] ppocr INFO: epoch: [1/500], global_step: 40, lr: 0.000009, acc: 0.000000, norm_edit_dis: 0.107213, CTCLoss: 758.107544, SARLoss: 4.726814, loss: 762.770569, avg_reader_cost: 0.00717 s, avg_batch_cost: 0.64601 s, avg_samples: 32.0, ips: 49.53471 samples/s, eta: 4 days, 3:24:59 [2023/01/19 11:23:33] ppocr INFO: epoch: [1/500], global_step: 50, lr: 0.000012, acc: 0.000000, norm_edit_dis: 0.106015, CTCLoss: 741.015198, SARLoss: 4.438568, loss: 745.304749, avg_reader_cost: 0.00619 s, avg_batch_cost: 0.65167 s, avg_samples: 32.0, ips: 49.10484 samples/s, eta: 3 days, 19:27:29 [2023/01/19 11:23:40] ppocr INFO: epoch: [1/500], global_step: 60, lr: 0.000015, acc: 0.000000, norm_edit_dis: 0.105461, CTCLoss: 727.867371, SARLoss: 4.168729, loss: 731.865784, avg_reader_cost: 0.00796 s, avg_batch_cost: 0.69188 s, avg_samples: 32.0, ips: 46.25092 samples/s, eta: 3 days, 14:45:55 [2023/01/19 11:23:47] ppocr INFO: epoch: [1/500], global_step: 70, lr: 0.000018, acc: 0.000000, norm_edit_dis: 0.099947, CTCLoss: 723.598083, SARLoss: 3.978780, loss: 727.624695, avg_reader_cost: 
0.00635 s, avg_batch_cost: 0.69077 s, avg_samples: 32.0, ips: 46.32496 samples/s, eta: 3 days, 11:23:53 [2023/01/19 11:23:54] ppocr INFO: epoch: [1/500], global_step: 80, lr: 0.000021, acc: 0.000000, norm_edit_dis: 0.093370, CTCLoss: 750.481934, SARLoss: 3.829280, loss: 754.368591, avg_reader_cost: 0.00577 s, avg_batch_cost: 0.71923 s, avg_samples: 32.0, ips: 44.49192 samples/s, eta: 3 days, 9:11:52 [2023/01/19 11:24:02] ppocr INFO: epoch: [1/500], global_step: 90, lr: 0.000024, acc: 0.000000, norm_edit_dis: 0.086637, CTCLoss: 724.074768, SARLoss: 3.682808, loss: 727.771973, avg_reader_cost: 0.00675 s, avg_batch_cost: 0.73920 s, avg_samples: 32.0, ips: 43.28978 samples/s, eta: 3 days, 7:41:21 [2023/01/19 11:24:08] ppocr INFO: epoch: [1/500], global_step: 100, lr: 0.000027, acc: 0.000000, norm_edit_dis: 0.080045, CTCLoss: 708.865173, SARLoss: 3.474693, loss: 712.314697, avg_reader_cost: 0.00632 s, avg_batch_cost: 0.67492 s, avg_samples: 32.0, ips: 47.41319 samples/s, eta: 3 days, 5:53:36 eval model:: 87% 124/142 [00:09<00:01, 16.15it/s][2023/01/19 11:24:18] ppocr ERROR: When parsing line Evaluation Only. Created with Aspose.Cells for Python via Java.Copyright 2003 - 2023 Aspose Pty Ltd., error happened with msg: Traceback (most recent call last): File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 128, in getitem label = substr[1] IndexError: list index out of range

eval model:: 100% 142/142 [00:10<00:00, 13.24it/s] [2023/01/19 11:24:19] ppocr INFO: cur metric, acc: 0.0, norm_edit_dis: 0.07519675472355747, fps: 563.9652171182029 [2023/01/19 11:24:20] ppocr INFO: save best model is to /content/drive/MyDrive/output/v3_arabic_mobile/best_accuracy [2023/01/19 11:24:20] ppocr INFO: best metric, acc: 0.0, is_float16: False, norm_edit_dis: 0.07519675472355747, fps: 563.9652171182029, best_epoch: 1 [2023/01/19 11:24:26] ppocr INFO: epoch: [1/500], global_step: 110, lr: 0.000030, acc: 0.000000, norm_edit_dis: 0.077594, CTCLoss: 704.225830, SARLoss: 3.250280, loss: 707.491455, avg_reader_cost: 0.00545 s, avg_batch_cost: 0.57627 s, avg_samples: 32.0, ips: 55.52923 samples/s, eta: 3 days, 3:36:13 [2023/01/19 11:24:35] ppocr INFO: epoch: [1/500], global_step: 120, lr: 0.000033, acc: 0.000000, norm_edit_dis: 0.074352, CTCLoss: 685.737671, SARLoss: 3.156804, loss: 688.937744, avg_reader_cost: 0.00790 s, avg_batch_cost: 0.92085 s, avg_samples: 32.0, ips: 34.75065 samples/s, eta: 3 days, 4:19:20 [2023/01/19 11:24:43] ppocr INFO: epoch: [1/500], global_step: 130, lr: 0.000036, acc: 0.000000, norm_edit_dis: 0.061596, CTCLoss: 642.892822, SARLoss: 3.102704, loss: 645.995544, avg_reader_cost: 0.00540 s, avg_batch_cost: 0.75985 s, avg_samples: 32.0, ips: 42.11363 samples/s, eta: 3 days, 3:47:48 [2023/01/19 11:24:49] ppocr INFO: epoch: [1/500], global_step: 140, lr: 0.000039, acc: 0.000000, norm_edit_dis: 0.053753, CTCLoss: 590.261963, SARLoss: 3.096110, loss: 593.361084, avg_reader_cost: 0.00963 s, avg_batch_cost: 0.66898 s, avg_samples: 32.0, ips: 47.83369 samples/s, eta: 3 days, 2:45:08 [2023/01/19 11:24:56] ppocr INFO: epoch: [1/500], global_step: 150, lr: 0.000042, acc: 0.000000, norm_edit_dis: 0.045232, CTCLoss: 599.001160, SARLoss: 3.058224, loss: 602.068542, avg_reader_cost: 0.00648 s, avg_batch_cost: 0.70644 s, avg_samples: 32.0, ips: 45.29765 samples/s, eta: 3 days, 2:04:31 [2023/01/19 11:25:03] ppocr INFO: epoch: [1/500], global_step: 
160, lr: 0.000045, acc: 0.000000, norm_edit_dis: 0.039496, CTCLoss: 595.768860, SARLoss: 3.053205, loss: 598.826233, avg_reader_cost: 0.00472 s, avg_batch_cost: 0.62627 s, avg_samples: 32.0, ips: 51.09647 samples/s, eta: 3 days, 1:01:27 [2023/01/19 11:25:09] ppocr INFO: epoch: [1/500], global_step: 170, lr: 0.000048, acc: 0.000000, norm_edit_dis: 0.032524, CTCLoss: 582.577820, SARLoss: 3.032071, loss: 585.635742, avg_reader_cost: 0.00904 s, avg_batch_cost: 0.63157 s, avg_samples: 32.0, ips: 50.66725 samples/s, eta: 3 days, 0:07:31 [2023/01/19 11:25:16] ppocr INFO: epoch: [1/500], global_step: 180, lr: 0.000051, acc: 0.000000, norm_edit_dis: 0.027643, CTCLoss: 564.732056, SARLoss: 3.021118, loss: 567.747070, avg_reader_cost: 0.01129 s, avg_batch_cost: 0.67902 s, avg_samples: 32.0, ips: 47.12658 samples/s, eta: 2 days, 23:34:01 [2023/01/19 11:25:22] ppocr INFO: epoch: [1/500], global_step: 190, lr: 0.000054, acc: 0.000000, norm_edit_dis: 0.026539, CTCLoss: 513.338806, SARLoss: 3.013977, loss: 516.352661, avg_reader_cost: 0.00736 s, avg_batch_cost: 0.63804 s, avg_samples: 32.0, ips: 50.15361 samples/s, eta: 2 days, 22:52:12 [2023/01/19 11:25:25] ppocr ERROR: When parsing line 10056.tif منظر حرفه‌ای یک خبرگزاری است. سخنگوی دولت در ادامه با بیان اینکه من وقتی اخبار , error happened with msg: Traceback (most recent call last): File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 136, in getitem data['ext_data'] = self.get_ext_data() File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 102, in get_ext_data label = substr[1] IndexError: list index out of range

[2023/01/19 11:25:29] ppocr INFO: epoch: [1/500], global_step: 200, lr: 0.000058, acc: 0.000000, norm_edit_dis: 0.023819, CTCLoss: 510.242798, SARLoss: 3.011428, loss: 513.249817, avg_reader_cost: 0.00722 s, avg_batch_cost: 0.67172 s, avg_samples: 32.0, ips: 47.63895 samples/s, eta: 2 days, 22:23:48 eval model:: 87% 124/142 [00:09<00:01, 16.40it/s][2023/01/19 11:25:39] ppocr ERROR: When parsing line Evaluation Only. Created with Aspose.Cells for Python via Java.Copyright 2003 - 2023 Aspose Pty Ltd., error happened with msg: Traceback (most recent call last): File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 128, in getitem label = substr[1] IndexError: list index out of range

eval model:: 100% 142/142 [00:11<00:00, 12.69it/s] [2023/01/19 11:25:40] ppocr INFO: cur metric, acc: 0.0, norm_edit_dis: 0.030784852175908206, fps: 569.8444893235869 [2023/01/19 11:25:41] ppocr INFO: save best model is to /content/drive/MyDrive/output/v3_arabic_mobile/best_accuracy [2023/01/19 11:25:41] ppocr INFO: best metric, acc: 0.0, is_float16: False, norm_edit_dis: 0.030784852175908206, fps: 569.8444893235869, best_epoch: 1 [2023/01/19 11:25:46] ppocr INFO: epoch: [1/500], global_step: 210, lr: 0.000061, acc: 0.000000, norm_edit_dis: 0.021640, CTCLoss: 513.406860, SARLoss: 3.008324, loss: 516.390259, avg_reader_cost: 0.00602 s, avg_batch_cost: 0.54909 s, avg_samples: 32.0, ips: 58.27822 samples/s, eta: 2 days, 21:26:03 [2023/01/19 11:25:53] ppocr INFO: epoch: [1/500], global_step: 220, lr: 0.000064, acc: 0.000000, norm_edit_dis: 0.020042, CTCLoss: 490.791016, SARLoss: 3.002353, loss: 493.787109, avg_reader_cost: 0.00591 s, avg_batch_cost: 0.70203 s, avg_samples: 32.0, ips: 45.58196 samples/s, eta: 2 days, 21:11:41 [2023/01/19 11:26:00] ppocr INFO: epoch: [1/500], global_step: 230, lr: 0.000067, acc: 0.000000, norm_edit_dis: 0.019467, CTCLoss: 457.904602, SARLoss: 2.987553, loss: 460.890442, avg_reader_cost: 0.00747 s, avg_batch_cost: 0.67949 s, avg_samples: 32.0, ips: 47.09380 samples/s, eta: 2 days, 20:53:11 [2023/01/19 11:26:07] ppocr INFO: epoch: [1/500], global_step: 240, lr: 0.000070, acc: 0.000000, norm_edit_dis: 0.020930, CTCLoss: 454.403015, SARLoss: 2.982190, loss: 457.384369, avg_reader_cost: 0.00810 s, avg_batch_cost: 0.67008 s, avg_samples: 32.0, ips: 47.75564 samples/s, eta: 2 days, 20:34:04 [2023/01/19 11:26:11] ppocr ERROR: When parsing line Evaluation Only. Created with Aspose.Cells for Python via Java.Copyright 2003 - 2023 Aspose Pty Ltd., error happened with msg: Traceback (most recent call last): File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 128, in getitem label = substr[1] IndexError: list index out of range

[2023/01/19 11:26:13] ppocr INFO: epoch: [1/500], global_step: 250, lr: 0.000073, acc: 0.000000, norm_edit_dis: 0.020508, CTCLoss: 424.645050, SARLoss: 2.969497, loss: 427.637390, avg_reader_cost: 0.00495 s, avg_batch_cost: 0.64720 s, avg_samples: 32.0, ips: 49.44389 samples/s, eta: 2 days, 20:11:27 [2023/01/19 11:26:20] ppocr INFO: epoch: [1/500], global_step: 260, lr: 0.000076, acc: 0.000000, norm_edit_dis: 0.017172, CTCLoss: 413.317383, SARLoss: 2.961418, loss: 416.277161, avg_reader_cost: 0.00783 s, avg_batch_cost: 0.69570 s, avg_samples: 32.0, ips: 45.99697 samples/s, eta: 2 days, 20:00:47 [2023/01/19 11:26:27] ppocr INFO: epoch: [1/500], global_step: 270, lr: 0.000079, acc: 0.000000, norm_edit_dis: 0.014296, CTCLoss: 414.957153, SARLoss: 2.959824, loss: 417.909973, avg_reader_cost: 0.00846 s, avg_batch_cost: 0.63115 s, avg_samples: 32.0, ips: 50.70145 samples/s, eta: 2 days, 19:37:48 [2023/01/19 11:26:33] ppocr INFO: epoch: [1/500], global_step: 280, lr: 0.000082, acc: 0.000000, norm_edit_dis: 0.011666, CTCLoss: 412.579529, SARLoss: 2.956809, loss: 415.537964, avg_reader_cost: 0.00678 s, avg_batch_cost: 0.65424 s, avg_samples: 32.0, ips: 48.91135 samples/s, eta: 2 days, 19:20:58 [2023/01/19 11:26:40] ppocr INFO: epoch: [1/500], global_step: 290, lr: 0.000085, acc: 0.000000, norm_edit_dis: 0.010557, CTCLoss: 399.227844, SARLoss: 2.958164, loss: 402.171265, avg_reader_cost: 0.00496 s, avg_batch_cost: 0.64382 s, avg_samples: 32.0, ips: 49.70349 samples/s, eta: 2 days, 19:03:19 [2023/01/19 11:26:47] ppocr INFO: epoch: [1/500], global_step: 300, lr: 0.000088, acc: 0.000000, norm_edit_dis: 0.009290, CTCLoss: 402.461365, SARLoss: 2.948498, loss: 405.394653, avg_reader_cost: 0.00634 s, avg_batch_cost: 0.70802 s, avg_samples: 32.0, ips: 45.19637 samples/s, eta: 2 days, 18:58:34 eval model:: 87% 124/142 [00:09<00:01, 15.77it/s][2023/01/19 11:26:56] ppocr ERROR: When parsing line Evaluation Only. 
Created with Aspose.Cells for Python via Java.Copyright 2003 - 2023 Aspose Pty Ltd., error happened with msg: Traceback (most recent call last): File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 128, in __getitem__ label = substr[1] IndexError: list index out of range

eval model:: 100% 142/142 [00:10<00:00, 13.29it/s] [2023/01/19 11:26:57] ppocr INFO: cur metric, acc: 0.0, norm_edit_dis: 0.007356055752568702, fps: 570.3502823200225 [2023/01/19 11:26:58] ppocr INFO: save best model is to /content/drive/MyDrive/output/v3_arabic_mobile/best_accuracy [2023/01/19 11:26:58] ppocr INFO: best metric, acc: 0.0, is_float16: False, norm_edit_dis: 0.007356055752568702, fps: 570.3502823200225, best_epoch: 1 [2023/01/19 11:27:04] ppocr INFO: epoch: [1/500], global_step: 310, lr: 0.000091, acc: 0.000000, norm_edit_dis: 0.008120, CTCLoss: 402.461365, SARLoss: 2.939734, loss: 405.394653, avg_reader_cost: 0.00589 s, avg_batch_cost: 0.56909 s, avg_samples: 32.0, ips: 56.23035 samples/s, eta: 2 days, 18:29:32 [2023/01/19 11:27:11] ppocr INFO: epoch: [1/500], global_step: 320, lr: 0.000094, acc: 0.000000, norm_edit_dis: 0.005634, CTCLoss: 390.016479, SARLoss: 2.937482, loss: 392.949890, avg_reader_cost: 0.00591 s, avg_batch_cost: 0.68375 s, avg_samples: 32.0, ips: 46.80106 samples/s, eta: 2 days, 18:21:59 [2023/01/19 11:27:18] ppocr INFO: epoch: [1/500], global_step: 330, lr: 0.000097, acc: 0.000000, norm_edit_dis: 0.004956, CTCLoss: 355.427521, SARLoss: 2.937126, loss: 358.325073, avg_reader_cost: 0.00730 s, avg_batch_cost: 0.70580 s, avg_samples: 32.0, ips: 45.33872 samples/s, eta: 2 days, 18:18:32 [2023/01/19 11:27:24] ppocr INFO: epoch: [1/500], global_step: 340, lr: 0.000100, acc: 0.000000, norm_edit_dis: 0.004348, CTCLoss: 346.128723, SARLoss: 2.938118, loss: 349.079468, avg_reader_cost: 0.00510 s, avg_batch_cost: 0.63719 s, avg_samples: 32.0, ips: 50.22039 samples/s, eta: 2 days, 18:04:13 [2023/01/19 11:27:31] ppocr INFO: epoch: [1/500], global_step: 350, lr: 0.000103, acc: 0.000000, norm_edit_dis: 0.004242, CTCLoss: 352.615723, SARLoss: 2.921678, loss: 355.554138, avg_reader_cost: 0.00757 s, avg_batch_cost: 0.69537 s, avg_samples: 32.0, ips: 46.01837 samples/s, eta: 2 days, 17:59:49 [2023/01/19 11:27:38] ppocr INFO: epoch: [1/500], 
global_step: 360, lr: 0.000106, acc: 0.000000, norm_edit_dis: 0.005669, CTCLoss: 336.840576, SARLoss: 2.908116, loss: 339.725220, avg_reader_cost: 0.00399 s, avg_batch_cost: 0.68264 s, avg_samples: 32.0, ips: 46.87653 samples/s, eta: 2 days, 17:53:44 [2023/01/19 11:27:45] ppocr INFO: epoch: [1/500], global_step: 370, lr: 0.000109, acc: 0.000000, norm_edit_dis: 0.005668, CTCLoss: 336.840576, SARLoss: 2.897107, loss: 339.725220, avg_reader_cost: 0.01002 s, avg_batch_cost: 0.72184 s, avg_samples: 32.0, ips: 44.33095 samples/s, eta: 2 days, 17:53:46 [2023/01/19 11:27:55] ppocr INFO: epoch: [1/500], global_step: 380, lr: 0.000112, acc: 0.000000, norm_edit_dis: 0.005518, CTCLoss: 349.543488, SARLoss: 2.884204, loss: 352.405090, avg_reader_cost: 0.00880 s, avg_batch_cost: 1.02064 s, avg_samples: 32.0, ips: 31.35288 samples/s, eta: 2 days, 18:36:56 [2023/01/19 11:28:02] ppocr INFO: epoch: [1/500], global_step: 390, lr: 0.000115, acc: 0.000000, norm_edit_dis: 0.005934, CTCLoss: 342.488892, SARLoss: 2.879702, loss: 345.354828, avg_reader_cost: 0.00511 s, avg_batch_cost: 0.67917 s, avg_samples: 32.0, ips: 47.11642 samples/s, eta: 2 days, 18:29:51 [2023/01/19 11:28:09] ppocr INFO: epoch: [1/500], global_step: 400, lr: 0.000118, acc: 0.000000, norm_edit_dis: 0.006060, CTCLoss: 326.790253, SARLoss: 2.865736, loss: 329.666290, avg_reader_cost: 0.00758 s, avg_batch_cost: 0.64419 s, avg_samples: 32.0, ips: 49.67516 samples/s, eta: 2 days, 18:18:20 eval model:: 87% 124/142 [00:09<00:01, 16.14it/s][2023/01/19 11:28:18] ppocr ERROR: When parsing line Evaluation Only. Created with Aspose.Cells for Python via Java.Copyright 2003 - 2023 Aspose Pty Ltd., error happened with msg: Traceback (most recent call last): File "/content/PaddleOCR/ppocr/data/simple_dataset.py", line 128, in getitem label = substr[1] IndexError: list index out of range

`

ftmasadi commented 1 year ago

@WenmuZhou

TasneemVKhan commented 1 year ago

I'm getting the same error, please help

ghost commented 1 year ago

This works for me

The image file name and its annotation must be separated by a \t (tab) character, and lines must be separated by a \n (newline) character.

Example:

training/image_00000053.jpg \t [{"transcription":0,"points":[[221,219],[412,219],[412,289],[221,289]]}] \n
training/image_00000053.jpg \t [{"transcription":0,"points":[[221,219],[412,219],[412,289],[221,289]]}] \n
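One way to check a label file against the format described above is to scan it for lines that do not contain a tab. This is a minimal sketch, not part of PaddleOCR; `find_bad_lines` is a hypothetical helper name, and the tab delimiter is taken from the comment above.

```python
def find_bad_lines(label_path, delimiter="\t"):
    """Return (line_number, text) for every line that lacks the delimiter.

    Such lines split into fewer than two fields, so a later access to the
    second field (the label) would raise IndexError.
    """
    bad = []
    with open(label_path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            text = line.rstrip("\n")
            if len(text.split(delimiter)) < 2:
                bad.append((number, text))
    return bad
```

Calling it on the train and validation label files and printing the result would list blank lines, watermark lines, and lines where the file name and label are separated by spaces instead of a tab.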

mansi2606 commented 1 year ago

There were empty lines in my train.txt, which caused this error. Deleting the empty lines from your txt files fixes it.
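Removing the blank lines can also be scripted. The sketch below is one possible approach and writes a cleaned copy rather than editing the file in place; `drop_blank_lines` is a hypothetical helper name, not a PaddleOCR utility.

```python
def drop_blank_lines(src_path, dst_path):
    """Copy src_path to dst_path, skipping empty or whitespace-only lines.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                # Make sure every kept line ends with a newline.
                dst.write(line if line.endswith("\n") else line + "\n")
            else:
                dropped += 1
    return dropped
```

Writing to a separate file keeps the original label file available for comparison if the cleaned copy turns out to be missing something.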

xiaozhou0311 commented 7 months ago

[2023/12/05 15:58:17] ppocr INFO: train with paddle 2.5.2 and device Place(cpu) [2023/12/05 15:58:17] ppocr INFO: Initialize indexs of datasets:['./train_data/det/train.txt'] [2023/12/05 15:58:17] ppocr INFO: Initialize indexs of datasets:['./train_data/det/val.txt'] [2023/12/05 15:58:17] ppocr INFO: train dataloader has 24 iters [2023/12/05 15:58:17] ppocr INFO: valid dataloader has 9 iters [2023/12/05 15:58:17] ppocr INFO: load pretrain successful from pretrain_models/en_PP-OCRv3_det_distill_train/student [2023/12/05 15:58:17] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 3 iterations [2023/12/05 15:58:33] ppocr ERROR: When parsing line

, error happened with msg: Traceback (most recent call last):
  File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
    label = substr[1]
IndexError: list index out of range

[2023/12/05 15:58:37] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2720649282331695 [2023/12/05 15:58:37] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy [2023/12/05 15:58:37] ppocr INFO: best metric, hmean: 0.9411764705882353, is_float16: False, precision: 0.8888888888888888, recall: 1.0, fps: 1.2720649282331695, best_epoch: 1 [2023/12/05 15:58:53] ppocr ERROR: When parsing line

, error happened with msg: Traceback (most recent call last):
  File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
    label = substr[1]
IndexError: list index out of range

[2023/12/05 15:58:57] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2869010204678446 [2023/12/05 15:58:57] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy [2023/12/05 15:58:57] ppocr INFO: best metric, hmean: 0.9411764705882353, is_float16: False, precision: 0.8888888888888888, recall: 1.0, fps: 1.2869010204678446, best_epoch: 1 [2023/12/05 15:59:13] ppocr ERROR: When parsing line

Loading the data did not raise "list index out of range", but the error appears during training.