Closed: alanxinn closed this issue 1 year ago
I'm running into the same error:
data['ext_data'] = self.get_ext_data()
File "D:\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 124, in get_ext_data
label = substr[1]
IndexError: list index out of range
How did you end up solving it?
I don't remember exactly, but I think it was caused by a problem with the dataset.
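@alanxinn I found the fix. After labeling with PPOCRLabel, I used gen_ocr_train_val_test.py to generate train.txt, val.txt and test.txt. It turned out that these three txt files contained an extra line break (a blank line), which caused the read error, as shown below: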
D:\PaddleOCR\train_data\rec\train\FAB01_Terminal_RedHat7.9_crop_76.jpg 192.168.122.255
D:\PaddleOCR\train_data\rec\train\FAB01_Terminal_RedHat7.9_crop_90.jpg overruns
D:\PaddleOCR\train_data\rec\train\FAB01_Terminal_RedHat7.9_crop_31.jpg flags=73<UP,LOOPBACK,RUNNING>
After I removed the blank lines it ran normally. The file now reads:
D:\PaddleOCR\train_data\rec\train\FAB01_Terminal_RedHat7.9_crop_76.jpg 192.168.122.255
D:\PaddleOCR\train_data\rec\train\FAB01_Terminal_RedHat7.9_crop_90.jpg overruns
D:\PaddleOCR\train_data\rec\train\FAB01_Terminal_RedHat7.9_crop_31.jpg flags=73<UP,LOOPBACK,RUNNING>
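Oddly enough, only train.txt raised the error and train.txt is the only file I edited. The other two files were left unchanged, yet training runs normally. I don't know why.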
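For anyone who wants to automate that cleanup, here is a minimal Python sketch. It assumes the label file is UTF-8 text with one entry per line. The function name drop_blank_lines is my own and is not part of PaddleOCR. It rewrites the file in place, so keep a backup first.

```python
def drop_blank_lines(label_path, encoding="utf-8"):
    """Rewrite label_path without empty or whitespace-only lines.

    Returns the number of lines that were removed.
    """
    with open(label_path, "r", encoding=encoding) as f:
        lines = f.readlines()

    # Keep only lines that contain something other than whitespace.
    kept = [line for line in lines if line.strip()]

    with open(label_path, "w", encoding=encoding) as f:
        f.writelines(kept)

    return len(lines) - len(kept)
```

Running it on train.txt, val.txt and test.txt in turn and checking the returned counts would also show whether the other two files really contain blank lines.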
[2023/12/05 15:58:17] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 3 iterations
[2023/12/05 15:58:33] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
  File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
    label = substr[1]
IndexError: list index out of range
[2023/12/05 15:58:37] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2720649282331695
[2023/12/05 15:58:37] ppocr INFO: save best model is to ./output/ch_PP-OCR_V3_det/best_accuracy
[2023/12/05 15:58:37] ppocr INFO: best metric, hmean: 0.9411764705882353, is_float16: False, precision: 0.8888888888888888, recall: 1.0, fps: 1.2720649282331695, best_epoch: 1
[2023/12/05 15:58:53] ppocr ERROR: When parsing line
, error happened with msg: Traceback (most recent call last):
  File "E:\AI_Code\PaddleOCR-2.7.1\ppocr\data\simple_dataset.py", line 150, in __getitem__
    label = substr[1]
IndexError: list index out of range
[2023/12/05 15:58:57] ppocr INFO: cur metric, precision: 0.8888888888888888, recall: 1.0, hmean: 0.9411764705882353, fps: 1.2869010204678446
Why does the label = substr[1] error not show up when the data is loaded, but only once training is running?
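One likely explanation, judging from the traceback pointing at __getitem__: the label lines are only split when a sample is actually fetched, not when the file is first read. The toy class below is a sketch of that pattern under that assumption. LazyLabelDataset is a made-up name and this is not PaddleOCR's code. Note also that the log prints an empty line after "When parsing line", which would be consistent with a blank line in the label file used for evaluation.

```python
class LazyLabelDataset:
    """Toy stand-in for a dataset that parses label lines on access."""

    def __init__(self, lines, delimiter="\t"):
        # Construction only stores the raw lines; nothing is split yet,
        # so a malformed line goes unnoticed at this point.
        self.lines = list(lines)
        self.delimiter = delimiter

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, idx):
        substr = self.lines[idx].strip("\n").split(self.delimiter)
        file_name = substr[0]
        label = substr[1]  # IndexError if the line has no delimiter
        return file_name, label


dataset = LazyLabelDataset(["a.jpg\thello\n", "\n"])  # no error here
print(dataset[0])  # ('a.jpg', 'hello')
try:
    dataset[1]  # the blank line only fails when it is fetched
except IndexError as exc:
    print("bad line only fails on access:", exc)
```

With this pattern a bad line stays silent until the sampler happens to pick it, which can be well into training or during an evaluation pass.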
I haven't run into that, sorry. I can't help with it.
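Use this script to check whether the fields are separated by \t, and fix the lines that are not:

def check_and_fix_tab_separation(file_path):
    # Read every line of the label file.
    with open(file_path, 'r', encoding='utf-8') as file:
        lines = file.readlines()
    new_lines = []
    for line in lines:
        if '\t' not in line:
            # No tab found: replace only the first space with a tab so that
            # spaces inside the label survive (assumes the image path itself
            # contains no spaces).
            line = line.replace(' ', '\t', 1)
        new_lines.append(line)
    # Write the corrected lines back to the same file.
    with open(file_path, 'w', encoding='utf-8') as file:
        file.writelines(new_lines)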
I hit this too. Could you share how you solved it?
Please provide the following information to quickly locate the problem
[2023/09/11 19:49:04] ppocr INFO: Architecture :
[2023/09/11 19:49:04] ppocr INFO: Backbone :
[2023/09/11 19:49:04] ppocr INFO: name : PPLCNetV3
[2023/09/11 19:49:04] ppocr INFO: scale : 0.95
[2023/09/11 19:49:04] ppocr INFO: Head :
[2023/09/11 19:49:04] ppocr INFO: head_list :
[2023/09/11 19:49:04] ppocr INFO: CTCHead :
[2023/09/11 19:49:04] ppocr INFO: Head :
[2023/09/11 19:49:04] ppocr INFO: fc_decay : 1e-05
[2023/09/11 19:49:04] ppocr INFO: Neck :
[2023/09/11 19:49:04] ppocr INFO: depth : 2
[2023/09/11 19:49:04] ppocr INFO: dims : 120
[2023/09/11 19:49:04] ppocr INFO: hidden_dims : 120
[2023/09/11 19:49:04] ppocr INFO: kernel_size : [1, 3]
[2023/09/11 19:49:04] ppocr INFO: name : svtr
[2023/09/11 19:49:04] ppocr INFO: use_guide : True
[2023/09/11 19:49:04] ppocr INFO: NRTRHead :
[2023/09/11 19:49:04] ppocr INFO: max_text_length : 25
[2023/09/11 19:49:04] ppocr INFO: nrtr_dim : 384
[2023/09/11 19:49:04] ppocr INFO: name : MultiHead
[2023/09/11 19:49:04] ppocr INFO: Transform : None
[2023/09/11 19:49:04] ppocr INFO: algorithm : SVTR_LCNet
[2023/09/11 19:49:04] ppocr INFO: model_type : rec
[2023/09/11 19:49:04] ppocr INFO: Eval :
[2023/09/11 19:49:04] ppocr INFO: dataset :
[2023/09/11 19:49:04] ppocr INFO: data_dir : datasets\
[2023/09/11 19:49:04] ppocr INFO: label_file_list : ['datasets\rec_gt_test_change.txt']
[2023/09/11 19:49:04] ppocr INFO: name : SimpleDataSet
[2023/09/11 19:49:04] ppocr INFO: transforms :
[2023/09/11 19:49:04] ppocr INFO: DecodeImage :
[2023/09/11 19:49:04] ppocr INFO: channel_first : False
[2023/09/11 19:49:04] ppocr INFO: img_mode : BGR
[2023/09/11 19:49:04] ppocr INFO: MultiLabelEncode :
[2023/09/11 19:49:04] ppocr INFO: gtc_encode : NRTRLabelEncode
[2023/09/11 19:49:04] ppocr INFO: RecResizeImg :
[2023/09/11 19:49:04] ppocr INFO: image_shape : [3, 48, 320]
[2023/09/11 19:49:04] ppocr INFO: KeepKeys :
[2023/09/11 19:49:04] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2023/09/11 19:49:04] ppocr INFO: loader :
[2023/09/11 19:49:04] ppocr INFO: batch_size_per_card : 64
[2023/09/11 19:49:04] ppocr INFO: drop_last : False
[2023/09/11 19:49:04] ppocr INFO: num_workers : 8
[2023/09/11 19:49:04] ppocr INFO: shuffle : False
[2023/09/11 19:49:04] ppocr INFO: Global :
[2023/09/11 19:49:04] ppocr INFO: cal_metric_during_train : True
[2023/09/11 19:49:04] ppocr INFO: character_dict_path : ppocr\utils\en_dict.txt
[2023/09/11 19:49:04] ppocr INFO: checkpoints : None
[2023/09/11 19:49:04] ppocr INFO: debug : False
[2023/09/11 19:49:04] ppocr INFO: distributed : False
[2023/09/11 19:49:04] ppocr INFO: epoch_num : 100
[2023/09/11 19:49:04] ppocr INFO: eval_batch_step : [0, 1500]
[2023/09/11 19:49:04] ppocr INFO: infer_img : doc/imgs_words/ch/word_1.jpg
[2023/09/11 19:49:04] ppocr INFO: infer_mode : False
[2023/09/11 19:49:04] ppocr INFO: log_smooth_window : 20
[2023/09/11 19:49:04] ppocr INFO: max_text_length : 25
[2023/09/11 19:49:04] ppocr INFO: pretrained_model : datasets\en_PP-OCRv4_rec_train\best_accuracy
[2023/09/11 19:49:04] ppocr INFO: print_batch_step : 10
[2023/09/11 19:49:04] ppocr INFO: save_epoch_step : 5
[2023/09/11 19:49:04] ppocr INFO: save_inference_dir : None
[2023/09/11 19:49:04] ppocr INFO: save_model_dir : ./output/rec_ppocr_v4_zifu_en_epoch100
[2023/09/11 19:49:04] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv4.txt
[2023/09/11 19:49:04] ppocr INFO: use_gpu : True
[2023/09/11 19:49:04] ppocr INFO: use_space_char : True
[2023/09/11 19:49:04] ppocr INFO: use_visualdl : False
[2023/09/11 19:49:04] ppocr INFO: Loss :
[2023/09/11 19:49:04] ppocr INFO: loss_config_list :
[2023/09/11 19:49:04] ppocr INFO: CTCLoss : None
[2023/09/11 19:49:04] ppocr INFO: NRTRLoss : None
[2023/09/11 19:49:04] ppocr INFO: name : MultiLoss
[2023/09/11 19:49:04] ppocr INFO: Metric :
[2023/09/11 19:49:04] ppocr INFO: ignore_space : False
[2023/09/11 19:49:04] ppocr INFO: main_indicator : acc
[2023/09/11 19:49:04] ppocr INFO: name : RecMetric
[2023/09/11 19:49:04] ppocr INFO: Optimizer :
[2023/09/11 19:49:04] ppocr INFO: beta1 : 0.9
[2023/09/11 19:49:04] ppocr INFO: beta2 : 0.999
[2023/09/11 19:49:04] ppocr INFO: lr :
[2023/09/11 19:49:04] ppocr INFO: learning_rate : 0.0005
[2023/09/11 19:49:04] ppocr INFO: name : Cosine
[2023/09/11 19:49:04] ppocr INFO: warmup_epoch : 5
[2023/09/11 19:49:04] ppocr INFO: name : Adam
[2023/09/11 19:49:04] ppocr INFO: regularizer :
[2023/09/11 19:49:04] ppocr INFO: factor : 3e-05
[2023/09/11 19:49:04] ppocr INFO: name : L2
[2023/09/11 19:49:04] ppocr INFO: PostProcess :
[2023/09/11 19:49:04] ppocr INFO: name : CTCLabelDecode
[2023/09/11 19:49:04] ppocr INFO: Train :
[2023/09/11 19:49:04] ppocr INFO: dataset :
[2023/09/11 19:49:04] ppocr INFO: data_dir : datasets\
[2023/09/11 19:49:04] ppocr INFO: ds_width : False
[2023/09/11 19:49:04] ppocr INFO: ext_op_transform_idx : 1
[2023/09/11 19:49:04] ppocr INFO: label_file_list : ['datasets\rec_gt_train_change.txt']
[2023/09/11 19:49:04] ppocr INFO: name : MultiScaleDataSet
[2023/09/11 19:49:04] ppocr INFO: transforms :
[2023/09/11 19:49:04] ppocr INFO: DecodeImage :
[2023/09/11 19:49:04] ppocr INFO: channel_first : False
[2023/09/11 19:49:04] ppocr INFO: img_mode : BGR
[2023/09/11 19:49:04] ppocr INFO: RecConAug :
[2023/09/11 19:49:04] ppocr INFO: ext_data_num : 2
[2023/09/11 19:49:04] ppocr INFO: image_shape : [48, 320, 3]
[2023/09/11 19:49:04] ppocr INFO: max_text_length : 25
[2023/09/11 19:49:04] ppocr INFO: prob : 0.5
[2023/09/11 19:49:04] ppocr INFO: RecAug : None
[2023/09/11 19:49:04] ppocr INFO: MultiLabelEncode :
[2023/09/11 19:49:04] ppocr INFO: gtc_encode : NRTRLabelEncode
[2023/09/11 19:49:04] ppocr INFO: KeepKeys :
[2023/09/11 19:49:04] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_gtc', 'length', 'valid_ratio']
[2023/09/11 19:49:04] ppocr INFO: loader :
[2023/09/11 19:49:04] ppocr INFO: batch_size_per_card : 64
[2023/09/11 19:49:04] ppocr INFO: drop_last : True
[2023/09/11 19:49:04] ppocr INFO: num_workers : 8
[2023/09/11 19:49:04] ppocr INFO: shuffle : True
[2023/09/11 19:49:04] ppocr INFO: sampler :
[2023/09/11 19:49:04] ppocr INFO: divided_factor : [8, 16]
[2023/09/11 19:49:04] ppocr INFO: first_bs : 96
[2023/09/11 19:49:04] ppocr INFO: fix_bs : False
[2023/09/11 19:49:04] ppocr INFO: is_training : True
[2023/09/11 19:49:04] ppocr INFO: name : MultiScaleSampler
[2023/09/11 19:49:04] ppocr INFO: scales : [[320, 32], [320, 48], [320, 64]]
[2023/09/11 19:49:04] ppocr INFO: profiler_options : None
[2023/09/11 19:49:04] ppocr INFO: train with paddle 2.4.2 and device Place(gpu:0)
[2023/09/11 19:49:04] ppocr INFO: Initialize indexs of datasets:['datasets\rec_gt_train_change.txt']
[2023/09/11 19:49:04] ppocr INFO: Initialize indexs of datasets:['datasets\rec_gt_test_change.txt']
[2023/09/11 19:49:05] ppocr INFO: train dataloader has 69 iters
[2023/09/11 19:49:05] ppocr INFO: valid dataloader has 34 iters
[2023/09/11 19:49:05] ppocr INFO: load pretrain successful from datasets\en_PP-OCRv4_rec_train\best_accuracy
[2023/09/11 19:49:05] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 1500 iterations
[2023/09/12 10:54:30] ppocr ERROR: When parsing line train/word_228.png marina , error happened with msg: Traceback (most recent call last):
  File "E:\desktop\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 252, in __getitem__
    data['ext_data'] = self.get_ext_data()
  File "E:\desktop\PaddleOCR-release-2.7\ppocr\data\simple_dataset.py", line 124, in get_ext_data
    label = substr[1]
IndexError: list index out of range
We provide AceIssueSolver to solve issues, do you want it? (Please write yes/no):
The label file already separates the image path and the image text with \t, but the error still occurs.
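Since the traceback goes through get_ext_data, the line that actually fails may be a different entry from the one printed after "When parsing line", so it seems worth checking every line of the label file rather than only that one. Below is a minimal sketch of such a check. The helper name find_bad_label_lines is hypothetical, and it assumes a UTF-8 label file whose fields are separated by a tab. It only reports problems and does not modify the file.

```python
import sys


def find_bad_label_lines(label_path, delimiter="\t", encoding="utf-8"):
    """Return (line_number, raw_line) pairs for lines with fewer than two fields.

    These are the lines for which an expression like substr[1] would raise
    IndexError after splitting on the delimiter.
    """
    bad = []
    with open(label_path, "r", encoding=encoding) as f:
        for number, line in enumerate(f, start=1):
            fields = line.rstrip("\r\n").split(delimiter)
            if len(fields) < 2:
                bad.append((number, line))
    return bad


if __name__ == "__main__":
    # Usage: python check_labels.py <label file> [<label file> ...]
    for path in sys.argv[1:]:
        for number, line in find_bad_label_lines(path):
            print(f"{path}:{number}: {line!r}")
```

Pointing it at both the training and the evaluation label files (here the two *_change.txt files listed under label_file_list) would show blank lines and space-separated entries together with their line numbers.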