PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
42.57k stars, 7.67k forks

Text recognition training: error when setting the image size #2090

Closed simplew2011 closed 3 years ago

simplew2011 commented 3 years ago

I am training a recognition model with rec_chinese_lite_train_v2.0.yml to read license plate characters. A standard license plate is 140×440 (aspect ratio 3.14). After cropping the plate regions out of the CCPD dataset, training runs when RecResizeImg.image_shape is set to [3, 32, 320] or [3, 32, 100], but these sizes are quite different from the plate's. When I edit the .yml file to use sizes that follow the plate proportions, [3, 70, 220] or [3, 64, 256], training fails in both cases:

D:\ProgramData\Anaconda3\envs\tf_1.15\python.exe C:/Users/Administrator/Desktop/PaddleOCR-release-2.0/main_train.py
train dataset split (with 0.1) in train_num: 4500, val_num: 500
[2021/02/24 12:12:10] root INFO: Architecture :
[2021/02/24 12:12:10] root INFO:     Backbone :
[2021/02/24 12:12:10] root INFO:         model_name : small
[2021/02/24 12:12:10] root INFO:         name : MobileNetV3
[2021/02/24 12:12:10] root INFO:         scale : 0.5
[2021/02/24 12:12:10] root INFO:         small_stride : [1, 2, 2, 2]
[2021/02/24 12:12:10] root INFO:     Head :
[2021/02/24 12:12:10] root INFO:         fc_decay : 1e-05
[2021/02/24 12:12:10] root INFO:         name : CTCHead
[2021/02/24 12:12:10] root INFO:     Neck :
[2021/02/24 12:12:10] root INFO:         encoder_type : rnn
[2021/02/24 12:12:10] root INFO:         hidden_size : 48
[2021/02/24 12:12:10] root INFO:         name : SequenceEncoder
[2021/02/24 12:12:10] root INFO:     Transform : None
[2021/02/24 12:12:10] root INFO:     algorithm : CRNN
[2021/02/24 12:12:10] root INFO:     model_type : rec
[2021/02/24 12:12:10] root INFO: Eval :
[2021/02/24 12:12:10] root INFO:     dataset :
[2021/02/24 12:12:10] root INFO:         data_dir : D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits\rec_dataset
[2021/02/24 12:12:10] root INFO:         label_file_list : ['D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits/det_gt_5k_rec_val.txt']
[2021/02/24 12:12:10] root INFO:         name : SimpleDataSet
[2021/02/24 12:12:10] root INFO:         transforms :
[2021/02/24 12:12:10] root INFO:             DecodeImage :
[2021/02/24 12:12:10] root INFO:                 channel_first : False
[2021/02/24 12:12:10] root INFO:                 img_mode : BGR
[2021/02/24 12:12:10] root INFO:             CTCLabelEncode : None
[2021/02/24 12:12:10] root INFO:             RecResizeImg :
[2021/02/24 12:12:10] root INFO:                 image_shape : [3, 70, 220]
[2021/02/24 12:12:10] root INFO:             KeepKeys :
[2021/02/24 12:12:10] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/02/24 12:12:10] root INFO:     loader :
[2021/02/24 12:12:10] root INFO:         batch_size_per_card : 16
[2021/02/24 12:12:10] root INFO:         drop_last : False
[2021/02/24 12:12:10] root INFO:         num_workers : 8
[2021/02/24 12:12:10] root INFO:         shuffle : False
[2021/02/24 12:12:10] root INFO: Global :
[2021/02/24 12:12:10] root INFO:     cal_metric_during_train : True
[2021/02/24 12:12:10] root INFO:     character_dict_path : D:\Plate_OCR\13\CCPD2019.tar\CCPD2019\splits\plate_dict.txt
[2021/02/24 12:12:10] root INFO:     character_type : ch
[2021/02/24 12:12:10] root INFO:     checkpoints :
[2021/02/24 12:12:10] root INFO:     distributed : False
[2021/02/24 12:12:10] root INFO:     epoch_num : 100
[2021/02/24 12:12:10] root INFO:     eval_batch_step : [100, 200]
[2021/02/24 12:12:10] root INFO:     infer_img : None
[2021/02/24 12:12:10] root INFO:     infer_mode : False
[2021/02/24 12:12:10] root INFO:     log_smooth_window : 20
[2021/02/24 12:12:10] root INFO:     max_text_length : 25
[2021/02/24 12:12:10] root INFO:     pretrained_model : None
[2021/02/24 12:12:10] root INFO:     print_batch_step : 10
[2021/02/24 12:12:10] root INFO:     save_epoch_step : 3
[2021/02/24 12:12:10] root INFO:     save_inference_dir : None
[2021/02/24 12:12:10] root INFO:     save_model_dir : ./output\rec
[2021/02/24 12:12:10] root INFO:     use_gpu : True
[2021/02/24 12:12:10] root INFO:     use_space_char : False
[2021/02/24 12:12:10] root INFO:     use_visualdl : False
[2021/02/24 12:12:10] root INFO: Loss :
[2021/02/24 12:12:10] root INFO:     name : CTCLoss
[2021/02/24 12:12:10] root INFO: Metric :
[2021/02/24 12:12:10] root INFO:     main_indicator : acc
[2021/02/24 12:12:10] root INFO:     name : RecMetric
[2021/02/24 12:12:10] root INFO: Optimizer :
[2021/02/24 12:12:10] root INFO:     beta1 : 0.9
[2021/02/24 12:12:10] root INFO:     beta2 : 0.999
[2021/02/24 12:12:10] root INFO:     lr :
[2021/02/24 12:12:10] root INFO:         learning_rate : 0.001
[2021/02/24 12:12:10] root INFO:         name : Cosine
[2021/02/24 12:12:10] root INFO:     name : Adam
[2021/02/24 12:12:10] root INFO:     regularizer :
[2021/02/24 12:12:10] root INFO:         factor : 1e-05
[2021/02/24 12:12:10] root INFO:         name : L2
[2021/02/24 12:12:10] root INFO: PostProcess :
[2021/02/24 12:12:10] root INFO:     name : CTCLabelDecode
[2021/02/24 12:12:10] root INFO: Train :
[2021/02/24 12:12:10] root INFO:     dataset :
[2021/02/24 12:12:10] root INFO:         data_dir : D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits\rec_dataset
[2021/02/24 12:12:10] root INFO:         label_file_list : ['D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits/det_gt_5k_rec_train.txt']
[2021/02/24 12:12:10] root INFO:         name : SimpleDataSet
[2021/02/24 12:12:10] root INFO:         transforms :
[2021/02/24 12:12:10] root INFO:             DecodeImage :
[2021/02/24 12:12:10] root INFO:                 channel_first : False
[2021/02/24 12:12:10] root INFO:                 img_mode : BGR
[2021/02/24 12:12:10] root INFO:             RecAug : None
[2021/02/24 12:12:10] root INFO:             CTCLabelEncode : None
[2021/02/24 12:12:10] root INFO:             RecResizeImg :
[2021/02/24 12:12:10] root INFO:                 image_shape : [3, 70, 220]
[2021/02/24 12:12:10] root INFO:             KeepKeys :
[2021/02/24 12:12:10] root INFO:                 keep_keys : ['image', 'label', 'length']
[2021/02/24 12:12:10] root INFO:     loader :
[2021/02/24 12:12:10] root INFO:         batch_size_per_card : 16
[2021/02/24 12:12:10] root INFO:         drop_last : True
[2021/02/24 12:12:10] root INFO:         num_workers : 8
[2021/02/24 12:12:10] root INFO:         shuffle : True
[2021/02/24 12:12:10] root INFO: train with paddle 2.0.0 and device CUDAPlace(0)
[2021/02/24 12:12:10] root INFO: Initialize indexs of datasets:['D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits/det_gt_5k_rec_train.txt']
[2021/02/24 12:12:10] root INFO: Initialize indexs of datasets:['D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits/det_gt_5k_rec_val.txt']
[2021/02/24 12:12:14] root INFO: train from scratch
[2021/02/24 12:12:14] root INFO: train dataloader has 281 iters, valid dataloader has 32 iters
[2021/02/24 12:12:14] root INFO: During the training process, after the 100th iteration, an evaluation is run every 200 iterations
[2021/02/24 12:12:14] root INFO: Initialize indexs of datasets:['D:/Plate_OCR/13/CCPD2019.tar/CCPD2019/splits/det_gt_5k_rec_train.txt']
Traceback (most recent call last):
  File "C:/Users/Administrator/Desktop/PaddleOCR-release-2.0/main_train.py", line 8, in <module>
    ocr_train(conf_file_path, det_train)
  File "C:\Users\Administrator\Desktop\PaddleOCR-release-2.0\tools\main\ocr_train.py", line 48, in ocr_train
    train.main_new(args)
  File "C:\Users\Administrator\Desktop\PaddleOCR-release-2.0\tools\train.py", line 155, in main_new
    eval_class, pre_best_model_dict, logger, vdl_writer)
  File "C:\Users\Administrator\Desktop\PaddleOCR-release-2.0\tools\program.py", line 210, in train
    preds = model(images)
  File "D:\ProgramData\Anaconda3\envs\tf_1.15\lib\site-packages\paddle\fluid\dygraph\layers.py", line 891, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\Administrator\Desktop\PaddleOCR-release-2.0\ppocr\modeling\architectures\base_model.py", line 76, in forward
    x = self.neck(x)
  File "D:\ProgramData\Anaconda3\envs\tf_1.15\lib\site-packages\paddle\fluid\dygraph\layers.py", line 891, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\Administrator\Desktop\PaddleOCR-release-2.0\ppocr\modeling\necks\rnn.py", line 89, in forward
    x = self.encoder_reshape(x)
  File "D:\ProgramData\Anaconda3\envs\tf_1.15\lib\site-packages\paddle\fluid\dygraph\layers.py", line 891, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "C:\Users\Administrator\Desktop\PaddleOCR-release-2.0\ppocr\modeling\necks\rnn.py", line 31, in forward
    assert H == 1
AssertionError
W0224 12:12:10.786311 14688 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.1, Runtime API Version: 10.2
W0224 12:12:10.798249 14688 device_context.cc:372] device: 0, cuDNN Version: 7.6.
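
The `assert H == 1` at the bottom of the traceback fires in the neck's reshape step (`encoder_reshape` in `ppocr/modeling/necks/rnn.py`), which has to turn the backbone's 4-D feature map into a sequence for the RNN encoder and the CTC head. The following is a shape-only sketch of what such a step does, assuming the usual CRNN layout (B, C, H, W) → (B, W, C). The name `im2seq_shape` is hypothetical and not a PaddleOCR API, and the shapes in the usage lines (including the channel count 288) are illustrative assumptions, not values taken from the log.

```python
from typing import Tuple


def im2seq_shape(feature_shape: Tuple[int, int, int, int]) -> Tuple[int, int, int]:
    """Shape-only sketch of a CRNN-style "image to sequence" reshape.

    Hypothetical helper (not a PaddleOCR API). It maps a backbone output of
    shape (B, C, H, W) to a sequence of shape (B, W, C): each of the W
    columns becomes one time step for the RNN encoder and the CTC head,
    which is only well defined once the height H has been reduced to 1.
    """
    batch, channels, height, width = feature_shape
    # The same condition that fails at the bottom of the traceback.
    assert height == 1, f"expected feature-map height 1, got {height}"
    # Drop the height axis, then move width into the time-step position.
    return (batch, width, channels)


if __name__ == "__main__":
    # Illustrative shapes only; the channel count 288 is an assumption.
    print(im2seq_shape((16, 288, 1, 80)))  # (16, 80, 288)
    try:
        im2seq_shape((16, 288, 2, 55))
    except AssertionError as err:
        print("AssertionError:", err)  # a height-2 feature map is rejected
```

A height of 2 cannot be squeezed away, so any input height that leaves more than one row after the backbone hits this assertion regardless of the width.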

bj5546 commented 3 years ago

我也遇到相同问题

littletomatodonkey commented 3 years ago

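For CTC decoding, the input has to be a 1-D sequence, so after downsampling the feature map should have a height of 1. In ppocr the feature map is downsampled by a factor of 32, so an input that is 32 pixels high ends up with a height of exactly 1. There are therefore two solutions:

  1. Set the height of the input shape to 32 (recommended).
  2. Add more downsampling modules to the MobileNetV3 (mv3) backbone so that the output feature map has a height of 1.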
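
The arithmetic behind this answer can be checked with a small sketch. It assumes a downsampling schedule modeled on the "small" MobileNetV3 in the log (a stride-2 stem conv, one block per `small_stride : [1, 2, 2, 2]` entry acting on height only, and a final 2×2 max pool), with "same"-style padding so a stride-2 conv gives ceil(h / 2). That is 5 halvings, i.e. the factor 32 from the reply. The kernel and padding details are assumptions rather than the exact ppocr code, and `rec_feature_height` and `extra_halvings` are hypothetical names.

```python
import math
from typing import Sequence


def rec_feature_height(input_height: int,
                       small_stride: Sequence[int] = (1, 2, 2, 2),
                       extra_halvings: int = 0) -> int:
    """Estimate the feature-map height left after a CRNN-style backbone.

    Assumed downsampling schedule (a sketch, not the exact PaddleOCR code):
      * stem conv, stride 2, "same" padding        -> ceil(h / 2)
      * one conv block per small_stride entry, acting on height only
      * `extra_halvings` hypothetical extra stride-2 blocks (solution 2)
      * final 2x2 max pool, stride 2, no padding   -> floor(h / 2)
    """
    h = math.ceil(input_height / 2)  # stem conv
    for stride in list(small_stride) + [2] * extra_halvings:
        h = math.ceil(h / stride)  # stride-s conv with "same" padding
    return h // 2  # final max pool


if __name__ == "__main__":
    for shape in ([3, 32, 320], [3, 32, 100], [3, 70, 220], [3, 64, 256]):
        print(shape, "->", rec_feature_height(shape[1]))
    # [3, 32, 320] -> 1
    # [3, 32, 100] -> 1
    # [3, 70, 220] -> 2
    # [3, 64, 256] -> 2
    print(rec_feature_height(64, extra_halvings=1))  # 1
```

Under these assumptions a 32-pixel-high input collapses to a height of 1, while 70 and 64 both leave a height of 2, which is what trips `assert H == 1`. Adding one more stride-2 stage (`extra_halvings=1`) brings both 64 and 70 down to 1, which is the effect solution 2 aims for.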
bj5546 commented 3 years ago

But back in 1.8 I could set the image to any size.

bj5546 commented 3 years ago

@tink2123 @LDOUBLEV

paddle-bot-old[bot] commented 3 years ago

Since you haven't replied for more than 3 months, we have closed this issue/pr. If the problem is not solved or you have a follow-up question, please reopen it at any time and we will continue to follow up. We recommend pulling and trying the latest code first.