Closed: scyangjunjie closed this issue 6 months ago
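I have only about 30 training images, and their text content is not very varied. The pretrained model is en_PP-OCRv3_rec_train.
I retrained my own det model and tested with my own det and rec models, and the result is still garbled, which is strange. Before conversion the rec model tested fine, and so did the det model. I also tried inference on both CPU and GPU and got the same result.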
from paddleocr import PaddleOCR, draw_ocr

# Raw string so the backslashes in the Windows path are not read as escape sequences.
ocr = PaddleOCR(use_angle_cls=True, rec_model_dir='E:/paddleocr/PaddleOCR-release-2.6/inference_model/rec', det_model_dir=r'E:\paddleocr\PaddleOCR-release-2.6\inference_model\det\Student2')
img_path = 'image_4.png'
result = ocr.ocr(img_path, cls=True)
# One entry per input image; each entry holds the detected lines.
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)
Result:
[2023/10/09 08:48:50] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=True, use_xpu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, shape_info_filename=None, precision='fp32', gpu_mem=500, image_dir=None, det_algorithm='DB', det_model_dir='E:\paddleocr\PaddleOCR-release-2.6\inference_model\det\Student2', det_limit_side_len=960, det_limit_type='max', det_db_thresh=0.3, det_db_box_thresh=0.6, det_db_unclip_ratio=1.5, max_batch_size=10, use_dilation=False, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_sast_polygon=False, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_box_type='quad', det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, det_fce_box_type='poly', rec_algorithm='SVTR_LCNet', rec_model_dir='E:/paddleocr/PaddleOCR-release-2.6/inference_model/rec', rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='C:\ProgramData\anaconda3\envs\paddleocr\lib\site-packages\paddleocr\ppocr\utils\ppocr_keys_v1.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=True, cls_model_dir='C:\Users\Administrator/.paddleocr/whl\cls\ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, save_pdf=False, lang='ch', det=True, rec=True, type='ocr', ocr_version='PP-OCRv3', structure_version='PP-Structurev2')
[2023/10/09 08:48:58] ppocr DEBUG: dt_boxes num : 3, elapse : 2.245330810546875
[2023/10/09 08:48:58] ppocr DEBUG: cls num : 3, elapse : 0.040241241455078125
[2023/10/09 08:48:58] ppocr DEBUG: rec_res num : 3, elapse : 0.053820133209228516
[[361.0, 2.0], [617.0, 6.0], [616.0, 62.0], [360.0, 62.0]] ('8鲶鲶题韵韵', 0.7763018012046814)
[[26.0, 12.0], [99.0, 12.0], [99.0, 62.0], [26.0, 62.0]] ('骑', 0.9968141913414001)
[[110.0, 6.0], [235.0, 13.0], [231.0, 62.0], [106.0, 62.0]] ('骑:>鲶', 0.9953407645225525)
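One detail in the log above may be relevant: `rec_char_dict_path` points at `ppocr_keys_v1.txt` and `lang` is `'ch'`, while the training config posted in this thread sets `character_dict_path: ppocr/utils/en_dict.txt`. A CTC recognizer only emits class indices, so decoding them with a different dictionary than the one used in training would plausibly give exactly this kind of confident but meaningless text. The following is a simplified toy sketch of that effect, not PaddleOCR's actual `CTCLabelDecode`; the two character sets and the index sequence are made up for illustration.

```python
def load_charset(chars, use_space_char=True):
    """Build an index -> character table; index 0 is reserved for the CTC blank."""
    table = ["<blank>"] + list(chars)
    if use_space_char:
        table.append(" ")
    return table


def ctc_greedy_decode(indices, table, blank=0):
    """Collapse repeated indices, drop blanks, and map the rest to characters."""
    out = []
    prev = None
    for idx in indices:
        if idx != blank and idx != prev:
            out.append(table[idx])
        prev = idx
    return "".join(out)


# Toy stand-ins (hypothetical contents) for the training and inference dictionaries.
train_table = load_charset("0123456789abc")
wrong_table = load_charset("的一是在不了有和人这中大为")

# Per-timestep argmax indices, as a recognition head might emit them (made up).
indices = [0, 2, 2, 0, 2, 11, 0, 13]

decoded_with_training_dict = ctc_greedy_decode(indices, train_table)
decoded_with_other_dict = ctc_greedy_decode(indices, wrong_table)
print(decoded_with_training_dict)
print(decoded_with_other_dict)
```

The indices are identical in both calls; only the lookup table changes, which is why the confidence scores can stay high while the text is wrong.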
Here is also the rec training config I used, en_PP-OCRv3_rec.yml. I'd appreciate it if someone experienced could take a look.
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/v3_en_mobile
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: .\pretrain_models\en_PP-OCRv3_rec_train\best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_en.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:

Loss:
  name: MultiLoss
  loss_config_list:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    ext_op_transform_idx: 1
    label_file_list:
If recognition worked correctly before you converted the model, then at inference time you should specify your dictionary file in this call: ocr = PaddleOCR(use_angle_cls=True, rec_model_dir='E:/paddleocr/PaddleOCR-release-2.6/inference_model/rec',det_model_dir='E:\paddleocr\PaddleOCR-release-2.6\inference_model\det\Student2'). Give that a try.
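A minimal sketch of that suggestion, assuming the dictionary used for training (`ppocr/utils/en_dict.txt` in the posted config) sits inside the same PaddleOCR checkout; that absolute path is a guess and should be adjusted. `rec_char_dict_path` is the argument name that appears in the Namespace dump earlier in the thread. `build_ocr_kwargs` and `make_ocr` are hypothetical helper names introduced only for this example.

```python
def build_ocr_kwargs(rec_model_dir, det_model_dir, rec_char_dict_path):
    """Collect the PaddleOCR keyword arguments, including the dictionary path."""
    return {
        "use_angle_cls": True,
        "rec_model_dir": rec_model_dir,
        "det_model_dir": det_model_dir,
        # Should be the same file as Global.character_dict_path used in training.
        "rec_char_dict_path": rec_char_dict_path,
    }


def make_ocr(kwargs):
    """Create the PaddleOCR object; imported lazily so the helper above has no dependency."""
    from paddleocr import PaddleOCR
    return PaddleOCR(**kwargs)


kwargs = build_ocr_kwargs(
    rec_model_dir="E:/paddleocr/PaddleOCR-release-2.6/inference_model/rec",
    det_model_dir="E:/paddleocr/PaddleOCR-release-2.6/inference_model/det/Student2",
    # Assumed location of en_dict.txt; not confirmed anywhere in the thread.
    rec_char_dict_path="E:/paddleocr/PaddleOCR-release-2.6/ppocr/utils/en_dict.txt",
)

# Usage (requires paddleocr and the exported models):
# ocr = make_ocr(kwargs)
# result = ocr.ocr("image_4.png", cls=True)
```

Forward slashes are used for every path so that no backslash can be misread as an escape sequence.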
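I've run into this too: everything was fine before export, but the exported model didn't work. My workaround was to switch models, both detection and recognition, trying several of them, and also to enlarge the dataset.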
(test) PS D:\paddle-ocr\PaddleOCR-release-2.6> python tools/infer_rec.py -c configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml -o Global.pretrained_model=output/v3_en_mobile/best_accuracy.pdparams Global.infer_img=D:\paddle-ocr\PaddleOCR-release-2.6\train_data\data\image_0.png
C:\ProgramData\anaconda3\envs\test\lib\site-packages\setuptools\sandbox.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
C:\ProgramData\anaconda3\envs\test\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
[2023/09/25 16:24:19] ppocr INFO: Architecture :
[2023/09/25 16:24:19] ppocr INFO: Backbone :
[2023/09/25 16:24:19] ppocr INFO: last_conv_stride : [1, 2]
[2023/09/25 16:24:19] ppocr INFO: last_pool_type : avg
[2023/09/25 16:24:19] ppocr INFO: name : MobileNetV1Enhance
[2023/09/25 16:24:19] ppocr INFO: scale : 0.5
[2023/09/25 16:24:19] ppocr INFO: Head :
[2023/09/25 16:24:19] ppocr INFO: head_list :
[2023/09/25 16:24:19] ppocr INFO: CTCHead :
[2023/09/25 16:24:19] ppocr INFO: Head :
[2023/09/25 16:24:19] ppocr INFO: fc_decay : 1e-05
[2023/09/25 16:24:19] ppocr INFO: Neck :
[2023/09/25 16:24:19] ppocr INFO: depth : 2
[2023/09/25 16:24:19] ppocr INFO: dims : 64
[2023/09/25 16:24:19] ppocr INFO: hidden_dims : 120
[2023/09/25 16:24:19] ppocr INFO: name : svtr
[2023/09/25 16:24:19] ppocr INFO: use_guide : True
[2023/09/25 16:24:19] ppocr INFO: SARHead :
[2023/09/25 16:24:19] ppocr INFO: enc_dim : 512
[2023/09/25 16:24:19] ppocr INFO: max_text_length : 25
[2023/09/25 16:24:19] ppocr INFO: name : MultiHead
[2023/09/25 16:24:19] ppocr INFO: Transform : None
[2023/09/25 16:24:19] ppocr INFO: algorithm : SVTR
[2023/09/25 16:24:19] ppocr INFO: model_type : rec
[2023/09/25 16:24:19] ppocr INFO: Eval :
[2023/09/25 16:24:19] ppocr INFO: dataset :
[2023/09/25 16:24:19] ppocr INFO: data_dir : ./train_data
[2023/09/25 16:24:19] ppocr INFO: label_file_list : ['./train_data/rec/val.txt']
[2023/09/25 16:24:19] ppocr INFO: name : SimpleDataSet
[2023/09/25 16:24:19] ppocr INFO: transforms :
[2023/09/25 16:24:19] ppocr INFO: DecodeImage :
[2023/09/25 16:24:19] ppocr INFO: channel_first : False
[2023/09/25 16:24:19] ppocr INFO: img_mode : BGR
[2023/09/25 16:24:19] ppocr INFO: MultiLabelEncode : None
[2023/09/25 16:24:19] ppocr INFO: RecResizeImg :
[2023/09/25 16:24:19] ppocr INFO: image_shape : [3, 48, 320]
[2023/09/25 16:24:19] ppocr INFO: KeepKeys :
[2023/09/25 16:24:19] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2023/09/25 16:24:19] ppocr INFO: loader :
[2023/09/25 16:24:19] ppocr INFO: batch_size_per_card : 1
[2023/09/25 16:24:19] ppocr INFO: drop_last : False
[2023/09/25 16:24:19] ppocr INFO: num_workers : 4
[2023/09/25 16:24:19] ppocr INFO: shuffle : False
[2023/09/25 16:24:19] ppocr INFO: Global :
[2023/09/25 16:24:19] ppocr INFO: cal_metric_during_train : True
[2023/09/25 16:24:19] ppocr INFO: character_dict_path : ppocr/utils/en_dict.txt
[2023/09/25 16:24:19] ppocr INFO: checkpoints : None
[2023/09/25 16:24:19] ppocr INFO: debug : False
[2023/09/25 16:24:19] ppocr INFO: distributed : False
[2023/09/25 16:24:19] ppocr INFO: epoch_num : 500
[2023/09/25 16:24:19] ppocr INFO: eval_batch_step : [0, 2000]
[2023/09/25 16:24:19] ppocr INFO: infer_img : D:\paddle-ocr\PaddleOCR-release-2.6\train_data\data\image_0.png
[2023/09/25 16:24:19] ppocr INFO: infer_mode : False
[2023/09/25 16:24:19] ppocr INFO: log_smooth_window : 20
[2023/09/25 16:24:19] ppocr INFO: max_text_length : 25
[2023/09/25 16:24:19] ppocr INFO: pretrained_model : output/v3_en_mobile/best_accuracy.pdparams
[2023/09/25 16:24:19] ppocr INFO: print_batch_step : 10
[2023/09/25 16:24:19] ppocr INFO: save_epoch_step : 3
[2023/09/25 16:24:19] ppocr INFO: save_inference_dir : None
[2023/09/25 16:24:19] ppocr INFO: save_model_dir : ./output/v3_en_mobile
[2023/09/25 16:24:19] ppocr INFO: save_res_path : ./output/rec/predicts_ppocrv3_en.txt
[2023/09/25 16:24:19] ppocr INFO: use_gpu : True
[2023/09/25 16:24:19] ppocr INFO: use_space_char : True
[2023/09/25 16:24:19] ppocr INFO: use_visualdl : False
[2023/09/25 16:24:19] ppocr INFO: Loss :
[2023/09/25 16:24:19] ppocr INFO: loss_config_list :
[2023/09/25 16:24:19] ppocr INFO: CTCLoss : None
[2023/09/25 16:24:19] ppocr INFO: SARLoss : None
[2023/09/25 16:24:19] ppocr INFO: name : MultiLoss
[2023/09/25 16:24:19] ppocr INFO: Metric :
[2023/09/25 16:24:19] ppocr INFO: ignore_space : False
[2023/09/25 16:24:19] ppocr INFO: main_indicator : acc
[2023/09/25 16:24:19] ppocr INFO: name : RecMetric
[2023/09/25 16:24:19] ppocr INFO: Optimizer :
[2023/09/25 16:24:19] ppocr INFO: beta1 : 0.9
[2023/09/25 16:24:19] ppocr INFO: beta2 : 0.999
[2023/09/25 16:24:19] ppocr INFO: lr :
[2023/09/25 16:24:19] ppocr INFO: learning_rate : 0.001
[2023/09/25 16:24:19] ppocr INFO: name : Cosine
[2023/09/25 16:24:19] ppocr INFO: warmup_epoch : 5
[2023/09/25 16:24:19] ppocr INFO: name : Adam
[2023/09/25 16:24:19] ppocr INFO: regularizer :
[2023/09/25 16:24:19] ppocr INFO: factor : 3e-05
[2023/09/25 16:24:19] ppocr INFO: name : L2
[2023/09/25 16:24:19] ppocr INFO: PostProcess :
[2023/09/25 16:24:19] ppocr INFO: name : CTCLabelDecode
[2023/09/25 16:24:19] ppocr INFO: Train :
[2023/09/25 16:24:19] ppocr INFO: dataset :
[2023/09/25 16:24:19] ppocr INFO: data_dir : ./train_data/
[2023/09/25 16:24:19] ppocr INFO: ext_op_transform_idx : 1
[2023/09/25 16:24:19] ppocr INFO: label_file_list : ['./train_data/rec/train.txt']
[2023/09/25 16:24:19] ppocr INFO: name : SimpleDataSet
[2023/09/25 16:24:19] ppocr INFO: transforms :
[2023/09/25 16:24:19] ppocr INFO: DecodeImage :
[2023/09/25 16:24:19] ppocr INFO: channel_first : False
[2023/09/25 16:24:19] ppocr INFO: img_mode : BGR
[2023/09/25 16:24:19] ppocr INFO: RecConAug :
[2023/09/25 16:24:19] ppocr INFO: ext_data_num : 2
[2023/09/25 16:24:19] ppocr INFO: image_shape : [48, 320, 3]
[2023/09/25 16:24:19] ppocr INFO: max_text_length : 25
[2023/09/25 16:24:19] ppocr INFO: prob : 0.5
[2023/09/25 16:24:19] ppocr INFO: RecAug : None
[2023/09/25 16:24:19] ppocr INFO: MultiLabelEncode : None
[2023/09/25 16:24:19] ppocr INFO: RecResizeImg :
[2023/09/25 16:24:19] ppocr INFO: image_shape : [3, 48, 320]
[2023/09/25 16:24:19] ppocr INFO: KeepKeys :
[2023/09/25 16:24:19] ppocr INFO: keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2023/09/25 16:24:19] ppocr INFO: loader :
[2023/09/25 16:24:19] ppocr INFO: batch_size_per_card : 8
[2023/09/25 16:24:19] ppocr INFO: drop_last : True
[2023/09/25 16:24:19] ppocr INFO: num_workers : 4
[2023/09/25 16:24:19] ppocr INFO: shuffle : True
[2023/09/25 16:24:19] ppocr INFO: profiler_options : None
[2023/09/25 16:24:19] ppocr INFO: train with paddle 2.4.0 and device Place(gpu:0)
W0925 16:24:19.603299 21572 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.2
W0925 16:24:19.606289 21572 gpu_resources.cc:91] device: 0, cuDNN Version: 8.9.
[2023/09/25 16:24:21] ppocr INFO: load pretrain successful from output/v3_en_mobile/best_accuracy
[2023/09/25 16:24:21] ppocr INFO: infer_img: D:\paddle-ocr\PaddleOCR-release-2.6\train_data\data\image_0.png
[2023/09/25 16:24:22] ppocr INFO: result: SIKSONOPU6mm 0.9228772521018982
[2023/09/25 16:24:22] ppocr INFO: success!

The above uses tools/infer_rec.py to validate the trained model. The output, result: SIKSONOPU6mm, is correct.
Then I converted the model:
(test) PS D:\paddle-ocr\PaddleOCR-release-2.6> python tools/export_model.py -c "./configs/rec/PP-OCRv3/en_PP-OCRv3_rec.yml" -o Global.pretrained_model="./output/v3_en_mobile/best_accuracy.pdparams" Global.save_inference_dir="./inference_model/rec/"
C:\ProgramData\anaconda3\envs\test\lib\site-packages\setuptools\sandbox.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
C:\ProgramData\anaconda3\envs\test\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
W0925 16:25:46.523634 10848 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.2, Runtime API Version: 11.2
W0925 16:25:46.525628 10848 gpu_resources.cc:91] device: 0, cuDNN Version: 8.9.
[2023/09/25 16:25:47] ppocr INFO: load pretrain successful from ./output/v3_en_mobile/best_accuracy
[2023/09/25 16:25:48] ppocr INFO: inference model is saved to ./inference_model/rec/inference
(test) PS D:\paddle-ocr\PaddleOCR-release-2.6>

Then I used the converted model to run recognition on the same image (since I had not trained a det model, I used the existing det model, en_PP-OCRv3_det_infer):
(test) PS D:\paddle-ocr\PaddleOCR-release-2.6> python tools/infer/predict_system.py --image_dir="D:\paddle-ocr\PaddleOCR-release-2.6\train_data\rec\test\image_0_crop_2.jpg" --det_model_dir="D:\paddle-ocr\en_PP-OCRv3_det_infer" --rec_model_dir="D:\paddle-ocr\PaddleOCR-release-2.6\inference_model\rec\"
C:\ProgramData\anaconda3\envs\test\lib\site-packages\setuptools\sandbox.py:13: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
C:\ProgramData\anaconda3\envs\test\lib\site-packages\pkg_resources\__init__.py:2871: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
  declare_namespace(pkg)
[2023/09/25 16:28:36] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2023/09/25 16:28:37] ppocr DEBUG: dt_boxes num : 1, elapsed : 0.9437923431396484
[2023/09/25 16:28:37] ppocr DEBUG: rec_res num : 1, elapsed : 0.008970022201538086
[2023/09/25 16:28:37] ppocr DEBUG: 0 Predict time of D:\paddle-ocr\PaddleOCR-release-2.6\train_data\rec\test\image_0_crop_2.jpg: 0.953s
[2023/09/25 16:28:37] ppocr DEBUG: 0-骑:>, 0.919
Traceback (most recent call last):
  File "D:\paddle-ocr\PaddleOCR-release-2.6\tools\infer\predict_system.py", line 272, in <module>
    main(args)
  File "D:\paddle-ocr\PaddleOCR-release-2.6\tools\infer\predict_system.py", line 223, in main
    draw_img = draw_ocr_box_txt(
  File "D:\paddle-ocr\PaddleOCR-release-2.6\tools\infer\utility.py", line 433, in draw_ocr_box_txt
    img_right_text = draw_box_txt_fine((w, h), box, txt, font_path)
  File "D:\paddle-ocr\PaddleOCR-release-2.6\tools\infer\utility.py", line 461, in draw_box_txt_fine
    font = create_font(txt, (box_width, box_height), font_path)
  File "D:\paddle-ocr\PaddleOCR-release-2.6\tools\infer\utility.py", line 483, in create_font
    length = font.getsize(txt)[0]
AttributeError: 'FreeTypeFont' object has no attribute 'getsize'

The output is garbled. Could someone experienced please help analyze this?
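The `AttributeError` at the end of the traceback looks like a separate problem from the garbled text. `FreeTypeFont.getsize` was removed in Pillow 10.0, so the drawing code in this release fails on newer Pillow versions once recognition has already finished. Installing a Pillow version below 10 is one workaround. Another is to measure text with `getlength`, which newer Pillow provides. The sketch below is a small compatibility helper; `text_width` is a hypothetical name and not part of PaddleOCR.

```python
def text_width(font, txt):
    """Return the rendered width of txt for either old or new Pillow font objects."""
    if hasattr(font, "getlength"):
        # Available in newer Pillow; returns the advance width as a float.
        return font.getlength(txt)
    # Older Pillow only: getsize returns a (width, height) tuple.
    return font.getsize(txt)[0]


# Usage sketch (requires Pillow and a font file; the path comes from the log above):
# from PIL import ImageFont
# font = ImageFont.truetype("./doc/fonts/simfang.ttf", 18)
# length = text_width(font, "SIKSONOPU6mm")
```

Checking for `getlength` first means the same code runs on both sides of the Pillow 10 change without pinning a version.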
My rec model training parameters are as follows:
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/v3_en_mobile
  save_epoch_step: 3
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: ./pretrain_models/en_PP-OCRv3_rec_train/best_accuracy
  checkpoints:
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/ch/word_1.jpg
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: &max_text_length 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/predicts_ppocrv3_en.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR
  Transform:
  Backbone:
    name: MobileNetV1Enhance
    scale: 0.5
    last_conv_stride: [1, 2]
    last_pool_type: avg
  Head:
    name: MultiHead
    head_list:

Loss:
  name: MultiLoss
  loss_config_list:

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc
  ignore_space: False

Train:
  dataset:
    name: SimpleDataSet
    data_dir: ./train_data/
    ext_op_transform_idx: 1
    label_file_list: