PaddlePaddle / Paddle-Lite

PaddlePaddle High Performance Deep Learning Inference Engine for Mobile and Edge
https://www.paddlepaddle.org.cn/lite
Apache License 2.0

PaddleOCR quantization-aware training, or direct conversion to FP16, gives very poor accuracy #10169

Closed Jverson closed 4 months ago

Jverson commented 1 year ago

While optimizing PaddleOCR inference speed, I tried converting the models to FP16 and also tried quantization-aware training; both ended up with very poor results. There are two main problems:

  1. After converting the trained PaddleOCR dbnet and crnn models directly to .nb models, they run fine on the RK3326 CPU, but after converting them to FP16 they are completely unusable on the RK3326 GPU: accuracy drops off a cliff. (A quick numeric check is sketched below, after step 1.2.)
     PaddleOCR versions: dbnet: release/2.1, crnn: release/2.3. Paddle version: paddlepaddle-gpu 2.1.2.post110. Paddle-Lite version: the libpaddle_light_api_shared.so provided by RK.

1.1 Export the inference models:

```bash
python3 tools/export_model.py -c configs/det/ch_ppocr_v2.0/ch_det_230327.yml -o Global.pretrained_model=output/det_dbnet_230327/best_accuracy_XXXX Global.load_static_weights=False Global.save_inference_dir=output/det_dbnet_230327/best_accuracy_XXXX
python3 tools/export_model.py -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation_finetune.yml -o Global.pretrained_model=/data1/huajie/OCR/limaopeng/output/rec_pp-OCRv2_distillation_32320_cosine_focal_230314/best_accuracy_0.861104941699056_299000 Global.load_static_weights=False Global.save_inference_dir=/data1/huajie/OCR/limaopeng/output/rec_pp-OCRv2_distillation_32320_cosine_focal_230314/best_accuracy_0.861104941699056_299000
```

1.2 Convert to .nb models (the opt binary was also provided by RK):

```bash
./opt --model_file=./dbnet_XXXX/inference.pdmodel --param_file=./dbnet_XXXX/inference.pdiparams --optimize_out=./det_models/dbnet_XXXX --valid_targets=opencl --optimize_out_type=naive_buffer --enable_fp16=true
./opt --model_file=./crnn_XXX/Student/inference.pdmodel --param_file=./crnn_XXX/Student/inference.pdiparams --optimize_out=./rec_models/crnn_XXXX --valid_targets=opencl --optimize_out_type=naive_buffer --enable_fp16=true
```
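To narrow down whether the drop comes from the FP16 conversion itself rather than from the runtime on the device, one option is to run the FP32 and FP16 .nb models on the same input and diff the outputs. A minimal sketch using Paddle-Lite's Python API (the file names and input shape are placeholders; running an opencl/FP16 model this way also assumes a wheel built with OpenCL support):

```python
import numpy as np
from paddlelite.lite import MobileConfig, create_paddle_predictor

def run_nb(model_path: str, inp: np.ndarray) -> np.ndarray:
    # Load an optimized .nb model and run a single forward pass
    config = MobileConfig()
    config.set_model_from_file(model_path)
    predictor = create_paddle_predictor(config)
    predictor.get_input(0).from_numpy(inp)
    predictor.run()
    return predictor.get_output(0).numpy()

inp = np.random.rand(1, 3, 48, 320).astype("float32")  # placeholder rec input shape
diff = np.abs(run_nb("crnn_fp32.nb", inp) - run_nb("crnn_fp16.nb", inp))
print("max abs diff:", diff.max())
```

A large elementwise difference on random inputs would point at the converted model itself; a small one would point at the device-side runtime or preprocessing.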

  2. When training the PaddleOCR crnn model with PaddleSlim quantization-aware training, validation accuracy during training is close to the non-quantized run, but after conversion to a .nb model the inference result is simply empty. (See the decode sketch after step 2.3.)
     PaddleOCR version: release/2.3. Paddle version: paddlepaddle-gpu 2.1.2.post110. PaddleSlim version: paddleslim 2.1.1. Paddle-Lite version: tried both v2.9 and v2.10.

2.1 Training command:

```bash
python deploy/slim/quantization/quant.py -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_quant.yml
```

2.2 Export the inference model:

```bash
python deploy/slim/quantization/export_model.py -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_quant.yml -o Global.checkpoints=outputs/rec/quant_20230105/best_accuracy_XXXX Global.load_static_weights=False Global.save_inference_dir=outputs/rec/quant_20230105_2/best_accuracy_XXXX
```

2.3 Convert to the .nb model:

```bash
./opt --model_file=./crnn_quant/best_accuracy_XXXX/inference.pdmodel --param_file=./crnn_quant/best_accuracy_XXXX/inference.pdiparams --optimize_out=./crnn_quant/best_accuracy_XXXX --valid_targets=arm --optimize_out_type=naive_buffer
```
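Since a CRNN/CTC head emits a (1, T, num_classes) probability map, an "empty" recognition result usually means every timestep argmaxes to the blank class, which is different from the tensor being all zeros or NaN. A minimal greedy-CTC decode sketch for inspecting the raw output (assuming PaddleOCR's usual convention that index 0 is the blank and dictionary entries start at index 1):

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, charset: list, blank: int = 0) -> str:
    # probs: (T, num_classes) softmax output of one sample from the rec model
    ids = probs.argmax(axis=1)
    chars, prev = [], blank
    for i in ids:
        # CTC rule: drop blanks and collapse consecutive repeats
        if i != blank and i != prev:
            chars.append(charset[i - 1])  # charset holds dict lines, offset by the blank
        prev = i
    return "".join(chars)
```

If the decode is empty but probs.max() looks sane, the quantized model is collapsing to blank; if the raw tensor is all zeros or NaN, the conversion itself is broken.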

ppocr_models.zip

mjp9527 commented 1 year ago

The ARM CPU also supports FP16; you can try passing --enable_fp16=ON when converting the model. A severe accuracy drop may well be a bug rather than a model problem.
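For reference, the same conversion can also be driven from Python via paddlelite's Opt wrapper (a sketch; I am not certain every wheel exposes an FP16 switch through this wrapper, so the --enable_fp16=ON CLI flag mentioned above may still be the required path):

```python
from paddlelite.lite import Opt

# Rough Python equivalent of the opt CLI invocation, targeting the ARM CPU backend
opt = Opt()
opt.set_model_file("./crnn_XXX/Student/inference.pdmodel")
opt.set_param_file("./crnn_XXX/Student/inference.pdiparams")
opt.set_valid_places("arm")          # ARM CPU instead of "opencl"
opt.set_model_type("naive_buffer")
opt.set_optimize_out("./rec_models/crnn_arm")
opt.run()
```

Note that ARM FP16 kernels also need a Paddle-Lite runtime built with FP16 support and a CPU core that implements the half-precision arithmetic extension, so it is worth checking whether the RK3326's cores qualify.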

liuyongfusteven commented 1 year ago

What about using the officially quantized model, ch_PP-OCRv3_rec_slim? It currently runs fine for me on the CPU, but the GPU reports a BUS ERROR.

mysteriousHerb commented 1 year ago

> What about using the officially quantized model, ch_PP-OCRv3_rec_slim? It currently runs fine for me on the CPU, but the GPU reports a BUS ERROR.

About the officially quantized .nb model: why do I get 40 timesteps of 0 confidence when I run it on the CPU?

Is there anything special about the image preprocessing?

```python
import cv2
import numpy as np


def process_img(image_path: str, resize_shape: tuple[int, int] = (48, 320)):
    # PP-OCRv3 rec models take NCHW float32 input, by default (1, 3, 48, 320)
    img = cv2.imread(image_path)  # BGR, matching PaddleOCR's default pipeline
    # Keep aspect ratio: scale height to resize_shape[0], cap width at resize_shape[1]
    resize_width = int(
        np.clip(np.ceil(img.shape[1] * resize_shape[0] / img.shape[0]), 0, resize_shape[1])
    )
    text_img = cv2.resize(img, (resize_width, resize_shape[0]))
    # Normalize to [-1, 1]: divide by 255, then (x - 0.5) / 0.5
    text_img = text_img.astype('float32')
    text_img = text_img.transpose((2, 0, 1)) / 255
    text_img -= 0.5
    text_img /= 0.5
    # Right-pad the remaining width with zeros
    resized_textimg = np.zeros((1, 3, resize_shape[0], resize_shape[1]), dtype=np.float32)
    resized_textimg[:, :, :, :resize_width] = text_img
    return resized_textimg
```
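For what it's worth, here is how that preprocessing could be wired into Paddle-Lite's Python API to look at the raw recognizer output (a sketch; the .nb path and image name are placeholders for the official slim model). If the raw (1, T, num_classes) tensor is all zeros, rather than merely argmaxing to the blank class everywhere, the problem is upstream of this preprocessing:

```python
from paddlelite.lite import MobileConfig, create_paddle_predictor

config = MobileConfig()
config.set_model_from_file("ch_PP-OCRv3_rec_slim_opt.nb")  # placeholder path
predictor = create_paddle_predictor(config)

inp = process_img("text_line.png")        # (1, 3, 48, 320) float32
predictor.get_input(0).from_numpy(inp)
predictor.run()

probs = predictor.get_output(0).numpy()   # expect shape (1, T, num_classes)
print(probs.shape, float(probs.max()), probs[0].argmax(axis=-1))
```

At a 320-pixel input width the CTC head typically emits 40 timesteps, which would match the "40 zero-confidence outputs" above.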