RapidAI / RapidOCR

📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO and PaddlePaddle.
https://rapidai.github.io/RapidOCRDocs
Apache License 2.0
3.11k stars, 370 forks

Memory leak #187

Closed: cmathx closed this issue 5 months ago

cmathx commented 5 months ago

Problem Description

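Running OCR prediction on the CPU always leaks memory. The OCR service is deployed in production, where it receives a wide variety of different images for recognition.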
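One way to confirm the growth described above is to sample the process's resident memory (RSS) after each OCR call. The sketch below is not from the issue: it is Linux-only (it reads `/proc/self/statm`, which fits the Debian host mentioned here), and `current_rss_bytes` and `rss_after_each_call` are hypothetical helper names.

```python
import os
from typing import Callable, Iterable, List


def current_rss_bytes() -> int:
    """Resident set size of this process right now (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        # Fields are in pages: size, resident, shared, ...; index 1 is resident.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def rss_after_each_call(fn: Callable[[object], object],
                        inputs: Iterable[object]) -> List[int]:
    """Call fn on each input and record the process RSS (bytes) after each call."""
    samples = []
    for item in inputs:
        fn(item)
        samples.append(current_rss_bytes())
    return samples
```

For the service in this issue, `fn` would wrap the OCR engine call and `inputs` would be a stream of real request images. RSS that keeps climbing over many requests points to a leak (or to an allocator cache that keeps expanding as new input shapes arrive), whereas a plateau after warm-up usually does not.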

Runtime Environment

4-core CPU, Debian

Reproduction Code

######server.py######
from paddle_ocr_client import paddle_ocr_server

# `server` is the service's RPC framework object; its setup is not shown in the issue.
@server.register('predict_images')
def predict_images(ctx, req):
    resp = paddle_ocr_server.predict_images(req)
    return resp

######paddle_ocr_client.py######
import time

from rapidocr_onnxruntime import RapidOCR

# OcrResult and ImageOcrResult are the service's own response types;
# their definitions are not shown in the issue.


class PaddleOcrServer:
    def __init__(self) -> None:
        # Earlier PaddleOCR configurations that were tried:
        #self.ocr = PaddleOCR(enable_mkldnn=True, cpu_threads=2, use_space_char=True, lang="en", warmup=True, ir_optim=True, rec_batch_num=8, det_db_thresh=0.5, det_db_score_mode='fast', det_limit_side_len=864, det_model_dir='./inference/det_onnx/model.onnx', rec_model_dir='./inference/rec_onnx/model.onnx', rec_char_dict_path='./inference/en_dict.txt', use_onnx=True)
        #self.ocr = PaddleOCR(use_space_char=True, lang="en", det_db_thresh=0.5, det_db_score_mode='fast', det_limit_side_len=864, det_model_dir='./inference/det_onnx/model.onnx', rec_model_dir='./inference/rec_onnx/model.onnx', rec_char_dict_path='./inference/en_dict.txt', use_onnx=True)
        #self.ocr = PaddleOCR(use_gpu=True, ocr_version='PP-OCRv3', use_space_char=True, lang="en", warmup=True, enable_mkldnn=True, ir_optim=True, cpu_threads=2, rec_batch_num=8, det_db_thresh=0.5, det_db_score_mode='fast', det_limit_side_len=864, det_model_dir='./inference/det_onnx/model.onnx', rec_model_dir='./inference/rec_onnx/model.onnx', use_onnx=True)
        self.ocr = RapidOCR(config_path='inference/config.yaml')
        self.ocr_result = OcrResult()
        self.image_ocr_result = ImageOcrResult()
        # Warm up the engine before serving traffic.
        for _ in range(20):
            self.warmup_test()

    def predict_images(self, req):
        result_ll = []
        beg_time = time.time()
        for image_info in req.images:
            # RapidOCR returns (result, elapse); [0] keeps the detected boxes and text.
            result = self.ocr(image_info.data)[0]
            result_ll.append(result)
            #print(result)
        end_time = time.time()
        print(1000.0 * (end_time - beg_time))  # elapsed time in ms
        #logging.info('paddle_ocr cost: %s ms' % str(1000.0 * (end_time - beg_time)))
        # Building the response from result_ll is not shown in the issue.
        return self.ocr_result

    def warmup_test(self):
        # Warm-up body omitted in the original issue.
        pass


# Module-level instance imported by server.py.
paddle_ocr_server = PaddleOcrServer()

######inference/config.yaml######
Global:
    text_score: 0.5
    use_det: true
    use_cls: false
    use_rec: true
    print_verbose: false
    min_height: 30
    width_height_ratio: 8

    intra_op_num_threads: 4
    inter_op_num_threads: 4

Det:
    intra_op_num_threads: 4
    inter_op_num_threads: 4

    use_cuda: false
    use_dml: false

    model_path: inference/det_slim_onnx/model.onnx

    limit_side_len: 576
    limit_type: min

    thresh: 0.3
    box_thresh: 0.5
    max_candidates: 1000
    unclip_ratio: 1.6
    use_dilation: true
    score_mode: fast

Cls:
    intra_op_num_threads: 4
    inter_op_num_threads: 4

    use_cuda: false
    use_dml: false

    model_path: inference/cls_onnx/model.onnx

    cls_image_shape: [3, 48, 192]
    cls_batch_num: 6
    cls_thresh: 0.9
    label_list: ['0', '180']

Rec:
    intra_op_num_threads: 4
    inter_op_num_threads: 4

    use_cuda: false
    use_dml: false

    model_path: inference/rec_slim_onnx/model.onnx
    rec_keys_path: inference/en_dict.txt

    rec_img_shape: [3, 48, 320]
    rec_batch_num: 6

Possible Solutions

The problem lies within PaddleOCR itself.
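Before attributing the leak to library internals, a rough check is whether the growth shows up in Python-level allocations at all. Python's `tracemalloc` only tracks memory allocated through Python's own allocators, so if RSS keeps climbing while the traced total stays flat, the growth is most likely in native code (for example, the inference runtime). This is a sketch that is not from the thread; `python_heap_growth` is a hypothetical helper, and the single warm-up call is an arbitrary choice.

```python
import gc
import tracemalloc
from typing import Callable, Iterable


def python_heap_growth(fn: Callable[[object], object],
                       inputs: Iterable[object],
                       warmup: int = 1) -> int:
    """Net bytes of Python-tracked memory still alive after calling fn on inputs.

    The first `warmup` inputs run before tracing starts, so lazy
    initialisation and caches are not counted as growth.
    """
    items = list(inputs)
    for item in items[:warmup]:
        fn(item)
    gc.collect()
    tracemalloc.start()
    try:
        for item in items[warmup:]:
            fn(item)  # return value is discarded, as in a request handler
        gc.collect()  # drop unreachable cycles before measuring
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current
```

Run this with the same `fn` and inputs used for RSS sampling. A near-zero result alongside rising RSS suggests the leak is below the Python layer.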

SWHL commented 5 months ago

Did you file this in the wrong place? It sounds like it belongs in the PaddleOCR project.

cmathx commented 5 months ago

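Could you tell me whether the memory leak in PaddleOCR's CPU inference has been fixed yet? This open-source project is well optimized for speed, especially with the official models, but the memory bug seems never to have been fixed.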