Memory leak - Githubissues

cmathx commented 5 months ago

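Problem Description

OCR inference on CPU leaks memory with every backend I have tried. The OCR service is deployed in production and receives a wide variety of images for recognition.

Native Paddle model: memory leaks; usage exceeds 3.5 GB after about half an hour.
Converted to OpenVINO: memory leaks; usage exceeds 3.5 GB after about half an hour.
Converted to ONNX: memory leaks; usage exceeds 2.3 GB after about 6 hours.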
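
A minimal sketch for quantifying growth like this, assuming a Linux host (it reads `/proc/self/statm`). The helper names `rss_mb` and `track_rss` are hypothetical and not part of RapidOCR or PaddleOCR; `predict` stands for whatever callable runs one OCR request.

```python
import os
import time


def rss_mb() -> float:
    """Current resident set size of this process in MiB (Linux only)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024 * 1024)


def track_rss(predict, inputs, every=100):
    """Call predict(x) for each input; sample (calls, seconds, rss_mb) every `every` calls."""
    samples = []
    start = time.time()
    for n, x in enumerate(inputs, start=1):
        predict(x)
        if n % every == 0:
            samples.append((n, time.time() - start, rss_mb()))
    return samples
```

Feeding the same set of images through each backend and comparing the sampled RSS curves would make the three observations above directly comparable, and would show whether memory keeps climbing or eventually levels off.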

Runtime Environment

4-core CPU, Debian

Reproduction Code

######server.py######
# `server` is the RPC framework object; its import was not included in the original post.
from paddle_ocr_client import paddle_ocr_server
@server.register('predict_images')
def predict_images(ctx, req):
    resp = paddle_ocr_server.predict_images(req)
    return resp

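######paddle_ocr_client.py######
import time

# Assumed package name; the original post omits its imports.
from rapidocr_onnxruntime import RapidOCR
# OcrResult and ImageOcrResult are project-specific types not shown in the original post.

class PaddleOcrServer():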
    def __init__(self) -> None:
        #self.ocr = PaddleOCR(enable_mkldnn=True, cpu_threads=2, use_space_char=True, lang="en", warmup=True, ir_optim=True, rec_batch_num=8, det_db_thresh=0.5, det_db_score_mode='fast', det_limit_side_len=864, det_model_dir='./inference/det_onnx/model.onnx', rec_model_dir='./inference/rec_onnx/model.onnx', rec_char_dict_path='./inference/en_dict.txt', use_onnx=True)
        #self.ocr = PaddleOCR(use_space_char=True, lang="en", det_db_thresh=0.5, det_db_score_mode='fast', det_limit_side_len=864, det_model_dir='./inference/det_onnx/model.onnx', rec_model_dir='./inference/rec_onnx/model.onnx', rec_char_dict_path='./inference/en_dict.txt', use_onnx=True)
        #self.ocr = PaddleOCR(use_gpu=True, ocr_version='PP-OCRv3', use_space_char=True, lang="en", warmup=True, enable_mkldnn=True, ir_optim=True, cpu_threads=2, rec_batch_num=8, det_db_thresh=0.5, det_db_score_mode='fast', det_limit_side_len=864, det_model_dir='./inference/det_onnx/model.onnx', rec_model_dir='./inference/rec_onnx/model.onnx', use_onnx=True)
        self.ocr = RapidOCR(config_path='inference/config.yaml')
        self.ocr_result = OcrResult()
        self.image_ocr_result = ImageOcrResult()
        for i in range(20):
            self.warmup_test()

    def predict_images(self, req):
        result_ll = []
        beg_time = time.time()
        for image_info in req.images:
            result = self.ocr(image_info.data)[0]
            #print(result)
        end_time = time.time()
        print(1000.0 * (end_time - beg_time))
        #logging.info('paddle_ocr cost: %s\n' %(str(100.0*(end_time-beg_time))))
        #cost = 1000.0 * (end_time - beg_time)
        return self.ocr_result
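
    def warmup_test(self):
        # Body omitted in the original post.
        pass

# Module-level instance imported by server.py (assumed; not shown in the original post).
paddle_ocr_server = PaddleOcrServer()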

######inference/config.yaml######
Global:
    text_score: 0.5
    use_det: true
    use_cls: false
    use_rec: true
    print_verbose: false
    min_height: 30
    width_height_ratio: 8

    intra_op_num_threads: 4
    inter_op_num_threads: 4

Det:
    intra_op_num_threads: 4
    inter_op_num_threads: 4

    use_cuda: false
    use_dml: false

    model_path: inference/det_slim_onnx/model.onnx

    limit_side_len: 576
    limit_type: min

    thresh: 0.3
    box_thresh: 0.5
    max_candidates: 1000
    unclip_ratio: 1.6
    use_dilation: true
    score_mode: fast

Cls:
    intra_op_num_threads: 4
    inter_op_num_threads: 4

    use_cuda: false
    use_dml: false

    model_path: inference/cls_onnx/model.onnx

    cls_image_shape: [3, 48, 192]
    cls_batch_num: 6
    cls_thresh: 0.9
    label_list: ['0', '180']

Rec:
    intra_op_num_threads: 4
    inter_op_num_threads: 4

    use_cuda: false
    use_dml: false

    model_path: inference/rec_slim_onnx/model.onnx
    rec_keys_path: inference/en_dict.txt

    rec_img_shape: [3, 48, 320]
    rec_batch_num: 6
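
One way to narrow down where growth like this comes from is to check whether the Python heap itself is growing. The sketch below uses the standard-library `tracemalloc` module; the helper name `python_heap_growth` is hypothetical. `tracemalloc` only sees allocations made through Python's allocators, so memory allocated directly by a native inference runtime generally will not show up here. If this number stays flat while the process RSS keeps rising, the growth is more likely on the native side than in the Python wrapper code.

```python
import tracemalloc


def python_heap_growth(fn, inputs):
    """Return bytes of Python-level allocations still alive after running fn over inputs."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for x in inputs:
        fn(x)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before
```

Here `fn` would be the per-image call from the reproduction code, and `inputs` a list of image payloads.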

Possible solutions

The cause is probably inside PaddleOCR.

SWHL commented 5 months ago

Did you file this in the wrong place? It looks like it belongs in the PaddleOCR project.

cmathx commented 5 months ago

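A quick question: has the memory leak in PaddleOCR's CPU inference still not been fixed? That project is well optimized for speed, especially with the official models, but the memory bug seems to have gone unfixed all along?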

RapidAI / RapidOCR

Memory leak #187
