PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Question]: uie-x GPU memory keeps growing during looped inference #7560

Closed: Yian320 closed this issue 4 months ago

Yian320 commented 1 year ago

Please describe your question

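I am using the PaddleNLP UIE-X model to extract information from images. When I loop over a batch of images and run prediction/extraction on them, GPU memory keeps increasing. How can I fix this? When I run only general-purpose OCR recognition on the same batch of images, GPU memory does not grow.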

zhh8689 commented 11 months ago

Did you manage to solve this? I'm running into the same problem.

Yian320 commented 10 months ago


Hi, it's still unsolved. Do you have a solution?

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 60 days with no activity.

w5688414 commented 7 months ago

Could you provide a minimal reproduction? Also, for inference deployment we recommend FastDeploy: https://github.com/PaddlePaddle/FastDeploy/blob/develop/examples/text/uie/README_CN.md?plain=1
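A minimal reproduction along the lines requested could run the same images through the extractor several times and record GPU memory after each full pass. The sketch below is a hypothetical harness: `run_repro` and `gpu_memory_used_mb` are illustrative names, not PaddleNLP APIs. Memory is read with `nvidia-smi --query-gpu=memory.used`, which reports device-wide usage (all processes on that GPU), including memory held by caching allocators.

```python
import subprocess
from typing import Callable, List, Sequence


def gpu_memory_used_mb(gpu_index: int = 0) -> int:
    """Hypothetical helper (not part of PaddleNLP): device-wide used memory
    in MiB for one GPU, as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip().splitlines()[0])


def run_repro(predict: Callable[[str], object],
              images: Sequence[str],
              rounds: int,
              read_memory_mb: Callable[[], int]) -> List[int]:
    """Hypothetical harness: run `predict` over the same images `rounds` times
    and return one memory reading (MiB) taken after each full pass."""
    readings = []
    for _ in range(rounds):
        for image in images:
            predict(image)  # result discarded; only memory behaviour matters here
        readings.append(read_memory_mb())
    return readings
```

With the setup shown later in this thread, `predict` could be `lambda img: extractor({"doc": img})`, where `extractor` is a single `Taskflow("information_extraction", ...)` instance created once and configured with `set_schema(...)` beforehand. Injecting the memory reader keeps the harness testable without a GPU.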

Yian320 commented 7 months ago

import os

from paddlenlp import Taskflow


class UIE_X(STD):  # STD is the reporter's own base class (not shown)
    def __init__(self, logger=None, config=None):
        super().__init__(logger=logger, config=config)
        if self.config.use_gpu is False:
            self.device = 'cpu'
        else:
            self.device = 'cuda'
            os.environ["CUDA_VISIBLE_DEVICES"] = str(config.gpu_id)
        model_uie_x = self.config.model_uiex
        # One Taskflow instance is created here and reused for every request
        self.image_info_extractor = Taskflow("information_extraction", task_path=model_uie_x, device_id=int(self.config.gpu_id))

    def uie_x_predict(self, image, certificate_type):
        """
        Run the certificate-image extraction model.
        :param image: the image
        :param certificate_type: the image (certificate) type
        :return: extracted data, or {} on error
        """
        try:
            # certificate_type is either a numeric schema id or a name mapped to one
            if str(certificate_type).isdigit():
                schema_num = certificate_type
            else:
                schema_num = self.config.certifi_zh_map.get(certificate_type, "")
            schema_str = f"schema_{schema_num}"
            schema = self.config.schema.get(schema_str, [])
            # print(schema)
            self.image_info_extractor.set_schema(schema)
            # print(image)
            result = self.image_info_extractor({"doc": image})
            # print("result-uie-x", result)
            self.logger.info("Image information extraction finished, returning results...")
            data = self.extract_data_uie_x_predict_result(schema, result)
            return data
        except Exception as e:
            self.logger.error(f'error file:{e.__traceback__.tb_frame.f_globals["__file__"]} line:{e.__traceback__.tb_lineno} {e}')
            print(f'error file:{e.__traceback__.tb_frame.f_globals["__file__"]} line:{e.__traceback__.tb_lineno} {e}')
            return {}

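@w5688414 I'm using OCR V4. I've read that this happens because PaddleOCR allocates GPU memory greedily. With the inference code above, GPU memory grows when processing multiple batches of images. We currently serve inference with Flask, and in testing GPU memory grew to as much as 17 GB.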
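One way to test the "greedy allocation" explanation is to look at the shape of the memory trace. If memory is cached by an allocator for reuse, usage would be expected to climb while new (larger) inputs are seen and then level off once the same images are processed again; if it keeps climbing across identical passes, caching alone is unlikely to explain it. The function below is a hypothetical heuristic over per-pass readings in MiB (for example from `nvidia-smi`); `keeps_growing`, the `warmup` count, and the `tolerance_mb` threshold are illustrative assumptions, not values from PaddlePaddle.

```python
from typing import Sequence


def keeps_growing(readings_mb: Sequence[int], warmup: int = 2,
                  tolerance_mb: int = 50) -> bool:
    """Hypothetical heuristic: True if memory still rises after the warm-up passes.

    Assumes every pass re-runs the *same* images, so a caching allocator should
    already have reached its peak by the end of the warm-up.
    """
    steady = list(readings_mb[warmup:])
    if len(steady) < 2:
        raise ValueError("need at least two readings after warm-up")
    # Compare the last post-warm-up reading with the first one
    return steady[-1] - steady[0] > tolerance_mb


print(keeps_growing([1000, 3000, 3100, 3100, 3120]))  # False: plateau after warm-up
print(keeps_growing([1000, 3000, 3500, 4000, 4500]))  # True: still climbing
```

A plateau at a high value (such as the 17 GB reported here) is consistent with caching; steady growth over repeated identical passes would point to something else and would be the more useful evidence for a minimal reproduction.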

github-actions[bot] commented 5 months ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 4 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.