PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

Question about uie-x-base model loading and inference speed for document information extraction #8684

Open AllenMeng2009 opened 4 days ago

AllenMeng2009 commented 4 days ago

Please describe your question

I am currently running information extraction inference on hospital lab reports with the uie-x-base base model on two NVIDIA A800 80 GB GPUs. I noticed that every single report that is fed in triggers a fresh load of the fine-tuned model:

[2024-06-29 07:45:28,517] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load './exportdata_finetune_evaluate/checkpoint_uiexbase/model_best'

Loading the model alone takes 6.760296106338501 s. When running inference over multiple reports, can the model be loaded once and then reused for all of them? Is there any way to optimize this?

In addition, using

ie = Taskflow("information_extraction", schema=schema, model="uie-x-base", task_path='./exportdata_finetune_evaluate/checkpoint_uiexbase/model_best', batch_size=64, precision='fp16', use_fast=True)

inference on a single report containing many items took more than 10 seconds, as follows:

Information extraction: 10.658369064331055 s
{"就诊卡号": "0003570020", "住院号": "0969150", "丙氨酸氨基转移酶": "37.9", "碱性磷酸酶": "75", "L-γ-谷氨酰转肽酶": "49.8", "乳酸脱氢酶": "225", "肌酸激酶": "46", "α-羟丁酸脱氢酶": "146", "总胆红素": "7", "直接胆红素": "2.9", "间接胆红素": "4.8", "总胆固醇": "5.24", "甘油三脂": "1.12", "高密度脂蛋白胆固醇": "1.45", "低密度脂蛋白胆固醇": "3.29", "脂蛋白(a)": "160", "总蛋白": "71.9", "白蛋白": "41.7", "球蛋白": "30.2", "白球比": "1. 4", "葡萄糖": "4.97", "尿素": "5.02", "肌酐": "55.1", "尿酸": "323", "胱抑素C": "1.01", "钙": "2.09", "磷": "0.98", "钾": "2.09", "钠": "136.7", "氯": "107.1", "视黄醇结合蛋白": "31.8", "腺苷脱氨酶": "20.0", "谷胱甘肽还原酶": "72", "估算的肾小球滤过率": "78"}

I have already enabled fp16 (which takes effect) and use_fast=True (which has no effect for uie-x-base). Are there any other ways to reduce the model loading time and the inference time? I have also tried several approaches that all ended in failure (1. closed-domain model distillation, 2. model quantization and compression, etc.). Could an expert help improve the performance? Many thanks!
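Regarding the "load once, reuse many times" part of the question, a minimal sketch of that pattern is below. It only assumes the Taskflow arguments already shown above; the schema values and the report_images list are hypothetical placeholders for illustration, and the {"doc": path} input form follows the documented uie-x document-extraction usage.

```python
from paddlenlp import Taskflow

# Hypothetical schema; replace with the actual fields extracted from the lab reports.
schema = ["就诊卡号", "住院号", "总胆红素"]

# Build the Taskflow ONCE at service start-up, so the ~6.7 s checkpoint load
# is paid a single time instead of once per report.
ie = Taskflow(
    "information_extraction",
    schema=schema,
    model="uie-x-base",
    task_path="./exportdata_finetune_evaluate/checkpoint_uiexbase/model_best",
    batch_size=64,
    precision="fp16",
)

# Placeholder list of report images; in a real service these would come from requests.
report_images = ["report_001.jpg", "report_002.jpg"]

# Reuse the same Taskflow object for every report; no model reload happens between calls.
for path in report_images:
    result = ie({"doc": path})
    print(path, result)
```

Whether this fits depends on keeping the process (and thus the loaded Taskflow) alive between reports, e.g. behind a long-running service, rather than launching a new script per report.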

AllenMeng2009 commented 4 days ago

The current hardware configuration should already be quite high-end, yet the model loading time and inference time remain unsatisfactory. For the planned production deployment, model loading plus inference needs to stay within an acceptable 3-5 s. Hoping for a good solution; waiting online, this is urgent. Thank you!

AllenMeng2009 commented 4 days ago

(Attached image: 202405.jpg — the upload did not complete, only the placeholder remains.)