PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

I want to deploy paddlenlp UIE inference on a Huawei Ascend NPU 910B4 in a k8s environment (paddlepaddle==2.5.2 / paddlenlp==2.6.1 / paddleocr==2.6.1.3), but inference is very slow. What is going wrong? Any guidance appreciated. #8606

Open AllenMeng2009 opened 2 weeks ago

AllenMeng2009 commented 2 weeks ago

Please describe your question

See the title.

AllenMeng2009 commented 2 weeks ago

The NPU k8s deployment file is as follows:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hwei-ocr    # example name; running jobs must not share a name
  namespace: hwei
spec:
  schedulerName: volcano
  nodeSelector:
    accelerator/huawei-npu: ascend-1980
  containers:
```

AllenMeng2009 commented 2 weeks ago

The Dockerfile is as follows:

```dockerfile
FROM swr.cn-east-3.myhuaweicloud.com/atelier/pytorch_2_1_ascend:pytorch_2.1.0-cann_7.0.1.1-py_3.9-euler_2.10.7-aarch64-snt9b-20240411153110-ca68771

FROM registry.baidubce.com/device/paddle-npu:cann80T2-910B-ubuntu18-aarch64

FROM python:3.9.10

RUN pip install --disable-pip-version-check --no-cache-dir -i https://mirrors.aliyun.com/pypi/simple paddlepaddle==2.5.2; \
    pip install --disable-pip-version-check --no-cache-dir -i https://pypi.tuna.tsinghua.edu.cn/simple paddlenlp==2.6.1; \
    pip install --disable-pip-version-check --no-cache-dir -i https://pypi.tuna.tsinghua.edu.cn/simple paddleocr==2.6.1.3

# Copy the code into the working directory
COPY . /usr/src/app/
WORKDIR /usr/src/app

# Command executed when the container starts
CMD ["python", "/usr/src/app/medical_report_ocr.py"]
```
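One detail worth checking in the Dockerfile above: with several `FROM` lines, only the final stage (`python:3.9.10`) becomes the image that runs, and nothing is copied out of the earlier stages, so the Ascend/CANN bases are discarded. The `paddlepaddle` wheel installed from PyPI mirrors is the CPU-only build, which would run UIE entirely on CPU and could by itself explain multi-hour latencies. A hedged single-stage sketch (untested; it assumes the `paddle-npu` base image already ships an NPU-enabled Paddle, which should be verified):

```dockerfile
# Single-stage sketch: keep the CANN/Ascend base so the NPU toolchain and the
# NPU-enabled Paddle survive into the final image. Do NOT reinstall the plain
# "paddlepaddle" wheel from PyPI on top of it: that wheel is the CPU build.
FROM registry.baidubce.com/device/paddle-npu:cann80T2-910B-ubuntu18-aarch64

RUN pip install --disable-pip-version-check --no-cache-dir \
        -i https://pypi.tuna.tsinghua.edu.cn/simple paddlenlp==2.6.1 paddleocr==2.6.1.3

COPY . /usr/src/app/
WORKDIR /usr/src/app
CMD ["python", "/usr/src/app/medical_report_ocr.py"]
```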

AllenMeng2009 commented 2 weeks ago

```
(base) PS C:\Users\12133> kubectl get pod -n hwei
NAME       READY   STATUS      RESTARTS   AGE
hwei-ocr   0/1     Completed   0          3h26m
```

```
(base) PS C:\Users\12133> kubectl describe pod hwei-ocr -n hwei
Name:             hwei-ocr
Namespace:        hwei
Priority:         0
Service Account:  default
Node:             192.168.3.243/192.168.3.243
Start Time:       Fri, 14 Jun 2024 12:50:06 +0800
Labels:
Annotations:      cce.kubectl.kubernetes.io/ascend-1980-configuration:
                    {"pod_name":"hwei-ocr","server_id":"192.168.3.243","devices":[{"device_id":"1","device_ip":"29.61.179.253"}]}
                  kubernetes.io/psp: psp-global
                  scheduling.cce.io/gpu-topology-placement: huawei.com/ascend-1980=0x02
                  scheduling.k8s.io/group-name: podgroup-d3573a51-b104-4463-bc4d-ff5a5c50abaa
Status:           Succeeded
IP:               10.0.2.118
IPs:
  IP:  10.0.2.118
Containers:
  train:
    Container ID:  docker://5d85e5d1c4d86d137a174532c74dc626c76fd2b7458ec5d98e93770429420c85
    Image:         swr.cn-east-3.myhuaweicloud.com/hwei/hwei-ocr-recognition:2297af78
    Image ID:      docker-pullable://swr.cn-east-3.myhuaweicloud.com/hwei/hwei-ocr-recognition@sha256:d755d246df4dd4e0c3bc20e96c52098d7c897e11b8960284d24d040cdbe7ac11
    Port:
    Host Port:
    Command:
      python
    Args:
      medical_report_ocr.py
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 14 Jun 2024 12:50:21 +0800
      Finished:     Fri, 14 Jun 2024 14:29:52 +0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:                     4
      huawei.com/ascend-1980:  1
      memory:                  32G
    Requests:
      cpu:                     2
      huawei.com/ascend-1980:  1
      memory:                  16G
    Environment:
      NCCL_ASYNC_ERROR_HANDLING:  1
    Mounts:
      /dev/shm from cache-volume (rw)
      /etc/hccn.conf from hccn (rw)
      /etc/localtime from localtime (rw)
      /hwei-data from data-volume (rw)
      /usr/local/Ascend/add-ons from ascend-add-ons (rw)
      /usr/local/Ascend/driver from ascend-driver (rw)
      /usr/local/bin/npu-smi from npu-smi (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-94w29 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cache-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  3000Mi
  data-volume:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-obs-hwei
    ReadOnly:   false
  ascend-driver:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/Ascend/driver
    HostPathType:
  ascend-add-ons:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/Ascend/add-ons
    HostPathType:
  hccn:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/hccn.conf
    HostPathType:
  npu-smi:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/local/bin/npu-smi
    HostPathType:
  localtime:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/localtime
    HostPathType:
  kube-api-access-94w29:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:        Burstable
Node-Selectors:   accelerator/huawei-npu=ascend-1980
Tolerations:      node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
```

```
(base) PS C:\Users\12133> kubectl logs -f hwei-ocr -n hwei
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2024-06-14 12:50:30,091] [ INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_m_base_v1.1/model_state.pdparams
2.6.1.post
2.6.1.3
schema  ['样本号', '姓名', '性别', '年龄', '就诊卡号', '住院号', '样本类型', '科室', '病区', '床号', '执行科室', '凝血酶原时间', 'PT国际化标准化比值', '活化部分凝血活酶时间', '纤维蛋白原', '凝血酶时间', 'D-二聚体', '申请医师', '检验者', '审核者', '采集时间', '接收时间', '报告时间', '检验门诊信息']
检测  7.152557373046875e-07
100%|██████████| 1.04G/1.04G [01:28<00:00, 12.6MB/s]
[2024-06-14 12:52:02,613] [ INFO] - Downloading config.json from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_m_base/config.json
100%|██████████| 451/451 [00:00<00:00, 1.05MB/s]
[2024-06-14 12:52:02,832] [ INFO] - Downloading vocab.txt from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_m_base/vocab.txt
100%|██████████| 2.70M/2.70M [00:01<00:00, 2.14MB/s]
[2024-06-14 12:52:04,386] [ INFO] - Downloading special_tokens_map.json from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_m_base/special_tokens_map.json
100%|██████████| 112/112 [00:00<00:00, 365kB/s]
[2024-06-14 12:52:04,587] [ INFO] - Downloading tokenizer_config.json from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_m_base/tokenizer_config.json
100%|██████████| 195/195 [00:00<00:00, 517kB/s]
[2024-06-14 12:52:04,788] [ INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/taskflow/information_extraction/uie_m_base/sentencepiece.bpe.model
100%|██████████| 4.83M/4.83M [00:00<00:00, 16.6MB/s]
[2024-06-14 12:52:05,340] [ INFO] - Loading configuration file /home/ma-user/.paddlenlp/taskflow/information_extraction/uie-m-base/config.json
[2024-06-14 12:52:05,341] [ INFO] - Loading weights file /home/ma-user/.paddlenlp/taskflow/information_extraction/uie-m-base/model_state.pdparams
[2024-06-14 12:52:06,845] [ INFO] - Loaded weights file from disk, setting weights to model.
[2024-06-14 12:52:16,925] [ INFO] - All model checkpoint weights were used when initializing UIEM.
[2024-06-14 12:52:16,925] [ INFO] - All the weights of UIEM were initialized from the model checkpoint at /home/ma-user/.paddlenlp/taskflow/information_extraction/uie-m-base. If your task is similar to the task the model of the checkpoint was trained on, you can already use UIEM for predictions without further training.
[2024-06-14 12:52:16,940] [ INFO] - Converting to the inference model cost a little time.
I0614 12:52:21.171234 1 interpretercore.cc:237] New Executor is Running.
[2024-06-14 12:52:28,424] [ INFO] - The inference model save in the path:/home/ma-user/.paddlenlp/taskflow/information_extraction/uie-m-base/static/inference
E0614 12:52:28.425216 1 analysis_config.cc:630] Please compile with MKLDNN first to use MKLDNN
[2024-06-14 12:52:29,551] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_m.tokenizer.ErnieMTokenizer'> to load '/home/ma-user/.paddlenlp/taskflow/information_extraction/uie-m-base'.
加载模型  120.71543025970459
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar to /home/ma-user/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer/ch_PP-OCRv3_det_infer.tar
100%|██████████| 3.83M/3.83M [00:00<00:00, 13.9MiB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar to /home/ma-user/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer/ch_PP-OCRv3_rec_infer.tar
100%|██████████| 11.9M/11.9M [00:00<00:00, 26.1MiB/s]
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /home/ma-user/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
100%|██████████| 2.19M/2.19M [00:00<00:00, 9.26MiB/s]
信息提取  5840.636544704437
{"样本号": "10", "姓名": "邹福珍", "性别": "女", "年龄": "75岁", "就诊卡号": "0003570020", "住院号": "0969150", "科室": "老年内分泌科", "病区": "老年内分泌科", "床号": "25", "凝血酶原时间": "11.10秒", "PT国际化标准化比值": "0.963", "活化部分凝血活酶时间": "21.20」秒", "D-二聚体": "2.99t", "申请医师": "狄文娟", "检验者": "狄文娟", "审核者": "第1页/共Y", "采集时间": "2024-04-1809:49:04", "接收时间": "2024-04-1812:21:29", "报告时间": "2024-04-18", "检验门诊信息": "每周三上午"}
5.984306335449219e-05
```

Inference runs correctly, but it is extremely slow: information extraction on one hospital test report took 5840.6 seconds (信息提取 5840.636544704437). What could be causing this? Are the paddlepaddle, paddlenlp, and paddleocr versions mismatched, or is something else misconfigured? Please advise, thanks!

AllenMeng2009 commented 2 weeks ago

Do I need to build armv8 (aarch64) versions of paddlepaddle, paddlenlp, and paddleocr myself? Or do I need to pin inference to a specific NPU at runtime, with something like `python -npu -device app.py`? The CPU and NPU resources I requested are:

```
Limits:
  cpu:                     4
  huawei.com/ascend-1980:  1
  memory:                  32G
Requests:
  cpu:                     2
  huawei.com/ascend-1980:  1
  memory:                  16G
```
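Before rebuilding anything, it may be worth confirming which device Paddle is actually running on inside the pod: if the framework silently fell back to CPU, no amount of NPU quota will help. A minimal diagnostic sketch, assuming Paddle 2.x (the `report_device` helper name is mine, not a PaddleNLP API):

```python
def report_device() -> str:
    """Best-effort report of the device Paddle will execute on."""
    try:
        import paddle  # assumed present in the deployment image
    except ImportError:
        return "paddle not installed"
    # Returns e.g. "cpu", "gpu:0", or a custom-device string such as "npu:0"
    # when an NPU-enabled build and plugin are active.
    return paddle.device.get_device()

print(report_device())
```

If this prints `cpu` inside the container, the installed wheel is the CPU build and the slow inference is expected; `paddle.set_device("npu:0")` can only work once an NPU-enabled build is present.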

AllenMeng2009 commented 2 weeks ago

The `medical_report_ocr.py` referenced in `CMD ["python", "/usr/src/app/medical_report_ocr.py"]` is as follows:

```python
import csv
import time
from pprint import pprint

import paddlenlp
import paddleocr
from paddlenlp import Taskflow
# from paddlenlp_ov import Taskflow

print(paddlenlp.__version__)
print(paddleocr.__version__)

start = time.time()
with open('schema.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    column = [row[0] for row in reader]
column = list(filter(None, column))
schema = column[1:len(column)]
print('schema ', schema)
print('prompt ', time.time() - start)

start = time.time()
print('检测 ', time.time() - start)

start = time.time()
ie = Taskflow("information_extraction", schema=schema, model="uie-m-base", batch_size=1)
# also tried: batch_size=512, layout_analysis=True, predictor_type="openvino-inference", precision='fp32'
pprint(ie({"doc": "./20244.jpg"}))
print('加载模型 ', time.time() - start)

start = time.time()
k = ie({"doc": "./image/202401.jpg"})
print(k)
print('信息提取 ', time.time() - start)

start1 = time.time()
data = {}
for key, value in k[0].items():
    print(key, value)
    data[key] = value[0]['text']
s = str(data).replace("'", '"')
print(s)
print(time.time() - start1)
```
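A side note on the last step of the script: serializing with `str(data).replace("'", '"')` breaks as soon as a value contains a quote character, and it is not valid JSON escaping in general. A safer sketch using the standard library (the sample `result` structure is illustrative, mimicking the shape of a Taskflow `information_extraction` result rather than real output):

```python
import json

# Illustrative stand-in for k = ie({"doc": ...}); values are made up.
result = [{"姓名": [{"text": "张三", "probability": 0.99}]}]

# Flatten to {field: first extracted text}, then serialize with json.dumps,
# which handles quoting and keeps non-ASCII text readable.
data = {key: value[0]["text"] for key, value in result[0].items()}
s = json.dumps(data, ensure_ascii=False)
print(s)
```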

AllenMeng2009 commented 2 weeks ago

(Attached image 202401.jpg; the upload did not complete.)

AllenMeng2009 commented 2 weeks ago

@guoshengCS Could you please take a look? Thanks!

daytime25 commented 3 days ago

@AllenMeng2009 Quick question: did loading openvino in Taskflow give you any speedup?

AllenMeng2009 commented 2 days ago

@daytime25 Hello! I have not used openvino acceleration. To enable it, is `pip install --upgrade --user openvino-dev` enough, or do I also need the paddlenlp_ov.zip package (and where can I download it)? Only then would the statement below take effect, right? Thanks!

`my_ie = Taskflow("information_extraction", model="uie-x-base", schema=schema, task_path='./checkpoint/model_best', predictor_type="openvino-inference")`

AllenMeng2009 commented 2 days ago

@daytime25 Hello! I have now installed it with `pip install --upgrade --user openvino-dev` and set `predictor_type="openvino-inference"` in Taskflow, but it had no effect. Do I have to download paddlenlp_ov.zip and then use

`from paddlenlp_ov import Taskflow`
`my_ie = Taskflow("information_extraction", model="uie-x-base", schema=schema, task_path='./checkpoint/model_best', predictor_type="openvino-inference")`

for it to take effect? Please advise, thanks!

daytime25 commented 1 day ago

@AllenMeng2009 You need to download paddlenlp_ov.zip; its code is different from the original paddlenlp. I hit an ENABLE_TORCH_CHECKPOINT error and fixed it by changing `from paddlenlp.utils.env` to `from paddlenlp_ov.utils.env` in model_utils.py. In my scenario (Intel CPU) this cut latency from 30-odd seconds to about 18 seconds. My remaining problem is that deploying it as a serving endpoint errors out; a post I found says the model outputs have to be changed for that to work.
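The one-line import fix described above can be applied mechanically with sed. A sketch, assuming GNU sed; the `model_utils.py` below is a scratch stand-in created on the spot, not the real installed file, so point the command at your actual paddlenlp_ov install path instead:

```shell
# Create a stand-in file with the import the patch targets (illustrative only).
printf 'from paddlenlp.utils.env import CONFIG_NAME\n' > model_utils.py

# Rewrite the import, as daytime25 describes (GNU sed in-place edit).
sed -i 's/from paddlenlp\.utils\.env/from paddlenlp_ov.utils.env/' model_utils.py

cat model_utils.py
```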

AllenMeng2009 commented 7 hours ago

@daytime25 Hello! Where can I download paddlenlp_ov.zip? And does openvino acceleration help on NVIDIA GPUs? I have since switched to two NVIDIA A800 80G GPUs; loading uie-x-base takes about 7 s, and inference on one medical test report takes 5-10 s, which is still slow. Is there any other way to speed this up? Thanks!