PaddlePaddle / FastDeploy

⚡️An easy-to-use and fast deep learning model deployment toolkit for ☁️Cloud, 📱Mobile, and 📹Edge, covering 20+ mainstream scenarios across image, video, text, and audio with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

[Bug]: UTC model deployment: FastDeploy error (unresolved), Segmentation fault #2125

Open dingidng opened 1 year ago

dingidng commented 1 year ago

Software environment

Everything runs normally except when using FastDeploy.

cuDNN official download page:


FastDeploy official environment requirements:
CUDA >= 11.2
cuDNN >= 8.0
python >= 3.6
OS: Linux(x64)/Windows 10(x64)
paddlepaddle-gpu requirements:
CUDA toolkit 10.2 with cuDNN v7.6.5; for Paddle TensorRT inference, TensorRT 7.0.0.11 is also required
CUDA toolkit 11.2 with cuDNN v8.2.1; for Paddle TensorRT inference, TensorRT 8.0.3.4 is also required
CUDA toolkit 11.6 with cuDNN v8.4.0; for Paddle TensorRT inference, TensorRT 8.4.0.6 is also required
CUDA toolkit 11.7 with cuDNN v8.4.1; for Paddle TensorRT inference, TensorRT 8.4.2.4 is also required
CUDA toolkit 11.8 with cuDNN v8.6.0; for Paddle TensorRT inference, TensorRT 8.5.1.7 is also required
CUDA toolkit 12.0 with cuDNN v8.9.1; for Paddle TensorRT inference, TensorRT 8.6.1.6 is also required

From other issues it looks like paddlepaddle-gpu expects cuDNN 8.2.1, but on the official download site the cuDNN builds that list CUDA 11.2 support only go up to 8.1.1. Versions 8.2 and above advertise CUDA 11.x support, yet the downloads are built against CUDA 11.3, and I am not sure whether they can be used with a paddlepaddle-gpu build for CUDA 11.2. At the moment the official site only lists 12.0, 11.8, 11.7, 11.6, 11.2, and 10.2.
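One way to take the guesswork out of this is to ask Paddle which CUDA/cuDNN versions the installed wheel was actually compiled against. A minimal sketch, assuming paddlepaddle-gpu is importable in the affected environment (this is not part of the original report):

```python
# Minimal sketch: print the CUDA/cuDNN versions the installed paddlepaddle-gpu
# wheel was compiled against, then run Paddle's own GPU sanity check.
# Assumption: paddlepaddle-gpu is installed and importable in this environment.
import paddle

print("paddle version:        ", paddle.__version__)
print("compiled CUDA version: ", paddle.version.cuda())   # e.g. '11.2'
print("compiled cuDNN version:", paddle.version.cudnn())  # e.g. '8.2.1'

# Verifies that Paddle can actually allocate GPU memory and run a kernel.
paddle.utils.run_check()
```

If the versions printed here differ from the cuDNN actually found on the system's library path, that mismatch is one possible source of a crash like the one described below.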

Operating system: CentOS; Python version: 3.8

Duplicate issues

Error description

1. Error when deploying with FastDeploy (error message only):


Segmentation fault

After some checking, the error occurs at this import: from paddlenlp.prompt import PromptDataCollatorWithPadding, UTCTemplate

However, when the FastDeploy deployment path is not used, prediction works normally; prediction via the serving approach also works normally.
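To narrow the crash down, a minimal sketch (not the official repro script) that enables Python's faulthandler before the failing import; the assumption is that loading fastdeploy alongside paddlenlp.prompt mirrors what the deployment script does:

```python
# Minimal sketch: enable faulthandler so a SIGSEGV at least prints a
# Python-level traceback showing which import dies.
# Assumption: importing fastdeploy together with paddlenlp.prompt reproduces
# the crash reported above; the crash point below is the one named in the report.
import faulthandler

faulthandler.enable()

import fastdeploy  # noqa: F401  (loaded in the FastDeploy deployment path)
from paddlenlp.prompt import PromptDataCollatorWithPadding, UTCTemplate  # reported crash point

print("imports succeeded:", PromptDataCollatorWithPadding.__name__, UTCTemplate.__name__)
```

If no traceback appears even with faulthandler enabled, running the same script under gdb (gdb --args python repro_import.py, then bt after the crash) would show which native library is on the stack, which helps distinguish a Paddle/FastDeploy conflict from a cuDNN mismatch.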

Steps & code to reproduce reliably

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/zero_shot_text_classification

One more note: on Windows the problem above does not occur and everything runs normally; on Linux (CentOS) the Segmentation fault occurs on GPU (V100 and T4) as well as on CPU.

rainyfly commented 9 months ago

Thanks for the feedback; we will look into locating the problem.

watertianyi commented 6 months ago

I am seeing the same thing: https://github.com/PaddlePaddle/PaddleNLP/issues/6418 It occurs in the following environment:

paddlenlp 2.7.2
paddlepaddle-gpu 2.6.0
fast-tokenizer-python 1.0.2
fastapi 0.110.0
fastdeploy-gpu-python 0.0.0
fastdeploy-tools 0.0.5

The error output is as follows:

I0524 15:02:55.838413 1151823 allocator_facade.cc:435] Set default stream to 0x143f19e0 for StreamSafeCUDAAllocator(0xdd50af0) in Place(gpu:0)
I0524 15:02:55.838426 1151823 allocator_facade.cc:373] Get Allocator by passing in a default stream
I0524 15:02:55.838486 1151823 gpu_info.cc:224] [cudaMalloc] size=0.00244141 MB, result=0
I0524 15:02:55.838553 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838563 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838572 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838580 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838587 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838647 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838654 1151823 gpu_info.cc:224] [cudaMalloc] size=0.000244141 MB, result=0
I0524 15:02:55.838665 1151823 gpu_info.cc:224] [cudaMalloc] size=0.0732422 MB, result=0
I0524 15:02:55.839088 1151823 gpu_info.cc:224] [cudaMalloc] size=0.0288086 MB, result=0
I0524 15:02:55.839371 1151823 gpu_info.cc:224] [cudaMalloc] size=0.0732422 MB, result=0
I0524 15:02:55.839381 1151823 gpu_info.cc:224] [cudaMalloc] size=0.219727 MB, result=0
I0524 15:02:55.857205 1151823 gpu_info.cc:224] [cudaMalloc] size=0.248535 MB, result=0
I0524 15:02:55.859779 1151823 gpu_info.cc:224] [cudaMalloc] size=0.292969 MB, result=0
I0524 15:02:55.860016 1151823 gpu_info.cc:224] [cudaMalloc] size=0.292969 MB, result=0
I0524 15:02:55.860302 1151823 gpu_info.cc:224] [cudaMalloc] size=0.248535 MB, result=0
I0524 15:02:55.861150 1151823 stats.h:79] HostMemoryStatReserved0: Update current_value with 12, after update, current value = 12
I0524 15:02:55.861167 1151823 stats.h:79] HostMemoryStatAllocated0: Update current_value with 12, after update, current value = 12
I0524 15:02:55.861202 1151823 stats.h:79] HostMemoryStatReserved0: Update current_value with 4, after update, current value = 16
I0524 15:02:55.861207 1151823 stats.h:79] HostMemoryStatAllocated0: Update current_value with 4, after update, current value = 16
I0524 15:02:55.861232 1151823 stats.h:79] HostMemoryStatReserved0: Update current_value with 4, after update, current value = 20
I0524 15:02:55.861235 1151823 stats.h:79] HostMemoryStatAllocated0: Update current_value with 4, after update, current value = 20
Segmentation fault (core dumped)