Hi, for higher-performance Stable Diffusion (SD) inference you need to run with the paddle_tensorrt backend, e.g.:
`python text_to_img_infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler_ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu`
We recommend following the example code under ppdiffusers/deploy in PaddleNLP for SD model export and benchmarking to reach the best performance: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers/deploy
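For clarity, here is a minimal sketch of the two invocations side by side, assuming the `infer.py` / `text_to_img_infer.py` scripts and the flags quoted in this thread (script names and flag spellings may differ across FastDeploy/PaddleNLP versions):

```bash
# Baseline used in this issue: plain Paddle Inference backend (~4.5 s/image reported below)
python infer.py \
    --model_dir stable-diffusion-v1-5/ \
    --scheduler euler_ancestral \
    --backend paddle \
    --inference_steps 50

# Recommended: Paddle Inference + TensorRT backend with FP16 on GPU.
# Note: the first run is usually much slower while TensorRT engines are built and cached.
python text_to_img_infer.py \
    --model_dir stable-diffusion-v1-5/ \
    --scheduler euler_ancestral \
    --backend paddle_tensorrt \
    --use_fp16 True \
    --device gpu
```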
Environment
Versions:
- fastdeploy-gpu-python 1.0.5
- paddle-bfloat 0.1.7
- paddle2onnx 1.0.6
- paddlefsl 1.1.0
- paddlenlp 2.5.2
- paddlepaddle-gpu 0.0.0.post117
- nvidia-cublas-cu11 11.10.3.66
- nvidia-cuda-runtime-cu11 11.7.99
- nvidia-cudnn-cu11 8.5.0.96
- python 3.10

System platform: Linux x64 (Ubuntu 20.04)
Hardware: A100-40GB
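For anyone reproducing this, a quick way to capture the same environment information (generic commands, not part of the original report):

```bash
# Package versions relevant to FastDeploy / Paddle
pip list | grep -Ei "fastdeploy|paddle|nvidia"

# Paddle version and a basic GPU sanity check
python -c "import paddle; print(paddle.__version__); paddle.utils.run_check()"

# Driver / CUDA runtime visible on the machine
nvidia-smi
```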
Problem logs and reproduction steps
The model was downloaded from https://bj.bcebos.com/fastdeploy/models/stable-diffusion/runwayml/stable-diffusion-v1-5.tgz
```
/FastDeploy/examples/multimodal/stable_diffusion$ python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler_ancestral" --backend paddle --inference_steps 50
[2023-04-04 07:24:35,868] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/vocab.json
[2023-04-04 07:24:35,869] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/merges.txt
[2023-04-04 07:24:35,869] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json and saved to /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14
[2023-04-04 07:24:36,849] [ WARNING] - file https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json not exist
[2023-04-04 07:24:36,849] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/special_tokens_map.json
[2023-04-04 07:24:36,850] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/tokenizer_config.json
[INFO] fastdeploy/runtime/runtime.cc(293)::CreateOrtBackend Runtime initialized with Backend::ORT in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
Spend 11.65 s to load unet model.
Run the stable diffusion pipeline 1 times to test the performance.
No 0 time cost: 4.535359 s
Mean latency: 4.535359 s, p50 latency: 4.535359 s, p90 latency: 4.535359 s, p95 latency: 4.535359 s.

wget https://bj.bcebos.com/fastdeploy/models/stable-diffusion/CompVis/stable-diffusion-v1-4.tgz
/FastDeploy/examples/multimodal/stable_diffusion$ tar -xvzf stable-diffusion-v1-4.tgz
/FastDeploy/examples/multimodal/stable_diffusion$ python infer.py --model_dir stable-diffusion-v1-4/ --backend paddle --inference_steps 50 --use_fp16 1 --scheduler pndm --benchmark_steps 10
[2023-04-04 07:56:04,939] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/vocab.json
[2023-04-04 07:56:04,939] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/merges.txt
[2023-04-04 07:56:04,939] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json and saved to /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14
[2023-04-04 07:56:05,907] [ WARNING] - file https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json not exist
[2023-04-04 07:56:05,908] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/special_tokens_map.json
[2023-04-04 07:56:05,908] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/tokenizer_config.json
[INFO] fastdeploy/runtime/runtime.cc(293)::CreateOrtBackend Runtime initialized with Backend::ORT in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
Spend 11.83 s to load unet model.
Run the stable diffusion pipeline 5 times to test the performance.
No 0 time cost: 4.607649 s
No 1 time cost: 4.613580 s
No 2 time cost: 4.601662 s
No 3 time cost: 4.602602 s
No 4 time cost: 4.606175 s
Mean latency: 4.606334 s, p50 latency: 4.606175 s, p90 latency: 4.611208 s, p95 latency: 4.612394 s.
Image saved in fd_astronaut_rides_horse.png!
```
The latency is about 4.5–4.6 s per image, far slower than the 0.76 s reported in the blog (https://blog.csdn.net/PaddlePaddle/article/details/129426638).