PaddlePaddle / FastDeploy

⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit for ☁️Cloud, 📱Mobile, and 📹Edge. Covers 20+ mainstream scenarios across Image, Video, Text, and Audio, with 150+ SOTA models, end-to-end optimization, and multi-platform, multi-framework support.
https://www.paddlepaddle.org.cn/fastdeploy
Apache License 2.0

Unable to reproduce the Stable Diffusion inference speed #1764

Closed tianleiwu closed 1 year ago

tianleiwu commented 1 year ago

Environment

Problem logs and steps to reproduce the issue

The model was downloaded from https://bj.bcebos.com/fastdeploy/models/stable-diffusion/runwayml/stable-diffusion-v1-5.tgz
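(For reference, fetching and unpacking that archive would look roughly like the commands below, mirroring the v1-4 steps shown further down; the extracted directory name is an assumption based on the archive name.)

```bash
# Hypothetical sketch: download and extract the v1-5 model archive,
# following the same pattern as the v1-4 commands later in this issue.
cd /FastDeploy/examples/multimodal/stable_diffusion
wget https://bj.bcebos.com/fastdeploy/models/stable-diffusion/runwayml/stable-diffusion-v1-5.tgz
tar -xvzf stable-diffusion-v1-5.tgz   # yields the stable-diffusion-v1-5/ directory passed to --model_dir
```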

```
/FastDeploy/examples/multimodal/stable_diffusion$ python infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler_ancestral" --backend paddle --inference_steps 50
[2023-04-04 07:24:35,868] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/vocab.json
[2023-04-04 07:24:35,869] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/merges.txt
[2023-04-04 07:24:35,869] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json and saved to /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14
[2023-04-04 07:24:36,849] [ WARNING] - file https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json not exist
[2023-04-04 07:24:36,849] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/special_tokens_map.json
[2023-04-04 07:24:36,850] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/tokenizer_config.json
[INFO] fastdeploy/runtime/runtime.cc(293)::CreateOrtBackend Runtime initialized with Backend::ORT in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
Spend 11.65 s to load unet model.
Run the stable diffusion pipeline 1 times to test the performance.
No 0 time cost: 4.535359 s
Mean latency: 4.535359 s, p50 latency: 4.535359 s, p90 latency: 4.535359 s, p95 latency: 4.535359 s.

wget https://bj.bcebos.com/fastdeploy/models/stable-diffusion/CompVis/stable-diffusion-v1-4.tgz
/FastDeploy/examples/multimodal/stable_diffusion$ tar -xvzf stable-diffusion-v1-4.tgz
/FastDeploy/examples/multimodal/stable_diffusion$ python infer.py --model_dir stable-diffusion-v1-4/ --backend paddle --inference_steps 50 --use_fp16 1 --scheduler pndm --benchmark_steps 10
[2023-04-04 07:56:04,939] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/vocab.json
[2023-04-04 07:56:04,939] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/merges.txt
[2023-04-04 07:56:04,939] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json and saved to /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14
[2023-04-04 07:56:05,907] [ WARNING] - file https://bj.bcebos.com/paddlenlp/models/community/openai/clip-vit-large-patch14/added_tokens.json not exist
[2023-04-04 07:56:05,908] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/special_tokens_map.json
[2023-04-04 07:56:05,908] [ INFO] - Already cached /home/turinguser/.paddlenlp/models/openai/clip-vit-large-patch14/tokenizer_config.json
[INFO] fastdeploy/runtime/runtime.cc(293)::CreateOrtBackend Runtime initialized with Backend::ORT in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
[INFO] fastdeploy/runtime/runtime.cc(266)::CreatePaddleBackend Runtime initialized with Backend::PDINFER in Device::GPU.
Spend 11.83 s to load unet model.
Run the stable diffusion pipeline 5 times to test the performance.
No 0 time cost: 4.607649 s
No 1 time cost: 4.613580 s
No 2 time cost: 4.601662 s
No 3 time cost: 4.602602 s
No 4 time cost: 4.606175 s
Mean latency: 4.606334 s, p50 latency: 4.606175 s, p90 latency: 4.611208 s, p95 latency: 4.612394 s.
Image saved in fd_astronaut_rides_horse.png!
```

The latency is around 4.5~4.6 seconds, far from the 0.76 seconds reported in the blog post (https://blog.csdn.net/PaddlePaddle/article/details/129426638).

wwbitejotunn commented 1 year ago

Hi, for higher-performance Stable Diffusion (SD) inference, the model needs to be run with the paddle_tensorrt backend, e.g. `python text_to_img_infer.py --model_dir stable-diffusion-v1-5/ --scheduler "euler_ancestral" --backend paddle_tensorrt --use_fp16 True --device gpu`. To reach the best performance, we recommend following the example code under paddlenlp/ppdiffusers/deploy for SD model export and performance testing: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers/deploy
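For context, the suggested command formatted for readability is shown below; the model directory layout is assumed to match the export produced by the ppdiffusers deploy example, and the first paddle_tensorrt run typically spends extra time building TensorRT engines, so warm-up runs matter when benchmarking.

```bash
# The backend suggested in the reply, formatted as a single command
# (flags copied from the reply; the model_dir is assumed to contain an
# export compatible with text_to_img_infer.py).
python text_to_img_infer.py \
    --model_dir stable-diffusion-v1-5/ \
    --scheduler "euler_ancestral" \
    --backend paddle_tensorrt \
    --use_fp16 True \
    --device gpu

# Hypothetical: to reproduce the latency statistics reported in this issue,
# the same --inference_steps / --benchmark_steps values used earlier could be
# added, assuming the script accepts those flags.
```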