Closed liao02x closed 1 year ago
Same problem. I have a stable diffusion webui v1.4.0 instance on an ec2 g5.2xlarge which runs very slowly. It takes around 15 seconds of waiting before image generation starts. However, this problem does not occur on my g4dn.xlarge instance running the same webui version.
That's very interesting. I've been using g5 instances since I started and never tried other instance types. Let me try a g4dn and see how it works.
Tried it on a g4dn instance with the same setup, and the speed didn't get better. It's a weaker GPU, so generation is slower than on the g5, and the webui is still slower than diffusers (1.3 it/s vs 3 it/s).
logs from webui:
ubuntu@ip-172-31-89-24:~/stable-diffusion-webui$ ./webui.sh
################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################
################################################################
Running on ubuntu user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
################################################################
Launching launch.py...
################################################################
Using TCMalloc: libtcmalloc_minimal.so.4
Python 3.10.12 (main, Jun 7 2023, 12:45:35) [GCC 9.4.0]
Version: v1.4.0
Commit hash: 394ffa7b0a7fff3ec484bcd084e673a8b301ccc8
Installing requirements
Launching Web UI with arguments: --share --no-half --xformers --api --opt-sub-quad-attention --opt-channelslast --opt-sdp-no-mem-attention
Loading weights [dcd690123c] from /home/ubuntu/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.safetensors
preload_extensions_git_metadata for 7 extensions took 0.00s
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://c88ac7807323cc8c9e.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Startup time: 13.5s (import torch: 2.6s, import gradio: 1.8s, import ldm: 2.8s, other imports: 1.6s, load scripts: 0.5s, create ui: 0.6s, gradio launch: 3.5s, add APIs: 0.1s).
Creating model from config: /home/ubuntu/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/configs/stable-diffusion/v2-inference-v.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Applying attention optimization: xformers... done.
Textual inversion embeddings loaded(0):
Model loaded in 10.3s (load weights from disk: 0.9s, find config: 4.8s, create model: 0.2s, apply weights to model: 2.5s, apply channels_last: 0.4s, move model to device: 1.2s, calculate empty prompt: 0.3s).
100%|██████████████████████████████████████████████████████████████████████| 50/50 [01:08<00:00, 1.37s/it]
Total progress: 100%|██████████████████████████████████████████████████████| 50/50 [01:16<00:00, 1.54s/it]
100%|██████████████████████████████████████████████████████████████████████| 50/50 [01:05<00:00, 1.30s/it]
Total progress: 100%|██████████████████████████████████████████████████████| 50/50 [01:06<00:00, 1.32s/it]
100%|██████████████████████████████████████████████████████████████████████| 50/50 [01:07<00:00, 1.34s/it]
Total progress: 100%|██████████████████████████████████████████████████████| 50/50 [01:08<00:00, 1.36s/it]
Total progress: 100%|██████████████████████████████████████████████████████| 50/50 [01:08<00:00, 1.36s/it]
logs from the API using diffusers:
(venv) ubuntu@ip-172-31-89-24:~/stable-diffusion-webui$ python3.10 server.py
INFO: Started server process [2578]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
100%|██████████████████████████████████████████████████████████████████████| 50/50 [00:16<00:00, 2.95it/s]
INFO: 127.0.0.1:50284 - "POST /design/ HTTP/1.1" 200 OK
100%|██████████████████████████████████████████████████████████████████████| 50/50 [00:16<00:00, 3.02it/s]
INFO: 127.0.0.1:60614 - "POST /design/ HTTP/1.1" 200 OK
Run again with just `--share --api --xformers`. `--no-half` is mutually exclusive with, and negates, any possible speedup from either xformers or `--opt-sdp-no-mem-attention` (only one or the other is active, as seen in the console). Also drop `--opt-sub-quad-attention --opt-channelslast --opt-sdp-no-mem-attention`.
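As a sketch of the suggested launch configuration (assuming a standard webui install where arguments are set in `webui-user.sh`; adjust if you launch differently):

```shell
# webui-user.sh -- keep xformers, drop --no-half and the extra attention flags.
# --no-half forces fp32 weights, which defeats the fp16 kernels xformers relies on.
export COMMANDLINE_ARGS="--share --api --xformers"
```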
This is working. Taking `--no-half` off brings the speed close to the diffusers generations. Thanks!
@liao02x have you ever run into problems when using diffusers?
I am running on an ec2 g5 instance, but it is very slow even though I have already enabled xformers.
I am already using fp16 for half precision.
I don't have issues with diffusers. I tested on both the AWS AMI image with PyTorch 2 and the Ubuntu image with PyTorch 2; both work, and the speed is ~7 it/s with default settings.
Is there an existing issue for this?
What happened?
I was trying to set up the stable diffusion service API on an aws ec2 g5 instance, which uses an A10G GPU. It came up correctly and generated images. I used to run the stable diffusion service on the same instance using diffusers and flask, and I was going to replace that with sd-webui. However, the generation speed dropped a lot (7 it/s vs 2 it/s).
With diffusers I used the default setup for stable diffusion 2.1. For sd-webui, I downloaded the stable diffusion 2.1 checkpoint and imported it. The sampler is DPM++ 2M Karras. The model and sampler should match the choice and setup used with diffusers.
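To put the reported gap in concrete terms, a quick back-of-the-envelope calculation using the numbers from this thread and the 50-step generations shown in the logs:

```python
# Rough per-image time at the two reported throughputs (50 sampling steps).
steps = 50
diffusers_its = 7.0   # it/s reported with diffusers on the g5 (A10G)
webui_its = 2.0       # it/s reported with sd-webui before removing --no-half

diffusers_time = steps / diffusers_its  # ~7.1 s per image
webui_time = steps / webui_its          # 25.0 s per image

print(f"diffusers: {diffusers_time:.1f}s, webui: {webui_time:.1f}s, "
      f"slowdown: {webui_time / diffusers_time:.1f}x")
```

So the same 50-step image takes roughly 3.5x longer under sd-webui with these settings.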
A few things I tried: adding `--opt-sub-quad-attention --opt-channelslast --opt-sdp-no-mem-attention`. The speed doesn't change. I was wondering if anyone else is also running it on an aws instance and has the same issue. The only issue I found talked about problems with starting the service, which isn't my case.
Steps to reproduce the problem
What should have happened?
I would expect it to have a similar generation speed.
Version or Commit where the problem happens
v1.4.0
What Python version are you running on ?
Python 3.10.x
What platforms do you use to access the UI ?
Linux
What device are you running WebUI on?
Nvidia GPUs (RTX 20 above)
Cross attention optimization
xformers
What browsers do you use to access the UI ?
Google Chrome
Command Line Arguments
List of extensions
No
Console logs
Additional information
No response