AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Heavy slowdown of SDXL generation on RTX 4050 Laptop with WebUI 1.8.0 compared to WebUI 1.7.0 #15206

Open RuslanKuz opened 7 months ago

RuslanKuz commented 7 months ago

Checklist

What happened?

In general, I have a sad situation with SDXL generation speed on version 1.8.0 on a laptop with an RTX 4050 6 GB. A 1024x1024 generation with 30 steps takes 1:30-1:40 min; on version 1.7.0 it took 40-60 sec. This is with the --xformers and --medvram-sdxl flags. On my home PC with an RTX 3080 10 GB I didn't notice any particular difference, maybe slightly slower. So it seems no optimization for small video memory was added; if anything, the opposite. I installed clean setups of v1.7.0 and v1.8.0 to test, ran the same generation several times, and took the best time for each:

Version 1.7.0 - 43.6 sec. Version 1.8.0 - 1:35 min. Model - JuggernautXL v9. That's quite a downgrade. In monitoring I noticed that 1.8.0 keeps the GPU less loaded, probably because the video memory is more saturated. (Monitoring screenshots: 1.7.0, 1.8.0.)
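
To watch how close the 6 GB card gets to its limit during a run, a small monitoring script can be left running in a second terminal. This is just a sketch, not part of the WebUI; it only assumes nvidia-smi is on PATH, which it normally is with the NVIDIA driver installed.

```python
# Poll VRAM usage once per second while the 1024x1024 SDXL generation runs.
# Stop with Ctrl+C. Reads the first line of output, i.e. GPU 0.
import subprocess
import time

while True:
    out = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    ).strip()
    used, total = out.splitlines()[0].split(", ")
    print(f"VRAM: {used} / {total} MiB")
    time.sleep(1)
```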

Steps to reproduce the problem

  1. Start WebUI
  2. Set up the generation (1024x1024, 30 steps)
  3. Press "Generate"
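
To time step 3 consistently instead of watching the console, one option is a small script against the WebUI's built-in API. This is only a sketch; it assumes the WebUI was launched with --api in addition to the flags above, and that it is listening on the default 127.0.0.1:7860.

```python
# Time a single 1024x1024, 30-step txt2img request through /sdapi/v1/txt2img
# and print the wall-clock duration. Requires the requests package.
import time
import requests

payload = {
    "prompt": "test prompt",  # placeholder prompt, not the one from the report
    "steps": 30,
    "width": 1024,
    "height": 1024,
}

start = time.time()
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
print(f"Generation took {time.time() - start:.1f} s")
```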

What should have happened?

WebUI should generate as fast as it did on the previous version, 1.7.0.

What browsers do you use to access the UI ?

Microsoft Edge

Sysinfo

sysinfo-2024-03-10-12-40.json

Console logs

venv "D:\AI\Stable_Diffusion\Automatic_1111\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.8.0
Commit hash: bef51aed032c0aaa5cfd80445bc4cf0d85b408b5
CUDA 12.1
Launching Web UI with arguments: --theme=dark --xformers --medvram-sdxl
Civitai Helper: Get Custom Model Folder
[-] ADetailer initialized. version: 24.3.0, num models: 10
ControlNet preprocessor location: D:\AI\Stable_Diffusion\Automatic_1111\extensions\sd-webui-controlnet\annotator\downloads
2024-03-10 15:44:58,281 - ControlNet - INFO - ControlNet v1.1.441
2024-03-10 15:44:58,360 - ControlNet - INFO - ControlNet v1.1.441
sd-webui-prompt-all-in-one background API service started successfully.
15:44:58 - ReActor - STATUS - Running v0.7.0-b6 on Device: CUDA
Loading weights [c9e3e68f89] from D:\AI\Stable_Diffusion\Automatic_1111\models\Stable-diffusion\SDXL--\juggernautXL_v9Rundiffusionphoto2.safetensors
Creating model from config: D:\AI\Stable_Diffusion\Automatic_1111\repositories\generative-models\configs\inference\sd_xl_base.yaml
2024-03-10 15:45:00,167 - ControlNet - INFO - ControlNet UI callback registered.
Civitai Helper: Set Proxy:
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 27.4s (prepare environment: 13.2s, import torch: 2.8s, import gradio: 0.7s, setup paths: 1.2s, initialize shared: 0.3s, other imports: 0.4s, load scripts: 3.2s, create ui: 2.3s, gradio launch: 2.4s, app_started_callback: 0.8s).
Applying attention optimization: xformers... done.
Model loaded in 7.6s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 4.6s, apply half(): 0.2s, calculate empty prompt: 1.7s).
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:23<00:00,  2.77s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [01:41<00:00,  3.39s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:20<00:00,  2.67s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [01:34<00:00,  3.16s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 30/30 [01:34<00:00,  2.67s/it]

Additional information

No changes to the PC hardware; I just upgraded to the new version of WebUI. The previous version is still working faster right now with this setup.

Takezo1000 commented 7 months ago

When I use SDXL LoRAs and models, I've noticed that my SSD has many more writes than usual, and out-of-VRAM errors happen more frequently.

I have 12 GB VRAM and 32 GB DDR4 RAM, so for it to fall back to the SSD it is probably requiring a lot of VRAM and system memory.

DHG-Dav commented 4 months ago

Same issue here. The last versions have really slowed everything down for me, with an RTX 3060 12 GB and 64 GB DDR. It has also become much more CPU intensive, and there is a weird pause at the beginning and end of every image generation that is often longer than the generation itself (for example, 20 seconds to generate an image, with a 15-second pause after loading the model before it begins and a 10-second pause before writing the image to disk after it finishes... why?). The maximum image size I can generate without OOM has also dropped from roughly 2200x2200 px to roughly 1700x1700 px, and I need to restart the WebUI and browser every now and then because of memory leaks... Let's hope v2 will focus on optimization rather than adding more stuff.

aphix commented 4 months ago

Do you have a paging file active? Try disabling it and see whether there's less SSD churn; things may speed up too. Paging files are on by default in Windows and essentially let the OS use part of the drive as RAM, which is generally counterproductive if you have enough physical RAM available (32 GB is a good start). The slowdown can be caused by context switching, dumping RAM to disk, and then reading it back into RAM, since RAM is much faster than the drive (no matter which drive).
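
A rough way to check whether the page file is actually being hit during a generation is to compare RAM and swap usage. A minimal sketch, assuming psutil is installed (pip install psutil); it is not tied to the WebUI in any way:

```python
# Print physical RAM and page-file (swap) usage. Run while a generation is in
# progress; heavy swap usage suggests the OS is paging model weights to disk.
import psutil

vm = psutil.virtual_memory()
sm = psutil.swap_memory()
print(f"RAM : {vm.used / 2**30:.1f} / {vm.total / 2**30:.1f} GiB used")
print(f"Swap: {sm.used / 2**30:.1f} / {sm.total / 2**30:.1f} GiB used")
```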

2u843yt385592yjh commented 4 months ago

Same issue on 1.9.4. The hard drive spikes to 50% usage, specifically when SDXL LoRAs are loaded/unloaded. Once they are loaded (after one generated image and a few minutes), performance stabilizes again, at least for me.
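
For the disk spikes, a similar quick check is to poll system-wide write throughput while a LoRA is being swapped. Again just a sketch that assumes psutil is available:

```python
# Print disk write throughput once per second (system-wide). Stop with Ctrl+C.
import time
import psutil

prev = psutil.disk_io_counters()
while True:
    time.sleep(1)
    cur = psutil.disk_io_counters()
    mb_written = (cur.write_bytes - prev.write_bytes) / 2**20
    print(f"Disk writes: {mb_written:.1f} MB/s")
    prev = cur
```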