AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Inconsistent gen speeds #15862

Open CasanovaSan opened 3 months ago

CasanovaSan commented 3 months ago

What happened?

When I generate an image with SDXL (PonyXL) at a resolution of 880x1176 with ADetailer, it takes around 1 min 15.5 sec. Sometimes this jumps to 3 minutes out of the blue, and I don't know why or how it happens. It is especially bad when I also use Highres: the time skyrockets from 3 minutes (880x1176 + ADetailer + Highres) up to 8 minutes!

I think this might be related to FP8, which I have enabled because my GPU only has 4 GB of VRAM, and FP8 is what makes SDXL usable on 4 GB.

Steps to reproduce the problem

Normal usage of A1111

What should have happened?

Generation time should stay constant at about 1 minute 15 seconds.

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2024-05-22-08-18.json

Console logs

🤯 LobeTheme: Initializing...
Startup time: 38.1s (prepare environment: 11.5s, import torch: 8.0s, import gradio: 3.4s, setup paths: 6.1s, initialize shared: 0.4s, other imports: 2.2s, list SD models: 0.2s, load scripts: 4.8s, initialize extra networks: 0.1s, create ui: 1.0s, gradio launch: 0.5s, app_started_callback: 0.2s).
Loading VAE weights specified in settings: C:\Users\luisg\OneDrive\Documents\AI\stable-diffusion-webui\models\VAE\sdxl_vae.safetensors
Applying attention optimization: xformers... done.
Model loaded in 33.2s (load weights from disk: 1.3s, create model: 1.0s, apply weights to model: 17.2s, apply channels_last: 0.8s, apply half(): 0.1s, apply fp8: 10.1s, load VAE: 0.9s, hijack: 0.1s, load textual inversion embeddings: 0.2s, calculate empty prompt: 1.4s).
Reusing loaded model DominusMedusa.fp16.safetensors [a2b72a204e] to load ponyDiffusionV6XL_v6StartWithThisOne.safetensors [67ab2fd8ec]
Loading weights [67ab2fd8ec] from C:\Users\luisg\OneDrive\Documents\AI\stable-diffusion-webui\models\Stable-diffusion\ponyDiffusionV6XL_v6StartWithThisOne.safetensors
Loading VAE weights from user metadata: C:\Users\luisg\OneDrive\Documents\AI\stable-diffusion-webui\models\VAE\sdxl_vae.safetensors
Applying attention optimization: xformers... done.
Weights loaded in 43.6s (send model to cpu: 0.5s, load weights from disk: 0.4s, apply weights to model: 30.3s, apply fp8: 11.3s, load VAE: 1.0s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:36<00:00,  1.84s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.73s/it]
0: 640x480 1 face, 137.3ms
Speed: 16.5ms preprocess, 137.3ms inference, 16.2ms postprocess per image at shape (1, 3, 640, 480)
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:16<00:00,  1.86s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:19<00:00,  3.96s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:40<00:00,  2.05s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.73s/it]
0: 640x480 1 face, 194.5ms
Speed: 42.5ms preprocess, 194.5ms inference, 13.4ms postprocess per image at shape (1, 3, 640, 480)
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:19<00:00,  2.13s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:47<00:00,  5.35s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:36<00:00,  1.81s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.73s/it]
0: 640x480 1 face, 145.6ms
Speed: 16.0ms preprocess, 145.6ms inference, 9.3ms postprocess per image at shape (1, 3, 640, 480)
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:16<00:00,  1.87s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:21<00:00,  4.09s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:33<00:00,  1.65s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:29<00:00,  1.57s/it]
0: 640x480 1 face, 89.6ms
Speed: 4.4ms preprocess, 89.6ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 480)
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:15<00:00,  1.68s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:11<00:00,  3.57s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.61s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:29<00:00,  1.56s/it]
0: 640x480 1 face, 73.8ms
Speed: 4.5ms preprocess, 73.8ms inference, 1.0ms postprocess per image at shape (1, 3, 640, 480)
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:15<00:00,  1.72s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:23<00:00,  4.16s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:35<00:00,  1.78s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:32<00:00,  1.73s/it]
0: 640x480 1 face, 133.1ms
Speed: 17.8ms preprocess, 133.1ms inference, 9.3ms postprocess per image at shape (1, 3, 640, 480)
100%|████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:16<00:00,  1.81s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:34<00:00,  4.71s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [01:34<00:00,  1.73s/it]
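
To put a number on the variation, the elapsed times on the "Total progress" lines can be pulled out of a saved console log. The following is a minimal sketch, not part of the webui: the function names `total_progress_seconds` and `summarize` are hypothetical, and the regular expression assumes the tqdm line format exactly as it appears in the log above.

```python
import re
from statistics import mean

# Matches completed "Total progress" lines and captures the elapsed time,
# which tqdm prints as mm:ss (or h:mm:ss for long runs) before the "<".
TOTAL_RE = re.compile(
    r"Total progress:\s*100%\|[^|]*\|\s*\d+/\d+\s*\[(?:(\d+):)?(\d+):(\d+)<"
)


def total_progress_seconds(log_text):
    """Return the elapsed seconds of every completed 'Total progress' line."""
    results = []
    for match in TOTAL_RE.finditer(log_text):
        hours = int(match.group(1) or 0)
        minutes = int(match.group(2))
        seconds = int(match.group(3))
        results.append(hours * 3600 + minutes * 60 + seconds)
    return results


def summarize(times):
    """Summarize a non-empty list of durations given in seconds."""
    return {
        "min": min(times),
        "max": max(times),
        "mean": mean(times),
        "max_over_min": max(times) / min(times),
    }


if __name__ == "__main__":
    # Two lines copied from the log above, with the progress bars shortened.
    sample = (
        "Total progress: 100%|████| 20/20 [01:19<00:00,  3.96s/it]\n"
        "Total progress: 100%|████| 20/20 [01:47<00:00,  5.35s/it]\n"
    )
    print(summarize(total_progress_seconds(sample)))
```

In this log each generation prints two "Total progress" lines, one after the base sampling pass and one after ADetailer finishes, so the second line of each pair is the end-to-end time to compare between runs.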

Additional information

No response

Mr-ex777 commented 2 months ago

This happens to me with SDXL as well. It seems to get worse with each image unless I restart my PC. I'm running on a 4060 with 16 GB of VRAM.