[Bug]: Massively slower generation and higher VRAM usage on RTX 2060 Mobile compared to the original webui

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits

What happened?

I am experiencing deal-breaking performance differences between the original webui and this fork. Here is my setup:

- Windows 10, laptop running 6 GB VRAM version of the RTX 2060, Ryzen 7 4800H processor, 32 GB RAM
- The exact same config.json and ui-config.json file on both installations
- Approximately the same extensions, Tiled VAE is turned off in both cases
- The same cudnn files in venv/Lib/site-packages/torch/lib
- Same torch version on both installations (1.13.1-cu117)
- `set COMMANDLINE_ARGS=--medvram --xformers --listen --no-half-vae` in webui-user.bat
- `set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512` in webui-user.bat
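
One way to rule out a config mismatch is to check that each install's Python process actually sees the same allocator settings. The sketch below is only an illustration: `parse_alloc_conf` is a hypothetical helper, not part of webui or PyTorch, and it simply splits the `key:value,key:value` string into a dict.

```python
import os


def parse_alloc_conf(value: str) -> dict[str, str]:
    """Split a 'key:value,key:value' string into a dict; '' gives {}."""
    options = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, val = item.partition(":")
        options[key.strip()] = val.strip()
    return options


# Prints {} when the variable is not set in the current process.
print(parse_alloc_conf(os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")))
```

Running this with each installation's venv Python, launched from the same shell that runs webui-user.bat, should print identical dicts if the environment really matches.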

I ran the same image parameters on both clients multiple times. Results were:

stable-diffusion-webui:

Time taken: 9.09s (512x512)
Time taken: 1m 1.33s (2x hires fix 512x512)
Time taken: 1m 0.24s (2x hires fix 512x512)

stable-diffusion-webui-ux:

Time taken: 13.12s (512x512)
Time taken: 1m 50.29s (2x hires fix 512x512)
Time taken: 1m 37.05s (2x hires fix 512x512)

In one run, stable-diffusion-webui-ux maxed out the VRAM and stalled for over 2 minutes during the hires fix pass (Time taken: 3m 11.70s).
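
To put a number on the gap, the timings above can be converted to seconds and compared. This is a rough sketch using only the figures reported here; the helper names are made up for illustration, and the 3m 11.70s outlier is left out of the averages.

```python
import re
from statistics import mean


def to_seconds(text: str) -> float:
    """Convert a 'Time taken' value such as '9.09s' or '1m 1.33s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


# Timings copied from the results above.
webui = {"512x512": ["9.09s"], "hires": ["1m 1.33s", "1m 0.24s"]}
webui_ux = {"512x512": ["13.12s"], "hires": ["1m 50.29s", "1m 37.05s"]}


def slowdown(case: str) -> float:
    """Ratio of mean webui-ux time to mean webui time for one test case."""
    base = mean(to_seconds(t) for t in webui[case])
    fork = mean(to_seconds(t) for t in webui_ux[case])
    return fork / base


for case in webui:
    print(f"{case}: {slowdown(case):.2f}x slower in webui-ux")
```

With these numbers the fork comes out roughly 1.4x slower at 512x512 and roughly 1.7x slower with hires fix, well outside what run-to-run noise would explain.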

Steps to reproduce the problem

1. Use the setup described in the previous section
2. Open stable-diffusion-webui and Ctrl+F5 the page
3. Go to image browser, send the latest image parameters to txt2img
4. Generate an image to make sure everything is loaded
5. Generate the same image to benchmark, then the same image with hires fix to benchmark (Result A)
6. Open stable-diffusion-webui-ux and Ctrl+F5 the page
7. Repeat steps 3, 4 and 5 (Result B)

Result A: Generation takes 9-10 seconds, hires fix takes about 1 minute 1 second, and VRAM usage rises and falls at a healthy pace.

Result B: Generation takes 12-13 seconds, hires fix takes about 1 minute 40 seconds, and VRAM usage sometimes hits the 6 GB limit, at which point hires fix gets stuck for several minutes.
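
The VRAM behaviour in Result A and Result B was observed by eye. For anyone trying to reproduce this with numbers, a minimal sketch that polls `nvidia-smi` once per second and tracks the peak could look like the following. It assumes `nvidia-smi` is on the PATH and reads only the first GPU; `parse_memory`, `sample` and `monitor` are illustrative names, not part of webui.

```python
import subprocess
import time

# Real nvidia-smi query flags; output is one "used, total" line per GPU in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(line: str) -> tuple[int, int]:
    """Parse one 'used, total' CSV line into integers (MiB)."""
    used, total = (int(part) for part in line.split(","))
    return used, total


def sample() -> tuple[int, int]:
    """Query nvidia-smi once and return (used, total) for the first GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out.splitlines()[0])


def monitor(interval: float = 1.0) -> None:
    """Print usage every `interval` seconds until Ctrl+C, then report the peak."""
    peak = 0
    try:
        while True:
            used, total = sample()
            peak = max(peak, used)
            print(f"{used} / {total} MiB (peak {peak} MiB)")
            time.sleep(interval)
    except KeyboardInterrupt:
        print(f"peak VRAM used: {peak} MiB")
```

Calling `monitor()` in a second terminal while each UI generates the benchmark image gives a peak figure per installation that can be compared directly.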

What should have happened?

Generation time and VRAM usage should be the same within the margin of error, because there should be no reason for one UI to use the graphics card differently from the other.

Commit where the problem happens

2ac62ce241396939c387a221e20b0a7a8c399b6f (stable-diffusion-webui-ux)

22bcc7be428c94e9408f589966c2040187245d81 (stable-diffusion-webui)

What platforms do you use to access the UI ?

Windows

What browsers do you use to access the UI ?

Mozilla Firefox

Command Line Arguments

set COMMANDLINE_ARGS=--medvram --xformers --listen --no-half-vae --allow-code --enable-insecure-extension-access --ckpt-dir ..\sd-models\Checkpoint --vae-dir ..\sd-models\VAE --lora-dir ..\sd-models\Lora --embeddings-dir ..\sd-models\TI
set PYTORCH_CUDA_ALLOC_CONF=garbage_collection_threshold:0.9,max_split_size_mb:512

List of extensions

| Extension | URL | Version | Update |
| -- | -- | -- | -- |
| a1111-sd-webui-tagcomplete | https://github.com/DominikDoom/a1111-sd-webui-tagcomplete.git | 223abf54 (Wed Apr 5 11:05:44 2023) | unknown |
| multidiffusion-upscaler-for-automatic1111 | https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111.git | 70ca3c77 (Wed Apr 5 10:57:07 2023) | unknown |
| sd-dynamic-prompts | https://github.com/adieyal/sd-dynamic-prompts.git | b16480e3 (Wed Apr 12 08:30:59 2023) | unknown |
| sd-webui-controlnet | https://github.com/Mikubill/sd-webui-controlnet.git | e1885108 (Wed Apr 12 03:24:32 2023) | unknown |
| sd-webui-model-converter | https://github.com/Akegarasu/sd-webui-model-converter.git | d19e2816 (Sun Mar 26 06:36:49 2023) | unknown |
| stable-diffusion-webui-images-browser | https://github.com/AlUlkesh/stable-diffusion-webui-images-browser.git | 1d5c2e75 (Tue Mar 28 13:19:52 2023) | unknown |
| ultimate-upscale-for-automatic1111 | https://github.com/Coyote-A/ultimate-upscale-for-automatic1111.git | 0a3d03a4 (Tue Feb 7 06:07:23 2023) | unknown |
| LDSR | [built-in](http://localhost:7860/) | | |
| Lora | [built-in](http://localhost:7860/) | | |
| ScuNET | [built-in](http://localhost:7860/) | | |
| SwinIR | [built-in](http://localhost:7860/) | | |
| prompt-bracket-checker | [built-in](http://localhost:7860/) | | |
| sd_theme_editor | [built-in](http://localhost:7860/) | | |

### Console logs

```Shell
stable-diffusion-webui:
Already up to date.
venv "D:\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing requirements for Web UI
Launching Web UI with arguments: --medvram --xformers --listen --no-half-vae --allow-code --enable-insecure-extension-access --ckpt-dir ..\sd-models\Checkpoint --vae-dir ..\sd-models\VAE --lora-dir ..\sd-models\Lora --embeddings-dir ..\sd-models\TI
Additional Network extension not installed, Only hijack built-in lora
LoCon Extension hijack built-in lora successfully
Loading weights [c4506f615d] from ..\sd-models\Checkpoint\7thHeavenMix.safetensors
Creating model from config: D:\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: ..\sd-models\VAE\blessed2.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(29): bad-hands-5, EasyNegative, ti-assassin-yor-forger, ti-marin-kitagawa, ti-power, ti-yor-forger, ti-arm-covering-breasts, ti-arms-covering-breasts, ti-boobjob, ti-bound-missionary, ti-cowgirl-position, ti-hands-covering-breasts, ti-oral-pov, ti-oral-sideview, ti-2-hoshimachi-suisei, ti-darknesss-laplus, ti-hoshimachi-suisei, ti-houshou-marine, ti-inugami-korone, ti-nekomata-okayu, ti-ninomae-inanis, ti-oozora-subaru, ti-sakamata-chloe, ti-shishiro-botan, ti-uruha-rushia-black, ti-uruha-rushia-pink, ti-uruha-rushia-school, ti-uruha-rushia, ti-shylily
Model loaded in 2.5s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 0.7s, apply half(): 0.4s, load VAE: 0.4s).
Running on local URL: http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 19.2s (import torch: 2.0s, import gradio: 1.4s, import ldm: 0.7s, other imports: 1.1s, setup codeformer: 0.2s, load scripts: 2.2s, load SD checkpoint: 2.7s, create ui: 4.7s, gradio launch: 4.2s).
locon load lora method
100%|██████████████████████████████████████████████| 30/30 [00:11<00:00, 2.51it/s]
Total progress: 30it [00:25, 1.17it/s]
100%|██████████████████████████████████████████████| 30/30 [00:07<00:00, 4.12it/s]
100%|██████████████████████████████████████████████| 10/10 [00:16<00:00, 1.65s/it]
Total progress: 100%|██████████████████████████████| 40/40 [01:05<00:00, 1.63s/it]

stable-diffusion-webui-ux:
venv "D:\stable-diffusion-webui-ux\venv\Scripts\Python.exe"
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Commit hash: 2ac62ce241396939c387a221e20b0a7a8c399b6f
Installing requirements for Web UI
Installing sd-dynamic-prompts requirements.txt
Launching Web UI with arguments: --medvram --xformers --listen --no-half-vae --allow-code --enable-insecure-extension-access --ckpt-dir ..\sd-models\Checkpoint --vae-dir ..\sd-models\VAE --lora-dir ..\sd-models\Lora --embeddings-dir ..\sd-models\TI
Loading weights [c4506f615d] from ..\sd-models\Checkpoint\7thHeavenMix.safetensors
Creating model from config: D:\stable-diffusion-webui-ux\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Loading VAE weights specified in settings: ..\sd-models\VAE\blessed2.vae.pt
Applying xformers cross attention optimization.
Textual inversion embeddings loaded(29): bad-hands-5, EasyNegative, ti-assassin-yor-forger, ti-marin-kitagawa, ti-power, ti-yor-forger, ti-arm-covering-breasts, ti-arms-covering-breasts, ti-boobjob, ti-bound-missionary, ti-cowgirl-position, ti-hands-covering-breasts, ti-oral-pov, ti-oral-sideview, ti-2-hoshimachi-suisei, ti-darknesss-laplus, ti-hoshimachi-suisei, ti-houshou-marine, ti-inugami-korone, ti-nekomata-okayu, ti-ninomae-inanis, ti-oozora-subaru, ti-sakamata-chloe, ti-shishiro-botan, ti-uruha-rushia-black, ti-uruha-rushia-pink, ti-uruha-rushia-school, ti-uruha-rushia, ti-shylily
Model loaded in 2.0s (create model: 0.6s, apply weights to model: 0.6s, apply half(): 0.3s, load VAE: 0.3s).
Running on local URL: http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 18.5s (import gradio: 2.5s, import ldm: 0.9s, other imports: 1.8s, list extensions: 0.8s, load scripts: 2.0s, load SD checkpoint: 2.2s, create ui: 3.6s, gradio launch: 4.7s).
100%|██████████████████████████████████████████████| 30/30 [00:14<00:00, 2.13it/s]
Total progress: 100%|██████████████████████████████| 30/30 [00:16<00:00, 1.81it/s]
100%|██████████████████████████████████████████████| 30/30 [00:10<00:00, 2.82it/s]
Total progress: 100%|██████████████████████████████| 30/30 [00:11<00:00, 2.60it/s]
100%|██████████████████████████████████████████████| 30/30 [00:10<00:00, 2.82it/s]
100%|██████████████████████████████████████████████| 10/10 [02:15<00:00, 13.55s/it]
Total progress: 100%|██████████████████████████████| 40/40 [03:13<00:00, 4.84s/it]
Total progress: 100%|██████████████████████████████| 40/40 [03:13<00:00, 2.88s/it]
```

### Additional information

_No response_
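
The per-step rates in the console logs can also be compared directly. The sketch below is an illustration only: `seconds_per_step` is a made-up helper that pulls the tqdm rate out of a progress line and normalises `it/s` and `s/it` to seconds per step. The two sample lines are the 10-step hires fix pass from each log; the webui-ux one appears to come from the run that stalled (3m 13s total).

```python
import re

# Matches the trailing rate of a tqdm line, e.g. "2.51it/s]" or "13.55s/it]".
RATE = re.compile(r"(\d+(?:\.\d+)?)(it/s|s/it)\]")


def seconds_per_step(line: str) -> float:
    """Return seconds per sampling step from a tqdm progress line."""
    match = RATE.search(line)
    if match is None:
        raise ValueError("no tqdm rate found")
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "s/it" else 1.0 / value


# Hires fix second pass (10 steps), copied from the console logs.
webui_line = "10/10 [00:16<00:00, 1.65s/it]"
webui_ux_line = "10/10 [02:15<00:00, 13.55s/it]"

ratio = seconds_per_step(webui_ux_line) / seconds_per_step(webui_line)
print(f"hires pass: {ratio:.1f}x slower per step in webui-ux")
```

For that stalled run the second pass is roughly 8x slower per step, which points at the high-resolution pass (where VRAM pressure is highest) rather than at startup or the first pass.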
