Closed: StylusEcho closed this issue 1 month ago
Hi there, thanks for the report. Did you try running with --vae-in-fp16 or --vae-in-bf16? If neither is specified, the VAE now defaults to float32 (for broader compatibility with other devices). With an NVIDIA GPU (you seem to have an RTX 3080), either flag should work.
Can you try with either of these flags?
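A minimal sketch of how such flag-driven dtype selection might look (the flag names come from this thread; the parser and helper below are hypothetical illustrations, not reForge's actual code):

```python
import argparse

def parse_vae_args(argv=None):
    # Hypothetical parser mirroring the two flags discussed in this thread.
    parser = argparse.ArgumentParser()
    parser.add_argument("--vae-in-fp16", action="store_true")
    parser.add_argument("--vae-in-bf16", action="store_true")
    return parser.parse_args(argv)

def vae_dtype(args):
    # Default to float32 for the broadest device compatibility;
    # either flag opts the VAE into a half-precision dtype.
    if args.vae_in_fp16:
        return "float16"
    if args.vae_in_bf16:
        return "bfloat16"
    return "float32"
```

With no flags, `vae_dtype` returns "float32", matching the new default described above.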
Can confirm, https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/fd78b06bf17a92fa3c96e17994c442b96409f77a is ~20% slower than https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/5a53f0364b301671e783c32535ae048391b48b4b with my default settings. The --vae-in-fp16 and --vae-in-bf16 flags help a little with the tiling at the end, but generation speed still suffers.
Okay, I pushed a new commit reverting a lot of that update.
Can you try if the performance is as expected now?
I tried checking out https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/6632a9dc6f734677e30a24929d7f5a7057687996, but got some errors in the console (with and without the flags):
Version: f0.0.20.1dev-v1.10.0RC-latest-828-g6632a9dc
Commit hash: 6632a9dc6f734677e30a24929d7f5a7057687996
Traceback (most recent call last):
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\launch.py", line 51, in <module>
main()
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\launch.py", line 47, in main
start()
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\launch_utils.py", line 542, in start
import webui
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\webui.py", line 19, in <module>
initialize.imports()
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\initialize.py", line 53, in imports
from modules import processing, gradio_extensons, ui # noqa: F401
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\processing.py", line 18, in <module>
import modules.sd_hijack
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\sd_hijack.py", line 5, in <module>
from modules import devices, sd_hijack_optimizations, shared, script_callbacks, errors, sd_unet, patches
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\sd_hijack_optimizations.py", line 13, in <module>
from modules.hypernetworks import hypernetwork
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\hypernetworks\hypernetwork.py", line 8, in <module>
import modules.textual_inversion.dataset
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\textual_inversion\dataset.py", line 12, in <module>
from modules import devices, shared, images
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\images.py", line 24, in <module>
from modules import sd_samplers, shared, script_callbacks, errors
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\sd_samplers.py", line 5, in <module>
from modules import sd_samplers_kdiffusion, sd_samplers_timesteps, sd_samplers_lcm, shared, sd_samplers_common, sd_schedulers
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\sd_samplers_kdiffusion.py", line 4, in <module>
from modules import sd_samplers_common, sd_samplers_extra, sd_samplers_cfg_denoiser, sd_schedulers
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\sd_samplers_common.py", line 6, in <module>
from modules import devices, images, sd_vae_approx, sd_samplers, sd_vae_taesd, shared, sd_models
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules\sd_models.py", line 19, in <module>
from modules_forge import forge_loader
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\modules_forge\forge_loader.py", line 5, in <module>
from ldm_patched.modules import model_detection
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\ldm_patched\modules\model_detection.py", line 5, in <module>
import ldm_patched.modules.supported_models
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\ldm_patched\modules\supported_models.py", line 5, in <module>
from . import model_base
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\ldm_patched\modules\model_base.py", line 6, in <module>
from ldm_patched.ldm.modules.diffusionmodules.openaimodel import UNetModel, Timestep
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\ldm_patched\ldm\modules\diffusionmodules\openaimodel.py", line 23, in <module>
from ..attention import SpatialTransformer, SpatialVideoTransformer, default
File "C:\Users\cordm\Desktop\stable-diffusion-webui-reForge\ldm_patched\ldm\modules\attention.py", line 30, in <module>
FORCE_UPCAST_ATTENTION_DTYPE = model_management.force_upcast_attention_dtype()
AttributeError: module 'ldm_patched.modules.model_management' has no attribute 'force_upcast_attention_dtype'
Okay, give me a few minutes. I will probably revert that last commit and everything else from the update that reduced performance.
Okay, I have pushed a commit that restores the model management and sd code basically to the state it was in before the problematic update.
Can you guys check if it's ok now?
https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/e13096344c4b361163f6a85694103a667e791c87 works for me now as intended
Perfect. Closing the issue now. Even so, I will have to figure out how to apply some sd.py changes from comfy upstream into this branch, since they will be needed to support new models.
Hi there guys, I have again tried to update sd.py (to maybe support newer models)
Can you check whether this one gives you bad performance on SDXL/hi-res? It doesn't in my case, but maybe my workflow is different.
https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/2403d91983bd4b22ad7cb2b721a35bcb159a4328 https://github.com/Panchovix/stable-diffusion-webui-reForge/commit/ae432b18deb61f3b97270eb6a53cacaab95e3d3c
Seems fine on my end.
Perfect, thanks for the confirmation.
So it is plausible that the performance degradation came from some changes to how the VAE is managed in comfy upstream (which aren't in reForge).
What happened?
Since the above commit on dev_upstream, at high resolutions (e.g. 2048x2048) with SDXL, generation speed is significantly worse, and it runs out of memory during regular VAE decoding and has to fall back on tiled decoding.
When I try the commit right before that one, these issues are gone.
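The out-of-memory fallback described above can be sketched as follows (function names are hypothetical, not reForge's actual API; the idea is one full-resolution decode attempt, with a retry in tiles on OOM):

```python
def decode_latent(latent, decode_full, decode_tiled):
    # Try a single full-resolution decode first; if the device runs out
    # of memory, fall back to decoding the latent in tiles, which needs
    # far less peak VRAM but is slower (the slowdown seen in this issue).
    try:
        return decode_full(latent)
    except MemoryError:
        return decode_tiled(latent)
```

In a real PyTorch implementation the exception would be the CUDA out-of-memory error rather than Python's built-in MemoryError; the control flow is the same.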
I looked at the console and one of the issues seems to be that AutoencoderKL is now using double the memory it used to, unless I'm reading it wrong. For the hires-fix pass denoising at 2048x2048, the iteration speed was 1.75 s/it before vs 3.19 s/it now. The overall speed of the job went from 1.83 s/it to 5.89 s/it! Holy moley...
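The doubling is consistent with a dtype change: a float32 tensor takes exactly twice the bytes of a float16 one. A rough illustration for just the decoded 2048x2048 RGB image tensor (illustrative arithmetic only, not reForge's actual memory accounting; peak VRAM during decode is far larger because of intermediate activations, but scales the same way):

```python
def image_tensor_bytes(height, width, channels, bytes_per_element):
    # Bytes for a single image tensor (batch of 1).
    return height * width * channels * bytes_per_element

fp16_bytes = image_tensor_bytes(2048, 2048, 3, 2)  # float16: 2 bytes/elem
fp32_bytes = image_tensor_bytes(2048, 2048, 3, 4)  # float32: 4 bytes/elem
# fp32_bytes is exactly double fp16_bytes, matching the observed doubling.
```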
Steps to reproduce the problem
What should have happened?
Should see some improvement to iteration speed and better VRAM usage, not worse.
What browsers do you use to access the UI ?
Mozilla Firefox, Microsoft Edge
Sysinfo
sysinfo.txt
Console logs
Additional information
The test generation had ADetailer turned on. I have recently enabled the --cuda-stream and --pin-shared-memory flags, but the issue was already happening before that.