AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: 1024x1024 SDXL uses more than 12 GB VRAM during the VAE part of inference - spills into shared VRAM, which makes it super slow #13341

Open · FurkanGozukara opened this issue 1 year ago

FurkanGozukara commented 1 year ago

The VAE is being loaded into shared VRAM - even the half-precision (fp16) VAE ends up in shared VRAM. Not using the refiner, only base SDXL 1.0 at 1024x1024.

Here is 0 GB VRAM usage when Auto1111 is closed:

[screenshot: GPU memory usage at 0 GB with the webui closed]
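For reference, the same check can be made from Python instead of the screenshot above (a minimal sketch, assuming a CUDA-enabled PyTorch install on the same machine):

import torch

# Report free and total dedicated VRAM on the GPU the webui would use.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.2f} GiB / total: {total / 2**30:.2f} GiB")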

Started with the arguments below on a fresh install:

@echo off
set PYTHON=
set GIT=
set VENV_DIR=
REM pin the xformers build used for the attention optimization
set XFORMERS_PACKAGE=xformers==0.0.21
REM run on the GPU with index 1 only
set CUDA_VISIBLE_DEVICES=1
REM xformers attention, VAE kept in full precision (no fp16 VAE)
set COMMANDLINE_ARGS=--xformers --no-half-vae
call webui.bat

The VAE decoding part uses shared VRAM, which makes it extremely slow.

[screenshot: shared GPU memory filling up during the VAE decode]

I don't think this is expected behavior.
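To isolate how much the decode alone needs, here is a rough standalone sketch using diffusers rather than webui's code path (the model id, the fp32 VAE to mirror --no-half-vae, and the dummy latent are all assumptions):

import torch
from diffusers import AutoencoderKL

# Decode one 1024x1024 SDXL latent (1x4x128x128) with the VAE in fp32 and
# report PyTorch's peak CUDA allocation. This measures the VAE by itself;
# whatever else is resident (the SDXL UNet, text encoders) comes on top of it.
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="vae",
    torch_dtype=torch.float32,
).to("cuda")
latents = torch.randn(1, 4, 128, 128, device="cuda")

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    image = vae.decode(latents / vae.config.scaling_factor).sample
print(f"peak CUDA memory during decode: "
      f"{torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")

If that number plus the weights still resident on the card goes past 12 GB of dedicated VRAM, Windows starts paging into shared GPU memory, which matches the slowdown described here.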

Here are the settings:

[screenshot: the relevant webui settings]

Using the latest master branch.

Chryseus commented 1 year ago

Same issue - it's as if the refiner model (or the base model, if using only that) is not being unloaded before the VAE decode.
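If that is the cause, the effect can be sketched outside webui with diffusers (illustration only - the model id, prompt, and manual offload below are assumptions, not webui's actual code path): freeing the UNet before the decode leaves the whole card to the VAE's activations.

import torch
from diffusers import StableDiffusionXLPipeline

# Generate a 1024x1024 latent, then move the UNet off the GPU before the VAE
# decode so its roughly 5 GB of fp16 weights are not competing with the
# decode activations for dedicated VRAM.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

latents = pipe(
    "a photo of a cat", height=1024, width=1024, output_type="latent"
).images  # denoised latents, shape (1, 4, 128, 128)

pipe.unet.to("cpu")          # unload the UNet before decoding
torch.cuda.empty_cache()
pipe.vae.to(torch.float32)   # mirror --no-half-vae from the report

with torch.no_grad():
    image = pipe.vae.decode(
        latents.to(pipe.vae.dtype) / pipe.vae.config.scaling_factor
    ).sample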

FurkanGozukara commented 1 year ago

--medvram works around this issue for me, but the output quality changes.

Chryseus commented 1 year ago

--medvram works around this issue for me, but the output quality changes.

It reduces the problem but doesn't fix it; I still spill into shared memory on my 4060 Ti 8GB, which already requires --medvram.

FurkanGozukara commented 1 year ago

--medvram works around this issue for me, but the output quality changes.

It reduces the problem but doesn't fix it; I still spill into shared memory on my 4060 Ti 8GB, which already requires --medvram.

8 GB is just too low. Try --lowvram, but even that might not work well.