Subarasheese / sd-x4-wui

Stable Diffusion x4 upscaler - WebUI

Out of memory with an image smaller than the one you said you upscaled without tiling #5

Open jonathancolledge opened 8 months ago

jonathancolledge commented 8 months ago

Hi, I have a 3090 with 24 GB of VRAM, and I tried a 1265 x 846 image and got the error below. (Of note, installation was a bit tricky: I had to use the fixes for long file paths as per the other issues, and it also could not find a matching torch to install, so I had to use: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121)

Is there something else I did wrong?

Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:01<00:00, 3.71it/s]
Resizing image to a square...
Determining background color...
Background color is... (255, 255, 255, 255)
Exporting image tile: image_0.png
  0%|                                                                                    | 0/75 [00:14<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\gradio\routes.py", line 321, in run_predict
    output = await app.blocks.process_api(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(fn_index, inputs, iterator, request)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\gradio\blocks.py", line 856, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "C:\Users\jonat\sd-x4-wui\gradio_gui.py", line 8, in upscale_image
    output_image = upscaler.upscale_image(image, int(rows), int(cols), int(seed), prompt, negative_prompt, xformers_input, cpu_offload_input, attention_slicing_input, enable_custom_sliders, guidance, iterations)
  File "C:\Users\jonat\sd-x4-wui\upscaler.py", line 86, in upscale_image
    ups_tile = pipeline(prompt=prompt, negative_prompt=negative_prompt, image=x.convert("RGB"), generator=generator).images[0]
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_upscale.py", line 775, in __call__
    noise_pred = self.unet(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\unet_2d_condition.py", line 1177, in forward
    sample = upsample_block(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 2354, in forward
    hidden_states = attn(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\transformer_2d.py", line 392, in forward
    hidden_states = block(
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\attention.py", line 393, in forward
    ff_output = self.ff(norm_hidden_states, scale=lora_scale)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\attention.py", line 665, in forward
    hidden_states = module(hidden_states, scale)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\jonat\sd-x4-wui\sdupx4\lib\site-packages\diffusers\models\activations.py", line 103, in forward
    return hidden_states * self.gelu(gate)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.05 GiB. GPU 0 has a total capacty of 24.00 GiB of which 2.11 GiB is free. Of the allocated memory 20.28 GiB is allocated by PyTorch, and 46.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
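The error message itself points at allocator fragmentation and the max_split_size_mb hint. A minimal, untested sketch of applying that hint from Python (the 512 value is an arbitrary starting point, not something taken from this thread) would be to set the variable before torch is imported:

# Untested sketch: apply the allocator hint from the error message.
# Must run before torch initialises CUDA; 512 is an arbitrary starting value.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported only after the environment variable is set
# ...continue launching the WebUI / pipeline as usual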

jonathancolledge commented 8 months ago

I can't get it to install according to the instructions, so I fiddled about. This is my latest install with conda, where I am hoping everything installed OK and I got all the optimisations. Currently it is running on a 1024 x 684 image, but I think it will run out of memory when it comes to the last step, saving the image. It is running at between 5 GB and 18 GB of VRAM in use:

git clone https://github.com/Subarasheese/sd-x4-wui

cd sd-x4-wui

conda create -n sdup python=3.10

git config --system core.longpaths true

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl

pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121

pip3 install accelerate
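One thing worth checking after these installs (just a sanity check, not part of the repo's instructions) is that the cu121 build actually sees the GPU:

# Sanity check: confirm the CUDA-enabled wheels installed correctly.
import torch
print(torch.__version__)          # expect something like 2.x.x+cu121
print(torch.version.cuda)         # expect "12.1"
print(torch.cuda.is_available())  # expect True on a working install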

I edited requirements.txt to only have the following:

Pillow == 9.4.0
diffusers
gradio == 3.15.0
split_image == 2.0.1
transformers

Then I ran it with:

python gradio_gui.py
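For context, my understanding (a rough sketch, not the repo's actual upscaler.py) is that the WebUI's xformers, CPU-offload and attention-slicing options map onto the standard diffusers calls, any of which should lower peak VRAM:

# Rough sketch of the memory-saving options, assuming the standard
# StableDiffusionUpscalePipeline from diffusers; not the repo's actual code.
import torch
from diffusers import StableDiffusionUpscalePipeline

pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
)
pipeline.enable_attention_slicing()                    # lower peak VRAM at some speed cost
pipeline.enable_xformers_memory_efficient_attention()  # needs a working xformers build
pipeline.enable_model_cpu_offload()                    # needs accelerate; parks idle modules in RAM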