Fanghua-Yu / SUPIR

SUPIR aims to develop practical algorithms for photo-realistic image restoration in the wild. Our new online demo is also available at suppixel.ai.
http://supir.xpixel.group/

"1Torch was not compiled with flash attention" during inference #36

Open ThereforeGames opened 8 months ago

ThereforeGames commented 8 months ago

Hello,

Thank you for sharing SUPIR with us! I am trying to run it on Windows using a GeForce 3090, but I receive the following warning during inference:

Seed set to 754183752
[Tiled VAE]: input_size: torch.Size([1, 3, 1024, 1024]), tile_size: 512, padding: 32
[Tiled VAE]: split to 2x2 = 4 tiles. Optimal tile size 480x480, original tile size 512x512
[Tiled VAE]: Executing Encoder Task Queue: 100%|████████████████████████████████████| 364/364 [00:30<00:00, 12.11it/s]
[Tiled VAE]: Done in 31.141s, max VRAM alloc 35506.670 MB
[Tiled VAE]: input_size: torch.Size([1, 4, 128, 128]), tile_size: 64, padding: 11
[Tiled VAE]: split to 2x2 = 4 tiles. Optimal tile size 64x64, original tile size 64x64
[('conv_in', Conv2d(4, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))), ('store_res', <function resblock2task.<locals>.<lambda> at 0x0000023E057C3820>), ('pre_norm', GroupNorm(32, 512, eps=1e-06, affine=True)), ('silu', <function inplace_nonlinearity at 0x0000022E56463B80>), ('conv1', Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))), ('pre_norm', GroupNorm(32, 512, eps=1e-06, affine=True)), ('silu', <function inplace_nonlinearity at 0x0000022E56463B80>), ('conv2', Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))), ['add_res', None], ('store_res', <function attn2task.<locals>.<lambda> at 0x0000023E0581C700>), ('pre_norm', GroupNorm(32, 512, eps=1e-06, affine=True)), ('attn', <function attn2task.<locals>.<lambda> at 0x0000023E0468B3A0>), ['add_res', None]]
[Tiled VAE]: Executing Decoder Task Queue: 100%|████████████████████████████████████| 492/492 [00:49<00:00,  9.85it/s]
[Tiled VAE]: Done in 50.601s, max VRAM alloc 36130.516 MB
[Tiled VAE]: input_size: torch.Size([1, 3, 1024, 1024]), tile_size: 512, padding: 32
[Tiled VAE]: split to 2x2 = 4 tiles. Optimal tile size 480x480, original tile size 512x512
[Tiled VAE]: Executing Encoder Task Queue: 100%|████████████████████████████████████| 364/364 [00:19<00:00, 18.45it/s]
[Tiled VAE]: Done in 20.064s, max VRAM alloc 35518.795 MB
T:\programs\anaconda3\envs\SUPIR\lib\site-packages\torch\nn\functional.py:5476: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)

Looking at my system resources, VRAM is still at 100%, so maybe I just need to be more patient. That said, has anyone else run into this warning or know if there's a simple fix?

I have the --loading_half_params and --use_tile_vae flags enabled.

Thank you.

EDIT: I can confirm that the upscale does work despite the warning. However, even with --use_8bit_llava it takes nearly 15 minutes to process at 1x resolution. Reported VRAM usage is ~23.3 GB, which is technically within a 3090's limits, but memory is probably spilling over to system RAM since other apps are using the GPU as well. The good news is that --no_llava lets me upscale a 512px image to 1024px in 40 seconds and lowers the VRAM requirement to 10.3 GB.
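
For anyone else hitting this: as far as I can tell, the warning only means the Windows PyTorch wheel was not built with the flash-attention kernel, so scaled_dot_product_attention silently falls back to another backend and the output is unaffected. A minimal check (nothing SUPIR-specific, just standard PyTorch APIs) to see which SDPA backends your build actually supports:

# Sketch: inspect which scaled_dot_product_attention backends this PyTorch
# build can use. If the flash backend is unavailable (as in the Windows
# wheels discussed here), SDPA falls back to the memory-efficient or math
# kernel; only speed is affected, not correctness.
import torch

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("flash SDP enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:         ", torch.backends.cuda.math_sdp_enabled())

# Quick functional test: if this runs without raising, inference works even
# though the flash kernel is missing.
if torch.cuda.is_available():
    q = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
    out = torch.nn.functional.scaled_dot_product_attention(q, q, q)
    print("SDPA fallback output shape:", out.shape)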

DmitryVN commented 8 months ago

I have the same problem:

E:\SUPIR\venv\lib\site-packages\torch\nn\functional.py:5476: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal)

How to fix it? thanks

YKefasu commented 8 months ago

https://github.com/Stability-AI/stablediffusion/issues/203

YKefasu commented 8 months ago

> I have the same problem: E:\SUPIR\venv\lib\site-packages\torch\nn\functional.py:5476: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.) attn_output = scaled_dot_product_attention(q, k, v, attn_mask, dropout_p, is_causal) How to fix it? thanks

For CUDA 11.8:

pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -U xformers==0.0.22.post4
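
A quick way to confirm the reinstall took effect before re-running SUPIR (standard torch/xformers attributes, nothing SUPIR-specific):

# Sanity check after installing the cu118 wheels above.
import torch
import xformers

print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
print("xformers:", xformers.__version__)
print("CUDA available:", torch.cuda.is_available())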