HVision-NKU / StoryDiffusion

Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Apache License 2.0

UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.) #26

Open huanxve opened 5 months ago

huanxve commented 5 months ago

UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
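(The stray "1" in "1Torch" appears verbatim in the warning string that torch itself prints, not a paste error in this issue.) For anyone who wants to confirm whether their own torch build can actually run the flash-attention backend, here is a minimal probe sketch. It assumes a CUDA build of torch 2.x, where `torch.backends.cuda.sdp_kernel` is the documented context manager for restricting scaled-dot-product-attention backends:

```python
# Minimal sketch: force the flash backend only; if this torch build was
# compiled without flash attention, the call raises a RuntimeError instead
# of silently falling back (which is what triggers the UserWarning).
import torch
import torch.nn.functional as F

q = torch.randn(1, 8, 64, 64, device="cuda", dtype=torch.float16)
try:
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        F.scaled_dot_product_attention(q, q, q)
    print("flash attention backend is available")
except RuntimeError as e:
    print("flash attention backend unavailable:", e)
```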

huanxve commented 5 months ago

Does this warning affect generation speed?
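One way to answer this locally is to time the fallback backends directly. A rough measurement sketch (the tensor shapes are made up for illustration; real attention shapes depend on the model and resolution) comparing the math backend against the memory-efficient one, which is what torch falls back to when flash attention is missing:

```python
# Rough timing sketch: compare the two non-flash SDPA backends on this GPU.
import time
import torch
import torch.nn.functional as F

q = torch.randn(2, 8, 4096, 64, device="cuda", dtype=torch.float16)

def bench(**flags):
    with torch.backends.cuda.sdp_kernel(**flags):
        F.scaled_dot_product_attention(q, q, q)  # warm-up
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(50):
            F.scaled_dot_product_attention(q, q, q)
        torch.cuda.synchronize()
        return time.perf_counter() - t0

print("math only:         ", bench(enable_flash=False, enable_math=True,  enable_mem_efficient=False))
print("mem-efficient only:", bench(enable_flash=False, enable_math=False, enable_mem_efficient=True))
```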

Bigfield77 commented 5 months ago

When I looked into the same warning, it seems that torch builds (on Windows at least) have shipped without flash attention for the last few releases.

The last release where this works is torch 2.1.2: `pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 xformers --index-url https://download.pytorch.org/whl/cu121 --upgrade`

Optional, for Windows users: `pip install https://huggingface.co/r4ziel/xformers_pre_built/resolve/main/triton-2.0.0-cp310-cp310-win_amd64.whl`
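If you apply the downgrade, a quick sanity check is shown below; the `+cu121` suffixes are an assumption based on the cu121 index URL used above:

```python
# Sanity check after pinning the versions suggested above.
import torch
import torchvision
import xformers

print(torch.__version__)        # expect 2.1.2+cu121 from the cu121 index
print(torchvision.__version__)  # expect 0.16.2+cu121
print(xformers.__version__)
print(torch.cuda.is_available())
```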

Bigfield77 commented 5 months ago

It makes a huge difference in memory usage, but only if xformers memory-efficient attention is enabled (line 531 of gradio_app_sdxl_specific_id.py):
`pipe.enable_xformers_memory_efficient_attention()`

Unfortunately, it seems to break StoryDiffusion's custom attention mechanism :(
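For context, a guarded sketch using plain diffusers APIs. The SDXL checkpoint name is an assumption, and the explanation for the breakage is an educated guess rather than something confirmed in this thread: `enable_xformers_memory_efficient_attention()` swaps every attention processor for `XFormersAttnProcessor`, which would also clobber any custom processor StoryDiffusion installs for its consistent self-attention.

```python
# Guarded sketch with plain diffusers APIs; checkpoint name is illustrative.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

try:
    # Replaces every attention processor with XFormersAttnProcessor,
    # which also discards any custom processor already installed
    # (likely why StoryDiffusion's consistent self-attention breaks).
    pipe.enable_xformers_memory_efficient_attention()
except (ImportError, ModuleNotFoundError):
    print("xformers not installed; keeping default attention")

# Revert to the default processors if outputs look broken:
pipe.disable_xformers_memory_efficient_attention()
```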