comfyanonymous / ComfyUI

The most powerful and modular stable diffusion GUI, API, and backend with a graph/nodes interface.
GNU General Public License v3.0

When deploying ComfyUI on a fresh Windows installation using Miniconda, I encountered the "1Torch was not compiled with flash attention" warning during the initial inference. #3265

Open wibur0620 opened 2 months ago

wibur0620 commented 2 months ago

This is a fresh installation and I'm running into the same issue. I've already spent three days trying to resolve it; so far none of the methods I've tried have worked, and I also feel like the SDXL model isn't as fast as before (this might just be my perception). To address this warning I switched the system CUDA version to 12.1 and tried different versions of Torch, but the warning persists. Does this have any negative impact on my use of ComfyUI?

```
D:\AI\ComfyUI>call conda activate D:\AI\ComfyUI\venv-comfyui
Total VRAM 8188 MB, total RAM 65268 MB
xformers version: 0.0.25.post1
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Laptop GPU : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention

Import times for custom nodes:
   0.0 seconds: D:\AI\ComfyUI\custom_nodes\websocket_image_save.py

Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model_type EPS
Using xformers attention in VAE
Using xformers attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
Requested to load SDXLClipModel
Loading 1 new model
D:\AI\ComfyUI\comfy\ldm\modules\attention.py:345: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
Loading 1 new model
100%|██████████████████████████████████████████| 20/20 [00:12<00:00,  1.65it/s]
Requested to load AutoencoderKL
Loading 1 new model
Prompt executed in 20.37 seconds
```

wibur0620 commented 2 months ago

After reinstalling and adding 50 plugins, I noticed that loading the model took a long time when using the ComfyUI_VLM_nodes node. That's when I saw this warning, so I reinstalled ComfyUI without any custom nodes and found that the warning was still there. To get rid of it I even reinstalled the entire Windows system, but the warning persists.

NeedsMoar commented 2 months ago

Try installing the flash-attention 2.3.6 py311 ada/sm_89 wheel (not the xformers one) from my link on the discussions page you posted on (yes, it's from last December; that doesn't seem to matter). xformers builds flash-attn-2 in-tree instead of as a dependency and tosses it into an anonymously named .pyd file that nothing else can use. Torch seems to look for one in the base install via lazy module loading; the error is just misleading. I think I'm still using 2.3.6 because I built a newer version (around 2.4.2) in February and something started complaining when I installed it, but xformers is on v2.5.6 now, so a newer build might work unless torch requires a specific version.

How old is your comfy version exactly? Unless you pass either --disable-xformers or --use-pytorch-cross-attention (which do more or less the same thing) to comfy with xformers 0.0.25 installed, the line of code that errored calling torch.nn.functional.scaled_dot_product_attention should never execute; otherwise it's only a version-specific fallback for old xformers. The tricky part is that unless you specifically run one of those two options, you'll still get that xformers version line and messed-up settings. Ooops... errr

It looks like there's a bug in the logic in model_management.py in comfy that's causing pytorch flash attention to be selected on nvidia. @comfyanonymous I've been testing performance stuff (enabling medium-precision matmul and torch.autocast globally, since that's apparently a thing now), so my line numbers are all wrong (hence not listing them), but it's a combination of this:

try:
    if is_nvidia():
        torch_version = torch.version.__version__
        if int(torch_version[0]) >= 2:
            if ENABLE_PYTORCH_ATTENTION == False and args.use_split_cross_attention == False and args.use_quad_cross_attention == False:
                ENABLE_PYTORCH_ATTENTION = True

and this:

if ENABLE_PYTORCH_ATTENTION:
    torch.backends.cuda.enable_math_sdp(True)
    torch.backends.cuda.enable_flash_sdp(True)
    torch.backends.cuda.enable_mem_efficient_sdp(True)

Except this happens somewhat later, and all the logic earlier in the file that turns off xformers when pytorch attention is enabled in the args (and vice versa) gets tossed, so you end up with a mix of the two enabled. I think this has been going on on my system for a while now, so I'm patching it out to check.
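Not Comfy's code, but a rough sketch (assuming a PyTorch 2.x CUDA build) of how to check whether you've landed in that mixed state, by printing which SDPA backends are enabled and whether xformers is importable alongside them:

```python
import torch

# Which scaled_dot_product_attention backends PyTorch currently has enabled.
print("flash SDP enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:         ", torch.backends.cuda.math_sdp_enabled())

# Is xformers also importable? If it is and pytorch attention got enabled too,
# you can end up with the mixed configuration described above.
try:
    import xformers
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers: not installed")
```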

As for flash attention, the reason it isn't included, by the way, is that the author can't figure out how to write a makefile that doesn't need roughly (CPU cores × 4 GB) of peak RAM for compilation without changing environment variables. This has almost nothing to do with Windows, but it means un-breaking their setup.py every time it changes, because the "fix" of building on 1 CPU core is more broken than the original situation, and IIRC it always forces builds of sm_80/compute_80 and sm_90/compute_90 whether you want them or not. Most people don't want them because they don't have $40k Hoppers lying around, and Ada is covered poorly by that combination.

I think they've fixed it now in both torch and xformers, but I've dropped other things into torch/lib that weren't default on Windows to get functionality working, like 2:4 sparsity via cuBLASLt, and somebody made a Triton v3.0.0 wheel, so that can be installed to enable all available functionality in xformers aside from inter-GPU communication, which torch fixed in the next version. Torch reports that Inductor is available once Triton is installed. It's an annoying ecosystem, since nobody seems to want to maintain what amounts to a makefile for some of these projects, but you can either build everything or find it lying around.

NeedsMoar commented 2 months ago

Yeah, that was forcing it to use pytorch attention, which was using the flash-attention-2 installed on my system via Torch's lazy-as-hell loading mechanism, where flash isn't even checked for when it's enabled, only when it's called and doesn't exist (so I didn't notice it). 2048x2048 straight image refinement is back to being slightly faster than doing it again with hypertiling in SDXL, and I gained 0.75 it/s at 1024x1024 (for SDXL). Hypertiling is still faster when generating oddball sizes, but that's a deficiency of flash-attn having a bunch of preset-sized kernels and is easy to work around, and it's accelerating the hypertiling batches as well (which behave better when they're power-of-two sizes and the image isn't), so it's hard to get a clean metric.

Edit: Presumably something somewhere else was bypassing xformers in the logic. Torch can use flash attention, but AFAICT it doesn't have as advanced kernel-selection logic, which is the only reason it's any slower when they use the same code. Something like a basic memory-efficient attention call in xformers is more like a factory call that picks a function from a larger set; I think torch was working towards doing something similar. The logic for using xformers in the VAE is separate, so it gets used there no matter what. I don't have time to go over all the code right now, but something is funky about the selection somewhere. Not to mention that custom nodes tend to ignore model management entirely and call the torch version, or their own version of the quad attention, no matter what, so they don't get any benefit, but that's a fix that needs to be made on their end.

wibur0620 commented 2 months ago

> Yeah, that was forcing it to use pytorch attention, which was using the flash-attention-2 installed on my system via Torch's lazy-as-hell loading mechanism [...]

After giving up on solving the problem, I installed xformers. Before that, I had only run pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121 inside a virtual environment created by Miniconda. When I opened ComfyUI for the first time, the "1Torch was not compiled with flash attention" warning from the issue title (#3265) appeared. Now I have no idea how to address it, nor whether the warning is due to an incorrect installation on my part, or some other reason, or whether it isn't caused by me at all and I just need to wait for the community to fix it.

NeedsMoar commented 2 months ago

Try downloading and installing this: https://github.com/NeedsMoar/flash-attention-2-builds/releases/download/v2.3.6-flash-attention/flash_attn-2.3.6-cp311-cp311-win_amd64.whl It's old, but I have it installed and have never seen that message. If it breaks something even more, just remove it. I'm pretty sure torch is able to use it if needed.
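If you want to sanity-check the wheel before launching ComfyUI, something like this (just a sketch; run it inside the ComfyUI venv) should import cleanly. If it raises an ImportError / DLL error, the build doesn't match your Python/torch/CUDA combination and should be removed:

```python
import torch
import flash_attn
# Importing flash_attn_func pulls in the compiled flash_attn_2_cuda extension,
# which is where "DLL load failed" style mismatches show up.
from flash_attn import flash_attn_func

print("torch:", torch.__version__, "built with CUDA:", torch.version.cuda)
print("flash_attn:", flash_attn.__version__)
```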

wibur0620 commented 2 months ago

> Try downloading and installing this: https://github.com/NeedsMoar/flash-attention-2-builds/releases/download/v2.3.6-flash-attention/flash_attn-2.3.6-cp311-cp311-win_amd64.whl [...]

Okay, I'll try. Thank you.

wibur0620 commented 2 months ago

> Try downloading and installing this: https://github.com/NeedsMoar/flash-attention-2-builds/releases/download/v2.3.6-flash-attention/flash_attn-2.3.6-cp311-cp311-win_amd64.whl [...]

But I'm still puzzled as to why I'm getting this warning with a fresh install of ComfyUI.

wibur0620 commented 2 months ago

> Try downloading and installing this: https://github.com/NeedsMoar/flash-attention-2-builds/releases/download/v2.3.6-flash-attention/flash_attn-2.3.6-cp311-cp311-win_amd64.whl [...]

When I install the wheel and launch ComfyUI, I encounter an error:

```
D:\AI\ComfyUI>conda activate D:\AI\ComfyUI\venv-comfyui

(D:\AI\ComfyUI\venv-comfyui) D:\AI\ComfyUI>pip uninstall xformers
Found existing installation: xformers 0.0.25.post1
Uninstalling xformers-0.0.25.post1:
  Would remove:
    d:\ai\comfyui\venv-comfyui\lib\site-packages\xformers-0.0.25.post1.dist-info*
    d:\ai\comfyui\venv-comfyui\lib\site-packages\xformers*
Proceed (Y/n)? y
  Successfully uninstalled xformers-0.0.25.post1

(D:\AI\ComfyUI\venv-comfyui) D:\AI\ComfyUI>pip install C:\Users\liao1\Downloads\flash_attn-2.3.6-cp311-cp311-win_amd64.whl
Processing c:\users\liao1\downloads\flash_attn-2.3.6-cp311-cp311-win_amd64.whl
Requirement already satisfied: torch in d:\ai\comfyui\venv-comfyui\lib\site-packages (from flash-attn==2.3.6) (2.2.2+cu121)
Requirement already satisfied: einops in d:\ai\comfyui\venv-comfyui\lib\site-packages (from flash-attn==2.3.6) (0.7.0)
Requirement already satisfied: packaging in d:\ai\comfyui\venv-comfyui\lib\site-packages (from flash-attn==2.3.6) (24.0)
Collecting ninja (from flash-attn==2.3.6)
  Using cached ninja-1.11.1.1-py2.py3-none-win_amd64.whl.metadata (5.4 kB)
Requirement already satisfied: filelock in d:\ai\comfyui\venv-comfyui\lib\site-packages (from torch->flash-attn==2.3.6) (3.13.4)
Requirement already satisfied: typing-extensions>=4.8.0 in d:\ai\comfyui\venv-comfyui\lib\site-packages (from torch->flash-attn==2.3.6) (4.11.0)
Requirement already satisfied: sympy in d:\ai\comfyui\venv-comfyui\lib\site-packages (from torch->flash-attn==2.3.6) (1.12)
Requirement already satisfied: networkx in d:\ai\comfyui\venv-comfyui\lib\site-packages (from torch->flash-attn==2.3.6) (3.3)
Requirement already satisfied: jinja2 in d:\ai\comfyui\venv-comfyui\lib\site-packages (from torch->flash-attn==2.3.6) (3.1.3)
Requirement already satisfied: fsspec in d:\ai\comfyui\venv-comfyui\lib\site-packages (from torch->flash-attn==2.3.6) (2024.3.1)
Requirement already satisfied: MarkupSafe>=2.0 in d:\ai\comfyui\venv-comfyui\lib\site-packages (from jinja2->torch->flash-attn==2.3.6) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in d:\ai\comfyui\venv-comfyui\lib\site-packages (from sympy->torch->flash-attn==2.3.6) (1.3.0)
Using cached ninja-1.11.1.1-py2.py3-none-win_amd64.whl (312 kB)
Installing collected packages: ninja, flash-attn
Successfully installed flash-attn-2.3.6 ninja-1.11.1.1

(D:\AI\ComfyUI\venv-comfyui) D:\AI\ComfyUI>

D:\AI\ComfyUI>call conda activate D:\AI\ComfyUI\venv-comfyui
** ComfyUI startup time: 2024-04-15 09:04:04.648713
** Platform: Windows
** Python version: 3.11.8 | packaged by Anaconda, Inc. | (main, Feb 26 2024, 21:34:05) [MSC v.1916 64 bit (AMD64)]
** Python executable: D:\AI\ComfyUI\venv-comfyui\python.exe
** Log path: D:\AI\ComfyUI\comfyui.log

Prestartup times for custom nodes:
   0.2 seconds: D:\AI\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 8188 MB, total RAM 65268 MB
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4060 Laptop GPU : cudaMallocAsync
VAE dtype: torch.bfloat16
Using pytorch cross attention
Traceback (most recent call last):
  File "D:\AI\ComfyUI\nodes.py", line 1864, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\AI\ComfyUI\comfy_extras\nodes_canny.py", line 8, in <module>
    from kornia.filters import canny
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\__init__.py", line 8, in <module>
    from . import augmentation, color, contrib, core, enhance, feature, io, losses, metrics, morphology, tracking, utils, x
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\__init__.py", line 2, in <module>
    from kornia.augmentation import auto, container
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\__init__.py", line 1, in <module>
    from .autoaugment import AutoAugment
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\autoaugment\__init__.py", line 1, in <module>
    from .autoaugment import AutoAugment
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\autoaugment\autoaugment.py", line 5, in <module>
    from kornia.augmentation.auto.base import SUBPOLICY_CONFIG, PolicyAugmentBase
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\base.py", line 5, in <module>
    from kornia.augmentation.auto.operations.base import OperationBase
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\operations\__init__.py", line 3, in <module>
    from .policy import PolicySequential
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\operations\policy.py", line 7, in <module>
    from kornia.augmentation.container.base import ImageSequentialBase, TransformMatrixMinIn
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\container\__init__.py", line 1, in <module>
    from kornia.augmentation.container.augment import AugmentationSequential
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\container\augment.py", line 4, in <module>
    from kornia.augmentation._2d.base import RigidAffineAugmentationBase2D
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_2d\__init__.py", line 2, in <module>
    from kornia.augmentation._2d.intensity import *
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_2d\intensity\__init__.py", line 28, in <module>
    from kornia.augmentation._2d.intensity.plasma import RandomPlasmaBrightness, RandomPlasmaContrast, RandomPlasmaShadow
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_2d\intensity\plasma.py", line 5, in <module>
    from kornia.contrib import diamond_square
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\contrib\__init__.py", line 15, in <module>
    from .image_stitching import ImageStitcher
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\contrib\image_stitching.py", line 7, in <module>
    from kornia.feature import LocalFeatureMatcher, LoFTR
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\feature\__init__.py", line 7, in <module>
    from .integrated import (
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\feature\integrated.py", line 17, in <module>
    from .lightglue import LightGlue
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\feature\lightglue.py", line 30, in <module>
    from flash_attn.modules.mha import FlashCrossAttention
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\flash_attn\flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: DLL load failed while importing flash_attn_2_cuda: The specified procedure could not be found.

Cannot import D:\AI\ComfyUI\comfy_extras\nodes_canny.py module for custom nodes: DLL load failed while importing flash_attn_2_cuda: The specified procedure could not be found.
Traceback (most recent call last):
  File "D:\AI\ComfyUI\nodes.py", line 1864, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "D:\AI\ComfyUI\comfy_extras\nodes_morphology.py", line 4, in <module>
    from kornia.morphology import dilation, erosion, opening, closing, gradient, top_hat, bottom_hat
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\__init__.py", line 8, in <module>
    from . import augmentation, color, contrib, core, enhance, feature, io, losses, metrics, morphology, tracking, utils, x
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\__init__.py", line 2, in <module>
    from kornia.augmentation import auto, container
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\__init__.py", line 1, in <module>
    from .autoaugment import AutoAugment
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\autoaugment\__init__.py", line 1, in <module>
    from .autoaugment import AutoAugment
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\autoaugment\autoaugment.py", line 5, in <module>
    from kornia.augmentation.auto.base import SUBPOLICY_CONFIG, PolicyAugmentBase
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\base.py", line 6, in <module>
    from kornia.augmentation.auto.operations.policy import PolicySequential
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\operations\__init__.py", line 3, in <module>
    from .policy import PolicySequential
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\auto\operations\policy.py", line 7, in <module>
    from kornia.augmentation.container.base import ImageSequentialBase, TransformMatrixMinIn
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\container\__init__.py", line 1, in <module>
    from kornia.augmentation.container.augment import AugmentationSequential
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\container\augment.py", line 5, in <module>
    from kornia.augmentation._3d.base import AugmentationBase3D, RigidAffineAugmentationBase3D
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_3d\__init__.py", line 3, in <module>
    from kornia.augmentation._3d.mix import *
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_3d\mix\__init__.py", line 1, in <module>
    from kornia.augmentation._3d.mix.transplantation import RandomTransplantation3D
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_3d\mix\transplantation.py", line 1, in <module>
    from kornia.augmentation._2d.mix.transplantation import RandomTransplantation
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_2d\__init__.py", line 2, in <module>
    from kornia.augmentation._2d.intensity import *
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_2d\intensity\__init__.py", line 28, in <module>
    from kornia.augmentation._2d.intensity.plasma import RandomPlasmaBrightness, RandomPlasmaContrast, RandomPlasmaShadow
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\augmentation\_2d\intensity\plasma.py", line 5, in <module>
    from kornia.contrib import diamond_square
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\contrib\__init__.py", line 15, in <module>
    from .image_stitching import ImageStitcher
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\contrib\image_stitching.py", line 7, in <module>
    from kornia.feature import LocalFeatureMatcher, LoFTR
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\feature\__init__.py", line 7, in <module>
    from .integrated import (
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\feature\integrated.py", line 17, in <module>
    from .lightglue import LightGlue
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\kornia\feature\lightglue.py", line 30, in <module>
    from flash_attn.modules.mha import FlashCrossAttention
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\flash_attn\__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "D:\AI\ComfyUI\venv-comfyui\Lib\site-packages\flash_attn\flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: DLL load failed while importing flash_attn_2_cuda: The specified procedure could not be found.

Cannot import D:\AI\ComfyUI\comfy_extras\nodes_morphology.py module for custom nodes: DLL load failed while importing flash_attn_2_cuda: The specified procedure could not be found.

Loading: ComfyUI-Manager (V2.17)
ComfyUI Revision: 2128 [258dbc06] | Released on '2024-04-14'

Import times for custom nodes:
   0.0 seconds: D:\AI\ComfyUI\custom_nodes\websocket_image_save.py
   0.3 seconds: D:\AI\ComfyUI\custom_nodes\ComfyUI-Manager

WARNING: some comfy_extras/ nodes did not import correctly. This may be because they are missing some dependencies.

IMPORT FAILED: nodes_canny.py
IMPORT FAILED: nodes_morphology.py

This issue might be caused by new missing dependencies added the last time you updated ComfyUI.
Please do a: pip install -r requirements.txt

Starting server

To see the GUI go to: http://127.0.0.1:8188
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
```

wibur0620 commented 2 months ago

```
To see the GUI go to: http://127.0.0.1:8188
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/model-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/alter-list.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json
[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json
got prompt
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
clip missing: ['clip_l.logit_scale', 'clip_l.transformer.text_projection.weight']
Requested to load SDXLClipModel
Loading 1 new model
D:\AI\ComfyUI\comfy\ldm\modules\attention.py:345: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
Requested to load SDXL
Loading 1 new model
 65%|█████████████████████████████▎                | 13/20 [00:07<00:04,  1.70it/s]
```

NeedsMoar commented 2 months ago

There's a bigger mess in that flash_attn (v1) is in the same repo as flash-attn-2 but isn't built by default and is a different codebase.

> Try downloading and installing this: https://github.com/NeedsMoar/flash-attention-2-builds/releases/download/v2.3.6-flash-attention/flash_attn-2.3.6-cp311-cp311-win_amd64.whl [...]

> But I'm still puzzled as to why I'm getting this warning with a fresh install of ComfyUI.

Oh, I would be too; I'm just shooting in the dark here (and hopefully things can be narrowed down). I suspect kornia needs a specific flash-attention build (it calls it, but doesn't list it in its requirements anywhere)... the reason the error suddenly showed up is that you uninstalled xformers, which is what kornia uses preferentially. I might not have been clear, but you want both installed; flash attention is a fallback for the things that require it. Most of those have the decency to build the version they need or at least put it in their requirements files. :P I'd just reinstall xformers with pip install xformers --extra-index-url https://download.pytorch.org/whl/cu121 to make sure you get the build with the flash attention CUDA kernels again, then remove flash-attention or not depending on whether it's actively breaking or fixing anything.
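If it helps, here's a rough smoke test (just a sketch, assuming a CUDA build of torch and xformers 0.0.25.x) to confirm the reinstalled xformers actually has working memory-efficient attention kernels:

```python
import torch
import xformers
import xformers.ops as xops

print("xformers:", xformers.__version__)

# Tiny smoke test: if the bundled CUDA kernels are usable, this runs without raising.
# xformers expects tensors shaped [batch, seq_len, heads, head_dim].
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)
out = xops.memory_efficient_attention(q, k, v)
print("memory_efficient_attention OK:", tuple(out.shape))
```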

Akira13641 commented 2 months ago

Versions of Comfy that bundled / depended on PyTorch 2.1.2 did not have this issue; the problem is solely with how PyTorch versions above that are compiled on Windows. You can get the faster PyTorch attention going again on Windows / Nvidia by uninstalling xformers (which is not even currently a direct dependency of Comfy) if it's present, and rolling Torch back to "2.1.2+cu121".

gaoming714 commented 2 weeks ago

Warning: 1Torch was not compiled with flash attention.

First of all, some good news: this failure usually does not stop the program from running; it just runs slower.

The warning appears because, since the torch 2.2 update, FlashAttention-2 is supposed to be selected as the optimal backend, but it fails to start successfully.

The PyTorch 2.2 release blog (https://pytorch.org/blog/pytorch2-2/) lists this as a major update:

scaled_dot_product_attention (SDPA) now supports FlashAttention-2, yielding around 2x speedups compared to previous versions.

Usually the backend priority is FlashAttention > Memory-Efficient Attention (xformers-style) > the PyTorch C++ implementation (math).

(I don't understand why it is designed this way, and none of this is clear from the warning itself. I hope the next official release improves it.)
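A rough way to see whether the FlashAttention backend actually works on a given machine (a sketch, assuming torch 2.2.x with CUDA; torch.backends.cuda.sdp_kernel is deprecated in later releases in favor of torch.nn.attention.sdpa_kernel):

```python
import torch
import torch.nn.functional as F

# Shapes are [batch, heads, seq_len, head_dim]; fp16 with head_dim 64 qualifies for flash.
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Restrict SDPA to the FlashAttention backend only. If torch wasn't built with
# flash attention (or the GPU/inputs don't qualify), this raises instead of
# silently falling back to the math kernel with the "1Torch..." warning.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    try:
        F.scaled_dot_product_attention(q, k, v)
        print("FlashAttention-2 backend works")
    except RuntimeError as e:
        print("FlashAttention backend unavailable:", e)
```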

But the pitfalls I want to call out are the following:

  1. It is supported in PyTorch and is the first choice, and the logic is that this warning is issued whenever FlashAttention-2 fails to initialize. (Some people have benchmarked it and found that FlashAttention-2 doesn't improve things much here.)

  2. FlashAttention-2 does not have a complete ecosystem. The official releases (https://github.com/Dao-AILab/flash-attention) only support Linux, so Windows users have to compile it themselves (which is very slow even with ninja installed). Third-party Windows wheels can be downloaded from https://github.com/bdashore3/flash-attention/releases.

  3. Hardware support starts at the RTX 30 series: FlashAttention only supports Ampere GPUs or newer. In other words, it can run on a 3060.

  4. There is still a small possibility that the environment's CUDA version and the CUDA version the wheel was compiled against are incompatible. The official torch builds target CUDA 12.1 (torch 2.* +cu121). A quick check for points 3 and 4 is sketched below.
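A minimal version/capability check for the last two points (just a sketch; it only reads versions and the compute capability, it doesn't prove that a particular flash-attn wheel will load):

```python
import torch

# Point 3: FlashAttention needs Ampere (compute capability 8.0) or newer.
major, minor = torch.cuda.get_device_capability(0)
print("GPU:", torch.cuda.get_device_name(0), f"- compute capability {major}.{minor}")
print("Ampere or newer:", major >= 8)

# Point 4: the CUDA version torch was built against (e.g. '12.1' for +cu121 wheels)
# should match what your flash-attn / xformers wheels were built for.
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
```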