eanzero opened 1 month ago
I am getting the same warnings.
My environment:
Same for me on
Hi @eanzero @dario-spagnolo @renhaa, you can turn off this warning by changing the line https://github.com/facebookresearch/sam2/blob/52198ead0eb13ae8270bea6ca768ef175f5bf167/sam2/modeling/sam/transformer.py#L23 to
OLD_GPU, USE_FLASH_ATTN, MATH_KERNEL_ON = True, True, True
This would directly try out all the available kernels (instead of trying Flash Attention first and then falling back to other kernels upon errors).
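For reference, here is a minimal sketch of how those three flags are consumed around the `sdp_kernel` call (it roughly mirrors the linked transformer.py; the dummy tensors and shapes below are purely illustrative, not the real model code):

```python
# Sketch only: roughly mirrors how transformer.py gates the SDPA backends.
import torch
import torch.nn.functional as F

OLD_GPU, USE_FLASH_ATTN, MATH_KERNEL_ON = True, True, True
dropout_p = 0.0

# Dummy query/key/value in half precision, shape (batch, heads, seq, head_dim).
q = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 256, 64, device="cuda", dtype=torch.float16)

# With all three flags True, PyTorch may pick any available kernel (flash,
# memory-efficient, or the math fallback) instead of insisting on Flash
# Attention and erroring out when it is unusable.
with torch.backends.cuda.sdp_kernel(
    enable_flash=USE_FLASH_ATTN,
    enable_math=(OLD_GPU and dropout_p > 0.0) or MATH_KERNEL_ON,
    enable_mem_efficient=OLD_GPU,
):
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
```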
@eanzero The error message above shows that the Flash Attention kernel failed
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:725.)
but PyTorch didn't print a further line explaining why it failed. Meanwhile, the GPU you're using (RTX 3070) has a CUDA compute capability of 8.6 according to https://developer.nvidia.com/cuda-gpus, so it should support Flash Attention in principle.
A possible cause is that there could be some mismatch between your CUDA driver, CUDA runtime, and PyTorch versions, causing Flash Attention kernels to fail, especially given that you're using Windows. Previously people have reported issues with Flash Attention on Windows (e.g. in https://github.com/pytorch/pytorch/issues/108175 and https://github.com/Dao-AILab/flash-attention/issues/553), and it could be the same issue in your case. To avoid these issues, it's recommended to use Windows Subsystem for Linux if you're running on Windows.
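If it helps with debugging, a quick diagnostic along these lines reports the versions and SDPA switches that matter for Flash Attention (plain PyTorch calls, nothing SAM 2 specific):

```python
# Quick diagnostic sketch: versions, GPU capability, and which SDPA backends
# are currently allowed. These are global switches, not a guarantee that a
# given kernel will actually be selected for a particular input.
import torch

print("torch:", torch.__version__)
print("CUDA runtime torch was built with:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # Flash Attention generally wants an Ampere-class GPU or newer
    # (compute capability 8.x+); an RTX 3070 reports (8, 6).
    print("compute capability:", torch.cuda.get_device_capability(0))

print("flash SDPA enabled:", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDPA enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDPA enabled:", torch.backends.cuda.math_sdp_enabled())
```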
I met the same problem. My env:
In a standalone test, Flash Attention works, but it fails inside sam2. The full message is as follows:
sam2/sam2/modeling/sam/transformer.py:269: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:723.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
sam2/sam2/modeling/sam/transformer.py:269: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at ../aten/src/ATen/native/transformers/sdp_utils_cpp.h:495.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
sam2/sam2/modeling/sam/transformer.py:269: UserWarning: Flash attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:725.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
**sam2/sam2/modeling/sam/transformer.py:269: UserWarning: Expected query, key and value to all be of dtype: {Half, BFloat16}. Got Query dtype: float, Key dtype: float, and Value dtype: float instead. (Triggered internally at ../aten/src/ATen/native/transformers/sdp_utils_cpp.h:98.)**
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
sam2/sam2/modeling/sam/transformer.py:269: UserWarning: CuDNN attention kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:727.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
anaconda3/envs/env_sam/lib/python3.10/site-packages/torch/nn/modules/module.py:1562: UserWarning: Flash Attention kernel failed due to: No available kernel. Aborting execution.
Falling back to all available kernels for scaled_dot_product_attention (which may have a slower speed).
Luckily, there is a PR that addresses this problem:
https://github.com/facebookresearch/sam2/pull/322
It works for me.
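A note on the bolded warning in the trace above: the flash kernel only accepts half-precision inputs, so float32 q/k/v are always rejected. A small standalone check (tensor shapes are illustrative and unrelated to SAM 2) makes this visible:

```python
# Standalone check: the flash kernel rejects float32 inputs, which is exactly
# what the bolded dtype warning above reports.
import torch
import torch.nn.functional as F

def try_flash(dtype):
    q = torch.randn(1, 8, 256, 64, device="cuda", dtype=dtype)
    k = torch.randn(1, 8, 256, 64, device="cuda", dtype=dtype)
    v = torch.randn(1, 8, 256, 64, device="cuda", dtype=dtype)
    # Restrict SDPA to the flash backend only, so failures are not hidden
    # by a silent fallback to the math kernel.
    try:
        with torch.backends.cuda.sdp_kernel(
            enable_flash=True, enable_math=False, enable_mem_efficient=False
        ):
            F.scaled_dot_product_attention(q, k, v)
        print(dtype, "-> flash kernel ran")
    except RuntimeError as e:
        print(dtype, "-> failed:", e)

try_flash(torch.float32)   # expected to fail: flash needs Half/BFloat16
try_flash(torch.bfloat16)  # should pass where the PyTorch build ships flash attention
```

One common way to get half-precision tensors into the attention call is to run inference under `torch.autocast("cuda", dtype=torch.bfloat16)`; I have not verified whether that is what the linked PR changes.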
I had this warning: sam2/modeling/sam/transformer.py:270: UserWarning: Memory efficient kernel not used because: (Triggered internally at ../aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:773.)
The above-mentioned PR (https://github.com/facebookresearch/sam2/pull/322) fixed that issue for me.
Thank you for taking the time to review my question.
Before I proceed, I would like to mention that I am a beginner, so I would appreciate your patience.
I am seeking assistance in resolving the following warnings to improve execution speed. While I am able to obtain results, I receive the warning messages listed below. From my research, I understand that these warnings can affect execution speed, but I have been unable to find a solution, hence my question.
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\backbones\hieradet.py:68: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
x = F.scaled_dot_product_attention(
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Memory efficient kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:723.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/transformers/sdp_utils_cpp.h:495.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:725.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: CuDNN attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:727.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: The CuDNN backend needs to be enabled by setting the enviornment variable TORCH_CUDNN_SDPA_ENABLED=1 (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:497.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\anaconda3\envs\ddd\Lib\site-packages\torch\nn\modules\module.py:1562: UserWarning: Flash Attention kernel failed due to: No available kernel. Aborting execution. Falling back to all available kernels for scaled_dot_product_attention (which may have a slower speed).
return forward_call(*args, **kwargs)

My execution environment is as follows:
The CUDA environment on the host machine is: Cuda compilation tools, release 12.5, V12.5.82 Build cuda_12.5.r12.5/compiler.34385749_0
I would greatly appreciate any guidance on how to address these warnings. Thank you in advance for your help.
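One of the warnings above says the cuDNN attention backend is gated behind the TORCH_CUDNN_SDPA_ENABLED environment variable. A minimal sketch for setting it (the variable name is taken directly from the warning; whether cuDNN attention actually helps here is untested):

```python
# Sketch only: TORCH_CUDNN_SDPA_ENABLED comes straight from the warning above.
# Setting it in the shell before launching Python is the safest option;
# setting it via os.environ should also work as long as it happens before
# torch is imported.
import os
os.environ["TORCH_CUDNN_SDPA_ENABLED"] = "1"

import torch  # import torch only after the variable is set
print("cuDNN SDPA env flag:", os.environ.get("TORCH_CUDNN_SDPA_ENABLED"))
```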