Thank you for taking the time to review my question. I should mention up front that I am a beginner, so I would appreciate your patience.
I am looking for help resolving the warnings below in order to improve execution speed. Inference runs and I do obtain correct results, but every run emits the warning messages listed here, and from what I have read they indicate that PyTorch is falling back to slower attention kernels. I have not been able to find a solution, hence this question. (A small diagnostic sketch follows the log below.)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\backbones\hieradet.py:68: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
x = F.scaled_dot_product_attention(
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Memory efficient kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:723.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/transformers/sdp_utils_cpp.h:495.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:725.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: CuDNN attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:727.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: The CuDNN backend needs to be enabled by setting the enviornment variableTORCH_CUDNN_SDPA_ENABLED=1 (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:497.)
out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\anaconda3\envs\ddd\Lib\site-packages\torch\nn\modules\module.py:1562: UserWarning: Flash Attention kernel failed due to: No available kernel. Aborting execution.
Falling back to all available kernels for scaled_dot_product_attention (which may have a slower speed).
return forward_call(*args, **kwargs)
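For reference, here is how I am checking what my build reports before loading SAM 2 (a minimal sketch; the `torch.backends.cuda` query functions are the ones documented for PyTorch 2.x, and `TORCH_CUDNN_SDPA_ENABLED` is the variable named in the warning above):

```python
# Minimal diagnostic: print which scaled_dot_product_attention backends
# this PyTorch build reports as enabled. Run this before loading SAM 2.
import os

# The last warning says the CuDNN backend must be opted into via this
# environment variable; it has to be set before torch initializes CUDA.
os.environ.setdefault("TORCH_CUDNN_SDPA_ENABLED", "1")

import torch

print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("flash SDP enabled:        ", torch.backends.cuda.flash_sdp_enabled())
print("mem-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math SDP enabled:         ", torch.backends.cuda.math_sdp_enabled())
print("device capability:        ", torch.cuda.get_device_capability(0))
```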
My execution environment is as follows:
Docker
PyTorch 2.4.0
CUDA 12.4
GPU: RTX 3070 (8 GB VRAM)
The CUDA environment on the host machine is:
Cuda compilation tools, release 12.5, V12.5.82 Build cuda_12.5.r12.5/compiler.34385749_0
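I also came across the `torch.nn.attention.sdpa_kernel` context manager (the public API since PyTorch 2.3) and wonder whether explicitly restricting the backends is the right approach here. A minimal sketch of what I mean; the choice of `EFFICIENT_ATTENTION` plus `MATH` is only my assumption about what this GPU/build supports, and in practice the SAM 2 inference call would go inside the `with` block instead of this dummy call:

```python
# Sketch: pin scaled_dot_product_attention to kernels that are actually
# available, so PyTorch stops probing (and warning about) the missing ones.
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# Dummy tensors just to exercise the attention call.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)

# Allow only the memory-efficient and math kernels.
with sdpa_kernel([SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```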
I would greatly appreciate any guidance on how to address these warnings. Thank you in advance for your help.