facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Seeking Guidance: Addressing Performance-Related Warning Messages to Optimize Execution Speed #329

Open eanzero opened 3 days ago

eanzero commented 3 days ago

Thank you for taking the time to review my question.

Before I proceed, I would like to mention that I am a beginner, so please bear that in mind.

I am seeking assistance in resolving the warnings below in order to improve execution speed. I am able to obtain results, but each run prints the warning messages listed below. From my research, I understand that these warnings mean slower attention kernels are being used, which can affect execution speed, but I have been unable to find a solution, hence this question.

C:\Users\USER\ddd\segment-anything-2\sam2\modeling\backbones\hieradet.py:68: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  x = F.scaled_dot_product_attention(
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Memory efficient kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:723.)
  out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Memory Efficient attention has been runtime disabled. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen/native/transformers/sdp_utils_cpp.h:495.)
  out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: Flash attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:725.)
  out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: CuDNN attention kernel not used because: (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:727.)
  out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\ddd\segment-anything-2\sam2\modeling\sam\transformer.py:270: UserWarning: The CuDNN backend needs to be enabled by setting the enviornment variable TORCH_CUDNN_SDPA_ENABLED=1 (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:497.)
  out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
C:\Users\USER\anaconda3\envs\ddd\Lib\site-packages\torch\nn\modules\module.py:1562: UserWarning: Flash Attention kernel failed due to: No available kernel. Aborting execution. Falling back to all available kernels for scaled_dot_product_attention (which may have a slower speed).
  return forward_call(*args, **kwargs)
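In case it helps with diagnosis: would a small check along these lines (standard torch.backends query functions, not anything from the SAM 2 code) be the right way to confirm which scaled_dot_product_attention backends my PyTorch build actually enables?

import torch

# Minimal diagnostic sketch: print which fused SDPA backends this PyTorch
# build reports as enabled. These are standard torch.backends queries.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Flash SDP enabled:", torch.backends.cuda.flash_sdp_enabled())
print("Memory-efficient SDP enabled:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("Math (fallback) SDP enabled:", torch.backends.cuda.math_sdp_enabled())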

My execution environment is as follows:

The CUDA toolkit on the host machine (output of nvcc --version): Cuda compilation tools, release 12.5, V12.5.82, Build cuda_12.5.r12.5/compiler.34385749_0

I would greatly appreciate any guidance on how to address these warnings. Thank you in advance for your help.
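Based on the warning text, it seems the backends can be selected explicitly, and the last-but-one warning mentions the TORCH_CUDNN_SDPA_ENABLED variable. Is something like the sketch below the intended way to handle this, or is there a recommended setting inside SAM 2 itself? (Dummy tensors and the older torch.backends.cuda.sdp_kernel context manager; this is my own sketch, not code from the repository.)

import os
import torch
import torch.nn.functional as F

# The warning above mentions TORCH_CUDNN_SDPA_ENABLED=1; my assumption is
# that it has to be set before the first attention call is dispatched.
os.environ["TORCH_CUDNN_SDPA_ENABLED"] = "1"

# Dummy tensors only, to illustrate the call shape (batch, heads, seq, dim).
q = k = v = torch.randn(1, 8, 64, 32, device="cuda", dtype=torch.float16)

# Restrict scaled_dot_product_attention to the math kernel, which every build
# supports, so the "flash attention / memory efficient kernel not used"
# warnings should no longer fire (at the cost of the fused kernels).
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_mem_efficient=False, enable_math=True
):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([1, 8, 64, 32])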

dario-spagnolo commented 3 days ago

I am getting the same warnings.

My environment: