Closed by ThreeBoxWithCode 11 months ago
Hi @ThreeBoxWithCode, we use a None fallback to handle the FlashAttention2 import error.
Please check that your environment satisfies flash-attn>=2.3.3 on an Ampere, Ada, or Hopper GPU.
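For context, the None fallback mentioned above typically looks like the sketch below. This is a hedged illustration of the pattern, not the project's actual code: if flash-attn is missing or broken, the symbol is bound to None instead of raising at import time, which later surfaces as "'NoneType' object is not callable" when it is used.

```python
# Hypothetical sketch of the None-fallback import pattern (assumed, not the
# repository's exact code): absorb the ImportError and bind None instead.
try:
    from flash_attn.layers.rotary import apply_rotary_emb
except ImportError:
    apply_rotary_emb = None  # callers must check for None before calling

# Either the real function was imported, or the fallback None is in place.
print(apply_rotary_emb is None or callable(apply_rotary_emb))
```

This is why the error appears only at call time rather than at import time.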
Thank you for your reply. I previously installed flash-attn with "pip install flash-attn --no-build-isolation". Now I have reinstalled from the flash-attn wheel, version 2.3.5 with CUDA 11.7 and torch 1.13.1, but it still does not work. Should I build CLIP before running inference.ipynb?
And I tested importing flash-attn; it works:
Python 3.8.17 (default, Jul 5 2023, 21:04:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from flash_attn.flash_attn_interface import (
... flash_attn_func,
... flash_attn_kvpacked_func,
... flash_attn_qkvpacked_func,
... flash_attn_varlen_func,
... flash_attn_varlen_kvpacked_func,
... flash_attn_varlen_qkvpacked_func,
... flash_attn_with_kvcache,
... )
>>>
Test your environment with the following imports:
from flash_attn import flash_attn_func
from flash_attn import flash_attn_with_kvcache
from flash_attn.layers.rotary import apply_rotary_emb
Reference: https://github.com/baaivision/tokenize-anything/blob/main/tokenize_anything/modeling/text_decoder.py
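The three imports above can be checked in one pass with a small diagnostic script. This is a minimal sketch (the module and symbol names are taken from the thread; the results depend on your environment):

```python
# Try each import the maintainer lists and record which symbols are usable.
# A symbol counts as usable only if the import succeeds and it is not None.
checks = [
    ("flash_attn", "flash_attn_func"),
    ("flash_attn", "flash_attn_with_kvcache"),
    ("flash_attn.layers.rotary", "apply_rotary_emb"),
]

results = {}
for module, name in checks:
    try:
        obj = getattr(__import__(module, fromlist=[name]), name)
        results[name] = obj is not None
    except (ImportError, AttributeError):
        results[name] = False

print(results)  # e.g. {'flash_attn_func': True, ...} on a working install
```

If any entry is False, the Jupyter kernel will see a None fallback for that symbol.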
They all work.
If those imports work, apply_rotary_emb should not be None. Restart your Jupyter kernel.
Yes.
When I restarted the Jupyter kernel, it worked.
It seems that installing flash-attn with "pip install flash-attn --no-build-isolation" does not work.
Thank you for answering my questions late at night.
Pip works, but the VS Code Jupyter extension will not reload the Python runtime until the next restart 😅.
Wonderful work! After configuring the environment and stepping through inference.ipynb, I got an error at Visual Prompt Decoding: Point: TypeError: 'NoneType' object is not callable. The detailed error information is as follows: