soulteary closed this issue 2 months ago
Hi, there are currently conflicts between different inference environments; we are working on a clearer way to document usage.
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: you need flash_attn package version to be greater or equal than 2.1.0. Detected version 2.0.4. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
Hi, you may need to install a newer version of flash_attn. Run pip install "flash_attn>=2.1.0" on the command line (quote the argument so the shell does not interpret >= as a redirection).
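For reference, a minimal sketch of how to verify the installed flash_attn version before opting into FlashAttention-2 in transformers; the model id below is a placeholder assumption, not this repo's actual name:

```python
# Minimal sketch: check that flash_attn meets the 2.1.0 requirement before
# asking transformers to use FlashAttention-2. The model id is a placeholder.
from packaging import version
import flash_attn
from transformers import AutoModelForCausalLM

installed = version.parse(flash_attn.__version__)
if installed < version.parse("2.1.0"):
    raise RuntimeError(
        f"flash_attn {installed} is too old; run: pip install 'flash_attn>=2.1.0'"
    )

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                    # placeholder model id
    torch_dtype="auto",
    attn_implementation="flash_attention_2",  # this option triggers the version check
    trust_remote_code=True,
)
```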
@LDLINGLINGLING @jsonwull The goal of this issue is to provide a reference environment for other users. It is not feedback about the flash_attn version; with the versions above fully pinned, the program runs normally and quickly :D
Is there an existing issue?
Describe the bug
The model repo currently does not ship a requirements.txt, so many users will likely run into problems when trying to run it.
I tried running it in the latest NVIDIA Docker container environment with the new version of xformers. The environment that currently runs correctly is shown below for other users' reference (screenshot of the working run attached).
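As a small sketch of how to capture such a reference environment for comparison, the snippet below prints the installed versions of the packages that seem most relevant here; the package list is an assumption about what matters for this repo:

```python
# Sketch: print the installed versions of the packages most relevant to this
# issue, so they can be pasted into the Environment section. The package list
# is an assumption, not taken from the repo's documentation.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "transformers", "xformers", "flash-attn", "accelerate"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```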
To Reproduce
See readme.md, lol
Expected behavior
No response
Screenshots
No response
Environment
Additional context
No response