OrionStarAI / Orion

Orion-14B is a family of models that includes a 14B-parameter multilingual foundation LLM and a series of derivative models: a chat model, a long-context model, a quantized model, a RAG fine-tuned model, and an Agent fine-tuned model.
Apache License 2.0

flash-attn is already installed, but running still reports it as not installed #11

Open ctrlcplusv opened 9 months ago

ctrlcplusv commented 9 months ago

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

yecphaha commented 9 months ago

ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

Regarding the flash_attn installation problem: first install the matching version of cuda-nvcc (https://anaconda.org/nvidia/cuda-nvcc), then install a prebuilt flash_attn wheel from https://github.com/Dao-AILab/flash-attention/releases/, for example:

pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu122torch2.1cxx11abiFALSE-cp38-cp38-linux_x86_64.whl
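A minimal sketch of that sequence, assuming a conda environment; the cuda-nvcc pin and the wheel URL are the ones from this reply, so substitute the builds that match your own CUDA, torch, and Python:

conda install -c nvidia cuda-nvcc=12.2   # CUDA compiler matching the wheel's cu122 tag
python --version                         # must match the wheel's cp38 tag
python -c "import torch; print(torch.__version__, torch.version.cuda)"   # must match the torch2.1 tag
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.3/flash_attn-2.3.3+cu122torch2.1cxx11abiFALSE-cp38-cp38-linux_x86_64.whl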

ctrlcplusv commented 9 months ago


Hello, I installed as described above, but the problem persists. Details:

Setup: 4x RTX 4090, nvcc -V 12.1, flash-attn 2.3.3, torch 2.1.0, transformers 4.34.1, torchvision 0.16.0+cu121

Running cli_demo raises:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

as well as

RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

Is this a CUDA version problem?
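As the error message itself suggests, rerunning with CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous so the stack trace points at the call that actually failed; a quick sketch, assuming the demo entry point is cli_demo.py, optionally pinned to a single card to rule out multi-GPU issues:

CUDA_LAUNCH_BLOCKING=1 python cli_demo.py                           # synchronous launches, accurate stack trace
CUDA_VISIBLE_DEVICES=0 CUDA_LAUNCH_BLOCKING=1 python cli_demo.py    # same, restricted to one of the four 4090s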

yecphaha commented 9 months ago


Your versions are nvcc -V == 12.1 and torch == 2.1.0, so you need a flash_attn build that matches them. The wheel flash_attn-2.3.3+cu122torch2.1cxx11abiFALSE-cp38-cp38-linux_x86_64.whl was built against nvcc -V == 12.2, torch == 2.1.0, and python == 3.8.
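The wheel filename encodes everything that has to line up; a sketch of how to read it and check your side, using standard torch introspection:

# flash_attn-2.3.3+cu122torch2.1cxx11abiFALSE-cp38-... reads as:
#   cu122         -> built against CUDA 12.2
#   torch2.1      -> linked against torch 2.1.x
#   cxx11abiFALSE -> expects a torch built without the C++11 ABI (pip wheels of torch 2.1/2.2 are)
#   cp38          -> CPython 3.8
python --version
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.compiled_with_cxx11_abi())"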

grape-Wu commented 3 months ago

My CUDA version is 12.2 and my torch version is 2.2.0, and I installed flash_attn-2.3.6+cu122torch2.2cxx11abiTRUE-cp39-cp39-linux_x86_64.whl. My Python is also 3.9. Why does it still report that the package cannot be found in my environment...?
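Two things are worth checking here, as assumptions the thread does not confirm: that the wheel landed in the same interpreter that runs the demo, and that a cxx11abiTRUE build actually loads against your torch (pip-installed torch 2.2 is built with cxx11abiFALSE, so an abiTRUE wheel can fail to import with an undefined-symbol error, which transformers then reports as the package not being found). A quick sketch:

which python                                                    # the interpreter that runs the demo
pip show flash-attn                                             # is the package installed in this environment at all?
python -c "import flash_attn; print(flash_attn.__version__)"    # does the compiled extension actually load?

If that last import fails with an undefined symbol, try the cxx11abiFALSE wheel for the same CUDA/torch/Python combination.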