Zhiy-Zhang closed this issue 4 months ago.
Thanks for your issue; the problem has been solved in the latest commit. Wish you good luck.
Thanks for your reply, but there is another problem when compiling on a 2080Ti (compute capability: 75). Build command: ./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'75'" -DPPLCOMMON_CUDA_ARCHITECTURES="'75'"
Compile error:
We only support GPU architectures >= sm_80, due to FlashAttention-2's requirements. Maybe you could change to an A100 GPU and try again.
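For reference, a minimal standalone sketch (not part of ppl.llm.serving) that checks whether the installed GPU meets the sm_80 requirement at runtime, using the standard CUDA runtime API:

```cpp
// check_arch.cu -- standalone sketch, not part of the project.
// Prints each visible GPU's compute capability and whether it meets
// the sm_80 requirement imposed by FlashAttention-2.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        int sm = prop.major * 10 + prop.minor;
        std::printf("device %d: %s, sm_%d -> %s\n", i, prop.name, sm,
                    sm >= 80 ? "supported" : "NOT supported (needs sm_80+)");
    }
    return 0;
}
```

Build and run with `nvcc -o check_arch check_arch.cu && ./check_arch`; a 2080Ti reports sm_75, while an A100 reports sm_80.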
Thanks for your advice. Maybe I do not have to compile FlashAttention-2. Can you tell me how I can change the build files to solve this problem?
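Not an answer from the maintainers, but for illustration: one common way projects keep sm_80-only code from breaking an sm_75 build is to gate the architecture-specific path behind `__CUDA_ARCH__`. The sketch below shows that general pattern only; how ppl.nn / ppl.llm.serving actually organizes its FlashAttention-2 sources may be entirely different.

```cpp
// arch_guard.cu -- illustrative sketch of __CUDA_ARCH__ gating only;
// not how ppl.llm.serving's build is actually structured.
// The sm_80-only branch is compiled solely when targeting Ampere or
// newer, so the same source still compiles for sm_75 with a fallback.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void attention_stub(float* out) {
#if __CUDA_ARCH__ >= 800
    // Ampere+ path: where sm_80-only kernels (e.g. FlashAttention-2
    // style code) would live.
    out[threadIdx.x] = 2.0f;
#else
    // Pre-Ampere fallback path (hypothetical; a project may instead
    // emit a compile-time error here).
    out[threadIdx.x] = 1.0f;
#endif
}

int main() {
    float* d = nullptr;
    cudaMalloc(&d, 32 * sizeof(float));
    attention_stub<<<1, 32>>>(d);
    float h[32];
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    std::printf("first element: %f\n", h[0]);
    cudaFree(d);
    return 0;
}
```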
What are the problems? (screenshots or detailed error messages)
Compile error:
What are the types of GPU/CPU you are using?
GPU: A100-80G-SXM4
What's the operating system ppl.llm.serving runs on?
Ubuntu 20.04.4, CUDA 12.3, cuDNN 8904, TensorRT 9.2.0
What's the compiler and its version?
gcc 11.4, CMake 3.27.9, CUDA compilation tools release 12.3, V12.3.107
Which version(commit id or tag) of ppl.llm.serving is used?
commit id: c2bf8614ea7bce0cb9838255fb3cd6ab9d75039b
What are the commands used to build ppl.llm.serving?
./build.sh -DPPLNN_USE_LLM_CUDA=ON -DPPLNN_CUDA_ENABLE_NCCL=ON -DPPLNN_ENABLE_CUDA_JIT=OFF -DPPLNN_CUDA_ARCHITECTURES="'80;86;87'" -DPPLCOMMON_CUDA_ARCHITECTURES="'80;86;87'"
What are the execution commands?
None
Minimal code snippets for reproducing these problems (if necessary)
None
Models and inputs for reproducing these problems (send them to openppl.ai@hotmail.com if necessary)
None