LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).
Apache License 2.0

Failed to build SwiftTransformer #37

Open FredHuang99 opened 2 months ago

FredHuang99 commented 2 months ago

Commands executed:

```
git clone https://github.com/LLMServe/SwiftTransformer.git
cd SwiftTransformer
git submodule update --init --recursive
cmake -B build
cmake --build build -j$(nproc)
```

Error messages (a representative excerpt):

```
/workspace/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:45: error: call of overloaded ‘fabs(half)’ is ambiguous
   93 |   fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
/workspace/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:75: error: call of overloaded ‘fabs(half)’ is ambiguous
   93 |   fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
/workspace/DistServe/SwiftTransformer/src/csrc/kernel/fused_context_stage_attention.cu(145): error: name followed by "::" must be a class or namespace name
      wmma::fragment<wmma::matrix_a, 16ul, 16ul, 16ul, half, wmma::row_major> a_frag;
/workspace/DistServe/SwiftTransformer/src/csrc/kernel/fused_context_stage_attention.cu(146): error: type name is not allowed
      wmma::fragment<wmma::matrix_b, 16ul, 16ul, 16ul, __half, wmma::col_major> b_frag;
/workspace/DistServe/SwiftTransformer/src/csrc/kernel/fused_context_stage_attention.cu(146): error: identifier "b_frag" is undefined
      wmma::fragment<wmma::matrix_b, 16ul, 16ul, 16ul, half, wmma::col_major> b_frag;
```

Build environment:

- Image: nvcr.io/nvidia/pytorch:23.10-py3
- CXX compiler: GNU 11.4.0
- CUDA: NVIDIA 12.2.140 (CUDAToolkit 12.2.140)
- NCCL: libnccl.so.2.19.3
- MPI: 3.1

duihuhu commented 2 months ago

Is it a version problem?

interestingLSY commented 2 months ago

Have you added the `--gpus=all` argument when launching the Docker container? Equivalently, can you see your GPUs when you run `nvidia-smi` inside the container?
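For reference, a minimal sketch of launching the container with GPU access (the image tag is taken from the build environment above; the flags are standard Docker ones):

```shell
# Launch the NGC PyTorch container with all GPUs visible
docker run --gpus=all -it --rm nvcr.io/nvidia/pytorch:23.10-py3 bash

# Inside the container, the GPUs should now be listed:
nvidia-smi
```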

FredHuang99 commented 2 months ago

> Have you added the --gpus=all argument when you launch the docker container, or equivalently, can you see your GPUs when you type nvidia-smi inside your docker container?

I have added `--gpus all`.

William12github commented 2 months ago

What is the GPU type in the system?
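The GPU type matters here: the `wmma::fragment` errors are what nvcc emits when the WMMA API is compiled for an architecture without tensor cores, since `mma.h` only defines the fragment templates for compute capability 7.0 (sm_70) and newer. A hedged sketch of pinning the architecture explicitly (the value 80 is an assumption for an A100; substitute your GPU's compute capability):

```shell
# Rebuild with an explicit CUDA architecture that supports WMMA (sm_70+)
cmake -B build -DCMAKE_CUDA_ARCHITECTURES=80
cmake --build build -j$(nproc)
```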

TZHelloWorld commented 2 months ago

Maybe you can execute `git submodule update --init --recursive` to make sure that all submodules are installed.
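The suggestion above can be sketched as follows, run from inside the checkout; the status check at the end is just a quick sanity test:

```shell
cd SwiftTransformer
git submodule update --init --recursive

# Each line should begin with a commit hash; a leading '-' means that
# submodule is still uninitialized.
git submodule status
```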