Open FredHuang99 opened 2 months ago
Is it a version problem?
Have you added the `--gpus=all` argument when you launch the Docker container? Equivalently, can you see your GPUs when you type `nvidia-smi` inside your Docker container?
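For reference, a minimal sketch of the check being suggested (the image name is just an example; the flag requires the NVIDIA Container Toolkit on the host):

```shell
# Launch a container with all GPUs exposed to it
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.10-py3 bash

# Inside the container, this should list your GPUs;
# if it errors out, the container has no GPU access
nvidia-smi
```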
I have added `--gpus all`.
What's the GPU type in the system?
Maybe you can execute `git submodule update --init --recursive` to make sure that all the submodules are installed.
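A quick way to check whether that step is the problem: `git submodule status` marks any uninitialized submodule with a leading `-`. The sketch below runs the two commands in a scratch repository so it is self-contained; in an actual SwiftTransformer checkout you would run the same commands in place:

```shell
# Scratch repo for illustration (a real checkout would already exist)
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A leading '-' in this output means a submodule is still uninitialized
git submodule status --recursive

# Fetch anything that is missing, including nested submodules
git submodule update --init --recursive
echo "submodule check done"
```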
Commands executed:

```shell
git clone https://github.com/LLMServe/SwiftTransformer.git
cd SwiftTransformer
git submodule update --init --recursive
cmake -B build
cmake --build build -j$(nproc)
```
Error output (representative excerpts):

```
/workspace/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:45: error: call of overloaded ‘fabs(half)’ is ambiguous
   93 |     fabs(answer[i]-reference[i]), fabs(answer[i]-reference[i])/fabs(reference[i]));
/workspace/DistServe/SwiftTransformer/src/unittest/util/../unittest_utils.h:93:75: error: call of overloaded ‘fabs(half)’ is ambiguous
/workspace/DistServe/SwiftTransformer/src/csrc/kernel/fused_context_stage_attention.cu(145): error: name followed by "::" must be a class or namespace name
  wmma::fragment<wmma::matrix_a, 16ul, 16ul, 16ul, half, wmma::row_major> a_frag;
/workspace/DistServe/SwiftTransformer/src/csrc/kernel/fused_context_stage_attention.cu(146): error: type name is not allowed
  wmma::fragment<wmma::matrix_b, 16ul, 16ul, 16ul, __half, wmma::col_major> b_frag;
/workspace/DistServe/SwiftTransformer/src/csrc/kernel/fused_context_stage_attention.cu(146): error: identifier "b_frag" is undefined
```

Build environment:
- Image: nvcr.io/nvidia/pytorch:23.10-py3
- CXX compiler: GNU 11.4.0
- CUDA: NVIDIA 12.2.140
- CUDAToolkit: 12.2.140
- NCCL: libnccl.so.2.19.3
- MPI: 3.1