bd-iaas-us / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: I get an error when I try to build the Docker Image from nf5 branch (for full prequantized bnb models support) #18

Open mkhammoud opened 4 months ago

mkhammoud commented 4 months ago

Your current environment

PyTorch version: 2.1.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 555.85
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2300
DeviceID=CPU0
Family=198
L2CacheSize=11776
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2300
Name=12th Gen Intel(R) Core(TM) i7-12700H
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] ctransformers==0.2.27+cu121
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.26.4
[pip3] onnx==1.15.0
[pip3] onnxruntime==1.18.0
[pip3] open-clip-torch==2.24.0
[pip3] sentence-transformers==3.0.0
[pip3] torch==2.1.1+cu121
[pip3] torch-grammar==0.3.3
[pip3] torch-model-archiver==0.9.0
[pip3] torchaudio==2.1.1+cu121
[pip3] torchvision==0.16.1
[pip3] transformers==4.40.2
[conda] Could not collect
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
Could not collect

🐛 Describe the bug

I downloaded the nf5 branch files to my PC and tried to run docker build -t custom_vllm:latest .

I get the error below. Can you help?

59.91 [0/2] Re-checking globbed directories...
60.48 [1/35] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
211.0 [2/35] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 FAILED: CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/workspace/csrc -I/workspace/build/temp.linux-x86_64-3.10/_deps/cutlass-src/include -I/workspace/build/temp.linux-x86_64-3.10/_deps/cutlass-src/tools/util/include -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_89,code=[sm_89]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=8 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/cache_kernels.cu.o -MF CMakeFiles/_C.dir/csrc/cache_kernels.cu.o.d -x cu -c /workspace/csrc/cache_kernels.cu -o CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 Killed
211.0 Killed
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return
statement at end of non-void function "vllm::bf1622float2" 211.0 } 211.0 ^ 211.0 211.0 Remark: The warnings can be suppressed with "-diag-suppress " 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat16, nv_bfloat16)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat162, nv_bfloat162)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat16, A=__nv_bfloat16, B=nv_bfloat16]" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat162, A=__nv_bfloat162, B=nv_bfloat162]" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat162, __nv_bfloat162, nv_bfloat162)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat16, __nv_bfloat162, nv_bfloat162)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, 
nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=nv_bfloat16]" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, 
float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of 
non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu 211.0 211.0 Killed 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return statement at end of non-void function "vllm::bf1622float2" 211.0 } 211.0 ^ 211.0 211.0 Remark: The warnings can be suppressed with "-diag-suppress " 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat16, nv_bfloat16)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat162, nv_bfloat162)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat16, A=__nv_bfloat16, B=nv_bfloat16]" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function 
"vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat162, A=__nv_bfloat162, B=nv_bfloat162]" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat162, __nv_bfloat162, nv_bfloat162)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat16, __nv_bfloat162, nv_bfloat162)" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=nv_bfloat16]" 211.0 } 211.0 ^ 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout 
vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu 211.0 211.0 
/workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at 
line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu 211.0 211.0 Remark: The warnings can be suppressed with "-diag-suppress " 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of 
non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const 
Tin , Tout , float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu 211.0 211.0 Remark: The warnings can be suppressed with "-diag-suppress " 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, 
kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu 211.0 211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" 211.0 } 211.0 ^ 211.0 detected during: 211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu 211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 
of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin , Tout , float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu
211.0
211.0 ninja: build stopped: subcommand failed.
211.4 Traceback (most recent call last):
211.4   File "/workspace/setup.py", line 421, in <module>
211.4     setup(
211.4   File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
211.4     return distutils.core.setup(**attrs)
211.4   File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
211.4     dist.run_commands()
211.4   File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
211.4     self.run_command(cmd)
211.4   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
211.4     cmd_obj.run()
211.4   File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
211.4     self.run_command('build')
211.4   File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
211.4     self.distribution.run_command(command)
211.4   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
211.4     cmd_obj.run()
211.4   File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
211.4     self.run_command(cmd_name)
211.4   File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
211.4     self.distribution.run_command(command)
211.4   File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
211.4     cmd_obj.run()
211.4   File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
211.4     _build_ext.run(self)
211.4   File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
211.4     self.build_extensions()
211.4   File "/workspace/setup.py", line 205, in build_extensions
211.4     subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)
211.4   File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
211.4     raise CalledProcessError(retcode, cmd)
211.4 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=1', '--target=_moe_C', '--target=_C', '--target=_punica_C']' returned non-zero exit status 1.

Dockerfile:104

 103 | ENV CCACHE_DIR=/root/.cache/ccache
 104 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
 105 | >>>     --mount=type=cache,target=/root/.cache/pip \
 106 | >>>     if [ "$USE_SCCACHE" != "1" ]; then \
 107 | >>>         python3 setup.py bdist_wheel --dist-dir=dist; \
 108 | >>>     fi
 109 |

ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then python3 setup.py bdist_wheel --dist-dir=dist; fi" did not complete successfully: exit code: 1

View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/qk77ejyftifq9a9lsonevwmak
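
Editor's note: the log above is mostly #940-D warnings, which are not errors. The lines that end the build are "FAILED:", "Killed", and "ninja: build stopped". A bare "Killed" usually means the compiler process received SIGKILL, which is often (though not always) the kernel reclaiming memory; nothing in this thread confirms that cause. Below is a minimal Python sketch for pulling those lines out of a pasted log. It is not part of vLLM or Docker. The names split_flattened_log and summarize_build_log are hypothetical, and it assumes that each output line starts with an elapsed-seconds stamp such as "211.0", as in the log above.

```python
import re
from collections import Counter

# Assumption: every build output line begins with an elapsed-seconds stamp
# like "211.0 ". In a log pasted as one paragraph, these stamps are the only
# line separators left, so we split on them. This is a heuristic.
STAMP = re.compile(r"(?:(?<=\s)|^)\d+\.\d+ ")


def split_flattened_log(text):
    """Split a flattened build log back into its individual output lines."""
    return [part.strip() for part in STAMP.split(text) if part.strip()]


def summarize_build_log(text):
    """Separate fatal markers (FAILED/Killed/ninja stop) from warning noise."""
    counts = Counter()
    failed_targets = []
    for line in split_flattened_log(text):
        if line.startswith("FAILED:"):
            failed_targets.append(line.split()[1])
        elif line == "Killed":
            counts["killed"] += 1
        elif "warning #" in line:
            counts["warnings"] += 1
        elif line.startswith("ninja: build stopped"):
            counts["ninja_stopped"] += 1
    return {
        "failed_targets": failed_targets,
        "killed": counts["killed"],
        "warnings": counts["warnings"],
        "ninja_stopped": counts["ninja_stopped"] > 0,
    }


if __name__ == "__main__":
    # Shortened excerpt in the same shape as the log in this issue.
    excerpt = (
        "211.0 FAILED: CMakeFiles/_C.dir/csrc/cache_kernels.cu.o "
        "211.0 Killed 211.0 Killed "
        "211.0 ninja: build stopped: subcommand failed."
    )
    print(summarize_build_log(excerpt))
```

If the summary reports one or more "Killed" lines next to a failed target, the warnings can be ignored while investigating why the compiler process was terminated.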

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!

XiaoningDing commented 2 weeks ago

@chenqianfzh, could you help take a look and respond? Thanks.