Status: Open. mkhammoud opened this issue 4 months ago.
@chenqianfzh, could you help take a look and respond? Thanks.
Your current environment
🐛 Describe the bug
I downloaded the nf5 branch files to my PC and ran: docker build -t custom_vllm:latest .
The build fails with the error below. Can you help?
59.91 [0/2] Re-checking globbed directories...
60.48 [1/35] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/torch_bindings.cpp.o
211.0 [2/35] Building CUDA object CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 FAILED: CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DPy_LIMITED_API=3 -DTORCH_EXTENSION_NAME=_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_C_EXPORTS -I/workspace/csrc -I/workspace/build/temp.linux-x86_64-3.10/_deps/cutlass-src/include -I/workspace/build/temp.linux-x86_64-3.10/_deps/cutlass-src/tools/util/include -isystem /usr/include/python3.10 -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/cuda/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[sm_70]" "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_89,code=[sm_89]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_90,code=[compute_90]" -Xcompiler=-fPIC --expt-relaxed-constexpr -DENABLE_FP8 --threads=8 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_C.dir/csrc/cache_kernels.cu.o -MF CMakeFiles/_C.dir/csrc/cache_kernels.cu.o.d -x cu -c /workspace/csrc/cache_kernels.cu -o CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 Killed
211.0 Killed
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return statement at end of non-void function "vllm::bf1622float2"
211.0 }
211.0 ^
211.0
211.0 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat16, nv_bfloat16)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat162, nv_bfloat162)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat16, A=__nv_bfloat16, B=nv_bfloat16]"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat162, A=__nv_bfloat162, B=nv_bfloat162]"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat162, __nv_bfloat162, nv_bfloat162)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat16, __nv_bfloat162, nv_bfloat162)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=nv_bfloat16]"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu
211.0
211.0 Killed
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(97): warning #940-D: missing return statement at end of non-void function "vllm::bf1622float2"
211.0 }
211.0 ^
211.0
211.0 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(105): warning #940-D: missing return statement at end of non-void function "vllm::bf162bf162"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(118): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat16, nv_bfloat16)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(126): warning #940-D: missing return statement at end of non-void function "vllm::add(nv_bfloat162, nv_bfloat162)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(173): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat16, A=__nv_bfloat16, B=nv_bfloat16]"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(182): warning #940-D: missing return statement at end of non-void function "vllm::mul<Acc,A,B>(A, B) [with Acc=nv_bfloat162, A=__nv_bfloat162, B=nv_bfloat162]"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(292): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat162, __nv_bfloat162, nv_bfloat162)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/../../../attention/dtype_bfloat16.cuh(301): warning #940-D: missing return statement at end of non-void function "vllm::fma(nv_bfloat16, __nv_bfloat162, nv_bfloat162)"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_vec_conversion<Tout,Tin>(const Tin &, float, nv_fp8_interpretation_t) [with Tout=uint8_t, Tin=nv_bfloat16]"
211.0 }
211.0 ^
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu
211.0
211.0 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=float, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 358 of /workspace/csrc/cache_kernels.cu
211.0
211.0 Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=uint16_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 360 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint8_t, Tin=nv_bfloat16, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 362 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=float, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 364 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=uint16_t, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 366 of /workspace/csrc/cache_kernels.cu
211.0
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(523): warning #940-D: missing return statement at end of non-void function "vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]"
211.0 }
211.0 ^
211.0 detected during:
211.0 instantiation of "Tout vllm::fp8::scaled_convert<Tout,Tin,kv_dt>(const Tin &, float) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 327 of /workspace/csrc/cache_kernels.cu
211.0 instantiation of "void vllm::convert_fp8_kernel<Tout,Tin,kv_dt>(const Tin *, Tout *, float, int64_t) [with Tout=nv_bfloat16, Tin=uint8_t, kv_dt=vllm::Fp8KVCacheDataType::kAuto]" at line 368 of /workspace/csrc/cache_kernels.cu
211.0
211.0 ninja: build stopped: subcommand failed.
211.4 Traceback (most recent call last):
211.4 File "/workspace/setup.py", line 421, in <module>
211.4 setup(
211.4 File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 153, in setup
211.4 return distutils.core.setup(**attrs)
211.4 File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
211.4 dist.run_commands()
211.4 File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
211.4 self.run_command(cmd)
211.4 File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
211.4 cmd_obj.run()
211.4 File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
211.4 self.run_command('build')
211.4 File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
211.4 self.distribution.run_command(command)
211.4 File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
211.4 cmd_obj.run()
211.4 File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
211.4 self.run_command(cmd_name)
211.4 File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
211.4 self.distribution.run_command(command)
211.4 File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
211.4 cmd_obj.run()
211.4 File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
211.4 _build_ext.run(self)
211.4 File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
211.4 self.build_extensions()
211.4 File "/workspace/setup.py", line 205, in build_extensions
211.4 subprocess.check_call(["cmake", *build_args], cwd=self.build_temp)
211.4 File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
211.4 raise CalledProcessError(retcode, cmd)
211.4 subprocess.CalledProcessError: Command '['cmake', '--build', '.', '-j=1', '--target=_moe_C', '--target=_C', '--target=_punica_C']' returned non-zero exit status 1.
Dockerfile:104
103 | ENV CCACHE_DIR=/root/.cache/ccache
104 | >>> RUN --mount=type=cache,target=/root/.cache/ccache \
105 | >>> --mount=type=cache,target=/root/.cache/pip \
106 | >>> if [ "$USE_SCCACHE" != "1" ]; then \
107 | >>> python3 setup.py bdist_wheel --dist-dir=dist; \
108 | >>> fi
109 |
ERROR: failed to solve: process "/bin/sh -c if [ \"$USE_SCCACHE\" != \"1\" ]; then python3 setup.py bdist_wheel --dist-dir=dist; fi" did not complete successfully: exit code: 1
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/qk77ejyftifq9a9lsonevwmak
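
For anyone triaging a log like the one above, here is a minimal sketch that tallies what kinds of lines it contains. The log shows a FAILED target and several bare "Killed" lines, but only warnings from the compiler and no error lines. Reading "Killed" as the compiler being terminated by a signal (often the kernel's out-of-memory killer) is an assumption, not something the log confirms. The helper name summarize_build_log is hypothetical and not part of vLLM or Docker.

```python
import re
from collections import Counter

# BuildKit prefixes each output line with elapsed seconds, e.g. "211.0 Killed".
_PREFIX = re.compile(r"^\d+(?:\.\d+)?\s+")


def summarize_build_log(text: str) -> Counter:
    """Count line kinds in a BuildKit-style build log (hypothetical helper)."""
    counts = Counter()
    for raw in text.splitlines():
        # Drop the elapsed-seconds prefix so the message itself can be matched.
        line = _PREFIX.sub("", raw.strip(), count=1)
        if line == "Killed":
            counts["killed"] += 1          # shell reported a process ended by a signal
        elif line.startswith("FAILED:"):
            counts["failed_targets"] += 1  # ninja target that did not build
        elif ": error" in line:
            counts["compiler_errors"] += 1
        elif ": warning #" in line:
            counts["warnings"] += 1
    return counts


sample = """\
211.0 FAILED: CMakeFiles/_C.dir/csrc/cache_kernels.cu.o
211.0 Killed
211.0 Killed
211.0 /workspace/csrc/quantization/fp8/nvidia/quant_utils.cuh(478): warning #940-D: missing return statement
211.0 ninja: build stopped: subcommand failed.
"""
summary = summarize_build_log(sample)
print(dict(summary))
```

If the summary shows killed lines and failed targets but zero compiler errors, the warnings are probably not the cause of the failure, and checking the memory available to the Docker build is a reasonable next step.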