Open anaivebird opened 2 days ago
It raised an error:
[ 0%] Generating .check_symbol
[ 0%] Generating .check_symbol_executor
[ 0%] Generating .check_symbol_internal_cutlass_kernels
[ 0%] Built target gemm_swiglu_sm90_src
[ 0%] Built target fb_gemm_src
[ 0%] Built target check_symbol
[ 0%] Built target check_symbol_executor
[ 0%] Built target check_symbol_internal_cutlass_kernels
[ 0%] Built target cutlass_src
[ 1%] Built target selective_scan_src
[ 2%] Built target common_src
[ 2%] Built target layers_src
[ 3%] Built target moe_gemm_src
[ 4%] Built target fpA_intB_gemm_src
[ 4%] Building CXX object tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o
[ 5%] Built target decoder_attention
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In function ‘void {anonymous}::setWeightStreaming(nvinfer1::ICudaEngine&, float)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:113:16: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘setWeightStreamingBudgetV2’; did you mean ‘setWeightStreamingBudget’?
113 | engine.setWeightStreamingBudgetV2(budget);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
| setWeightStreamingBudget
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In constructor ‘tensorrt_llm::runtime::TllmRuntime::TllmRuntime(const tensorrt_llm::runtime::RawEngine&, nvinfer1::ILogger*, float, bool)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:242:41: error: ‘class nvinfer1::ICudaEngine’ has no member named ‘getDeviceMemorySizeV2’; did you mean ‘getDeviceMemorySize’?
242 | auto const devMemorySize = mEngine->getDeviceMemorySizeV2();
| ^~~~~~~~~~~~~~~~~~~~~
| getDeviceMemorySize
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp: In member function ‘nvinfer1::IExecutionContext& tensorrt_llm::runtime::TllmRuntime::addContext(int32_t)’:
/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:284:13: error: ‘class nvinfer1::IExecutionContext’ has no member named ‘setDeviceMemoryV2’; did you mean ‘setDeviceMemory’?
284 | context.setDeviceMemoryV2(mEngineBuffer->data(), static_cast<int64_t>(mEngineBuffer->getCapacity()));
| ^~~~~~~~~~~~~~~~~
| setDeviceMemory
gmake[3]: *** [tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/build.make:527: tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/tllmRuntime.cpp.o] Error 1
gmake[2]: *** [CMakeFiles/Makefile2:1935: tensorrt_llm/runtime/CMakeFiles/runtime_src.dir/all] Error 2
gmake[2]: *** Waiting for unfinished jobs....
[ 23%] Built target decoder_attention_src
[ 63%] Built target kernels_src
[ 98%] Built target context_attention_src
gmake[1]: *** [CMakeFiles/Makefile2:1537: tensorrt_llm/CMakeFiles/tensorrt_llm.dir/rule] Error 2
gmake: *** [Makefile:218: tensorrt_llm] Error 2
Traceback (most recent call last):
File "/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/scripts/build_wheel.py", line 434, in <module>
main(**vars(args))
File "/home/work/xingwuFileSystem/qserve_trtllm/TensorRT-LLM/scripts/build_wheel.py", line 208, in main
build_run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 192 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings executorWorker ' returned non-zero exit status 2.
Thank you for reporting the issue. It is a bug at https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/kernels/CMakeLists.txt#L25. We should replace SRC_CU by SRC_CPP. We will fix it ASAP.
Which bug will this change fix: the slow compilation or the error?
It fixes the slow compilation.
For the error, please open a separate issue if it is not related to the issue above.
System Info
Who can help?
@byshiue @Superjomn
Information
Tasks
Reproduction
Expected behavior
The build finishes in less than one hour.
Actual behavior
The build is slow and takes more than 2 hours.
Additional notes
Many processes are spawned during compilation, including ptxas -arch sm_80, which is unrelated to sm90, even though I pass --cuda_architectures 90.
ps.txt
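To quantify the observation above, a process listing such as ps.txt could be summarized by target architecture. The following is a minimal sketch, not part of the original report: it assumes the listing has one process per line with the full command line included, and that ptxas receives its target as `-arch sm_XX` or `-arch=sm_XX`. The function name `count_ptxas_archs` is hypothetical.

```python
import re
from collections import Counter

# Assumption: the architecture is passed as "-arch sm_80" or "-arch=sm_80".
ARCH_RE = re.compile(r"-arch[=\s]+(sm_\w+)")


def count_ptxas_archs(ps_text: str) -> Counter:
    """Count ptxas processes per target architecture in a process listing.

    ps_text is the raw text of a listing (for example the output of
    `ps -ef`), one process per line, with the full command line included.
    """
    counts = Counter()
    for line in ps_text.splitlines():
        # Skip lines that are not ptxas invocations (nvcc, cicc, gmake, ...).
        if "ptxas" not in line:
            continue
        match = ARCH_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_ptxas_archs(open("ps.txt").read()).most_common()` would list each architecture with the number of ptxas processes targeting it. If sm_80 dominates while only 90 was requested, that would support the claim that kernels are being compiled for architectures outside --cuda_architectures.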