Closed: Sonicwanjie closed this issue 10 months ago.
Thank you. This link works. But how do I debug 0.7.1?
If you want to debug the C++ code, you need to build with type=Debug. I think Linux and Docker may work well!
If you only want to upgrade to 0.7.1, just install tensorrt_llm==0.7.1.
On Linux with Python 3.12 it passes. I tried 0.7.1 on Win11, but batch_manager does not work. Moving the 0.7.0 batch_manager into 0.7.1 also fails:
gptManager.obj : error LNK2001: unresolved external symbol "public: enum tensorrt_llm::batch_manager::BatchManagerErrorCode_t __cdecl tensorrt_llm::batch_manager::GptManager::shutdown(void)" (?shutdown@GptManager@batch_manager@tensorrt_llm@@QEAA?AW4BatchManagerErrorCode_t@23@XZ) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
gptManager.obj : error LNK2001: unresolved external symbol "public: __cdecl tensorrt_llm::batch_manager::GptManager::GptManager(class std::filesystem::path const &,enum tensorrt_llm::batch_manager::TrtGptModelType,int,enum tensorrt_llm::batch_manager::batch_scheduler::SchedulerPolicy,class std::function<…>,…)" (…) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
tensorrt_llm_static.lib(gptSession.obj) : error LNK2001: unresolved external symbol "public: __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,enum nvinfer1::DataType,class std::shared_ptr<…>,…)" (??0KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAA@HHHHHHHHHHW4DataType@nvinfer1@@V?$shared_ptr@VCudaStream@runtime@tensorrt_llm@@@std@@_N@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
tensorrt_llm_static.lib(gptSession.obj) : error LNK2001: unresolved external symbol "public: void __cdecl …(int,int,int,class std::shared_ptr<…> const &)" (…@QEAAXHHHAEBV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
tensorrt_llm_static.lib(gptSession.obj) : error LNK2001: unresolved external symbol "public: void __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::removeSequence(int,class std::shared_ptr<…> const &)" (…@QEAAXHAEBV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
tensorrt_llm_batch_manager_static.lib(GptManager.cpp.obj) : error LNK2001: unresolved external symbol "public: static class tensorrt_llm::runtime::WorldConfig …" (…tensorrt_llm@@SA?AV123@AEAVILogger@nvinfer1@@HV?$optional@H@std@@1@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\Release\bindings.cp310-win_amd64.pyd : fatal error LNK1120: 6 unresolved externals [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
Hi @Sonicwanjie. Thanks for reporting the issue.
1) We haven't released the official tensorrt_llm==0.7.1 wheel and the corresponding batch_manager lib file yet, so please be on the lookout for that. I will update the thread once it is released :)
2) Can you share the build commands you used in the original error log? From the signatures, I think the linking error is because the libs are being built from main while the rel branch batch_manager is being utilized, leading to these link issues.
In any case, let me ping you after (1) is completed, and then we can investigate whether the issue persists. Thanks for your patience!
@tp5uiuc Hi, I finished compiling 0.7.1, but when running the model it reports an error:
RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cubTempStorageSize, logProbs, (T*) nullptr, idVals, (int*) nullptr, vocabSize * batchSize, batchSize, beginOffsetBuf, offsetBuf + 1, 0, sizeof(T) * 8, stream): no kernel image is available for execution on the device (C:\GPT\TensorRT-LLM-0.7.1\cpp\tensorrt_llm\kernels\samplingTopPKernels.cu:326)
I modified build_wheel.py to set cmake_generator = "-A X64 -T host=x64" and ran:
python ./scripts/build_wheel.py --cuda_architectures "90-real" --trt_root "C:/Users/Sonic/inference/TensorRT"
Then, with cmake_generator = "-A x64", I ran:
python ./scripts/build_wheel.py --cuda_architectures "89-real;90-real" --trt_root "C:/Users/Sonic/inference/TensorRT"
On Win11, 0.7.1 now works.
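For context on why the second command succeeded: the "no kernel image is available for execution on the device" error typically means the binary contains no machine code for the GPU actually present. `--cuda_architectures "90-real"` emits SASS only for sm_90 (Hopper), so an sm_89 (Ada) GPU has nothing to run until "89-real" is added. A small helper sketch for composing that flag (the function name is mine, not part of build_wheel.py):

```python
def cuda_arch_flag(capabilities):
    """Join (major, minor) compute capabilities into a --cuda_architectures value.

    "<MM>-real" requests SASS for that exact architecture only, so the list
    must cover every GPU the resulting wheel will actually run on.
    """
    return ";".join(f"{major}{minor}-real" for major, minor in capabilities)

# Building for both Ada (8.9) and Hopper (9.0), as in the command that worked:
print(cuda_arch_flag([(8, 9), (9, 0)]))  # → 89-real;90-real
```

Checking the installed GPU's compute capability first (e.g. via nvidia-smi) avoids a rebuild.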
If I choose the BUILD TESTS option, it still fails:
1>Auto build dll exports
1> Creating library C:/GPT/TensorRT-LLM/cpp/build/tensorrt_llm/Release/tensorrt_llm.lib and object C:/GPT/TensorRT-LLM/cpp/build/tensorrt_llm/Release/tensorrt_llm.exp
1>LINK : warning LNK4098: defaultlib "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library
1>gptSession.obj : error LNK2019: unresolved external symbol "public: __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,int,bool,enum nvinfer1::DataType,class std::shared_ptr<…
Thanks @Sonicwanjie.
Two updates:
1) We have released wheel 0.7.1 for tensorrt-llm on Windows, so please feel free to use that.
2) Can you comment on my original question? From the signatures, I think the linking error is because the code/libs are being built from main while the rel branch batch_manager is being utilized, leading to these link issues.
From the latest error logs as well, I think you are building tensorrt-llm from the main branch. Can you instead build from the rel branch:
git clone --branch rel https://github.com/NVIDIA/TensorRT-LLM.git
and run the build commands from there? The tensorrt_llm_batch_manager_static.lib is only intended to work with the release branch (the main branch has different signatures, so you will get linker errors).
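The signature mismatch is visible in the logs themselves: the KVCacheManager constructor appears with two different arities (ten ints in one build, eleven ints plus a bool in the other), and in MSVC name mangling each `H` after the `@QEAA@` marker encodes one `int` parameter. A small sketch using truncated, paraphrased mangled names (the real symbols in the logs are longer; these strings are illustrative, not copied verbatim):

```python
import re

# Illustrative, truncated mangled names for KVCacheManager's constructor:
# what a main-branch object file references vs. what the rel-branch
# tensorrt_llm_batch_manager_static.lib exports (paraphrased from the logs).
referenced = "??0KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAA@HHHHHHHHHHH_NW4DataType@nvinfer1@@"
exported   = "??0KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAA@HHHHHHHHHHW4DataType@nvinfer1@@"

def leading_int_params(mangled: str) -> int:
    """Count the run of 'H' codes (MSVC mangling for int) after @QEAA@."""
    m = re.search(r"@QEAA@(H+)", mangled)
    return len(m.group(1)) if m else 0

print(leading_int_params(referenced), leading_int_params(exported))  # → 11 10
```

Any arity difference like this makes the two symbols distinct to the linker, hence LNK2019/LNK2001. A full (untruncated) symbol can be inspected with undname.exe from the MSVC toolchain.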
@tp5uiuc Thanks,
MSBuild version 17.8.3+195e7f5a3 for .NET Framework
1>Checking Build System
Building Custom Rule C:/GPT/TensorRT-LLM-0.7.1/cpp/tensorrt_llm/runtime/CMakeLists.txt
common_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\common\common_src.dir\Release\common_src.lib
runtime_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\runtime\runtime_src.dir\Release\runtime_src.lib
layers_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\layers\layers_src.dir\Release\layers_src.lib
kernels_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\kernels\kernels_src.dir\Release\kernels_src.lib
Building Custom Rule C:/GPT/TensorRT-LLM-0.7.1/cpp/tensorrt_llm/CMakeLists.txt
tensorrt_llm_static.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\Release\tensorrt_llm_static.lib
Building Custom Rule C:/GPT/TensorRT-LLM-0.7.1/cpp/tensorrt_llm/CMakeLists.txt
Auto build dll exports
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2005: "public: static class tensorrt_llm::common::Logger * __cdecl tensorrt_llm::common::Logger::getLogger(void)" (?getLogger@Logger@common@tensorrt_llm@@SAPEAV123@XZ) already defined in logger.obj [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(microBatchScheduler.obj) : error LNK2005: "public: static class tensorrt_llm::common::Logger * __cdecl tensorrt_llm::common::Logger::getLogger(void)" (?getLogger@Logger@common@tensorrt_llm@@SAPEAV123@XZ) already defined in logger.obj [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(batchScheduler.obj) : error LNK2005: "public: static class tensorrt_llm::common::Logger * __cdecl tensorrt_llm::common::Logger::getLogger(void)" (?getLogger@Logger@common@tensorrt_llm@@SAPEAV123@XZ) already defined in logger.obj [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
Creating library C:/GPT/TensorRT-LLM-0.7.1/cpp/build/tensorrt_llm/Release/tensorrt_llm.lib and object C:/GPT/TensorRT-LLM-0.7.1/cpp/build/tensorrt_llm/Release/tensorrt_llm.exp
LINK : warning LNK4098: defaultlib "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
gptSession.obj : error LNK2019: unresolved external symbol "public: __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,enum nvinfer1::DataType,class std::shared_ptr<class tensorrt_llm::runtime::CudaStream>,bool)" (??0KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAA@HHHHHHHHHHW4DataType@nvinfer1@@V?$shared_ptr@VCudaStream@runtime@tensorrt_llm@@@std@@_N@Z), referenced in function "private: void __cdecl tensorrt_llm::runtime::GptSession::createKvCacheManager(int,int,int,int,class tensorrt_llm::batch_manager::kv_cache_manager::KvCacheConfig const &)" (?createKvCacheManager@GptSession@runtime@tensorrt_llm@@AEAAXHHHHAEBVKvCacheConfig@kv_cache_manager@batch_manager@3@@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
gptSession.obj : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::addSequence(int,int,int,class std::shared_ptr<…> const &)" (?addSequence@KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAAXHHHAEBV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@@Z), referenced in function "private: void __cdecl tensorrt_llm::runtime::GptSession::kvCacheAddSequences(int,int,int)" (?kvCacheAddSequences@GptSession@runtime@tensorrt_llm@@AEAAXHHH@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
gptSession.obj : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::removeSequence(int,class std::shared_ptr<…> const &)" (?removeSequence@KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAAXHAEBV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@@Z), referenced in function "private: void __cdecl tensorrt_llm::runtime::GptSession::generateBatched(class std::vector<class tensorrt_llm::runtime::GenerationOutput,…> &,class std::vector<class tensorrt_llm::runtime::GenerationInput,…> const &,class tensorrt_llm::runtime::SamplingConfig const &,class std::function<void __cdecl(int,bool)> const &)" (…) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
gptSession.obj : error LNK2019: unresolved external symbol "public: static int __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::getMaxNumTokens(class tensorrt_llm::batch_manager::kv_cache_manager::KvCacheConfig const &,enum nvinfer1::DataType,class tensorrt_llm::runtime::GptModelConfig const &,class tensorrt_llm::runtime::WorldConfig const &,class tensorrt_llm::runtime::BufferManager const &)" (?getMaxNumTokens@KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@SAHAEBVKvCacheConfig@234@W4DataType@nvinfer1@@AEBVGptModelConfig@runtime@4@AEBVWorldConfig@94@AEBVBufferManager@94@@Z), referenced in function "private: void __cdecl tensorrt_llm::runtime::GptSession::createKvCacheManager(int,int,int,int,class tensorrt_llm::batch_manager::kv_cache_manager::KvCacheConfig const &)" [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
runtimeBuffers.obj : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::getBlockPointersOfBatch(class tensorrt_llm::runtime::ITensor &,int,int,int) const" (?getBlockPointersOfBatch@KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEBAXAEAVITensor@runtime@4@HHH@Z), referenced in function "public: void __cdecl tensorrt_llm::runtime::RuntimeBuffers::prepareContextStep(class std::shared_ptr<class tensorrt_llm::runtime::ITensor> const &,int,class tensorrt_llm::runtime::BufferManager &,class tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager const *,int,class tensorrt_llm::runtime::GptModelConfig const &,class tensorrt_llm::runtime::WorldConfig const &)" (?prepareContextStep@RuntimeBuffers@runtime@tensorrt_llm@@QEAAXAEBV?$shared_ptr@VITensor@runtime@tensorrt_llm@@@std@@HAEAVBufferManager@23@PEBVKVCacheManager@kv_cache_manager@batch_manager@3@HAEBVGptModelConfig@23@AEBVWorldConfig@23@@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2019: unresolved external symbol "public: static class std::shared_ptr<class tensorrt_llm::runtime::NcclCommunicator> __cdecl tensorrt_llm::runtime::NcclCommunicator::createPipelineComm(class tensorrt_llm::runtime::WorldConfig const &,class nvinfer1::ILogger &)" (?createPipelineComm@NcclCommunicator@runtime@tensorrt_llm@@SA?AV?$shared_ptr@VNcclCommunicator@runtime@tensorrt_llm@@@std@@AEBVWorldConfig@23@AEAVILogger@nvinfer1@@@Z), referenced in function "public: __cdecl tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TrtGptModelInflightBatching(int,class std::shared_ptr<…> const &,…,bool,enum tensorrt_llm::batch_manager::batch_scheduler::SchedulerPolicy,class tensorrt_llm::batch_manager::TrtGptModelOptionalParams const &)" (…) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2019: unresolved external symbol "void __cdecl tensorrt_llm::runtime::kernels::invokeCopyBatch<int>(class tensorrt_llm::runtime::IBuffer const &,class tensorrt_llm::runtime::IBuffer &,class tensorrt_llm::runtime::IBuffer const &,class tensorrt_llm::runtime::IBuffer const &,unsigned __int64,class tensorrt_llm::runtime::CudaStream const &)" (??$invokeCopyBatch@H@kernels@runtime@tensorrt_llm@@YAXAEBVIBuffer@12@AEAV312@00_KAEBVCudaStream@12@@Z), referenced in function "public: void __cdecl tensorrt_llm::batch_manager::TrtGptModelInflightBatching::RuntimeBuffers::setFromInputs(class std::map<unsigned __int64,class std::shared_ptr<…>,struct std::less<unsigned __int64>,…> …)" (…) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(…) : error LNK2019: unresolved external symbol "…" (…), referenced in function "private: void __cdecl tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderSync(class std::map<unsigned __int64,class std::shared_ptr<class tensorrt_llm::batch_manager::LlmRequest>,…> &,class std::unique_ptr<class tensorrt_llm::runtime::decoder_batch::Token const,struct std::default_delete<class tensorrt_llm::runtime::decoder_batch::Token const>> const &)" (?decoderSync@TrtGptModelInflightBatching@batch_manager@tensorrt_llm@@AEAAXAEAV?$map@_KV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@U?$less@_K@2@V?$allocator@U?$pair@$$CB_KV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@@std@@@2@@std@@AEBV?$unique_ptr@$$CBVToken@decoder_batch@runtime@tensorrt_llm@@U?$default_delete@$$CBVToken@decoder_batch@runtime@tensorrt_llm@@@std@@@5@@Z) [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\Release\tensorrt_llm.dll : fatal error LNK1120: 11 unresolved externals [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]