NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
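As a rough illustration of that workflow, below is a minimal sketch using the high-level Python API from more recent TensorRT-LLM releases; the class names (LLM, SamplingParams), the Hugging Face model id, and the output fields are assumptions and may differ from the 0.7.x API discussed in this issue.

    from tensorrt_llm import LLM, SamplingParams  # high-level API in recent releases (assumed)

    # Build (or load) a TensorRT engine for the model, then run inference on it.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # hypothetical model id
    params = SamplingParams(max_tokens=32)
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)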
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

TensorRT-LLM 0.7.1 Win11 compilation fails with batch_manager 0.5.0 lib #820

Closed Sonicwanjie closed 10 months ago

Sonicwanjie commented 10 months ago

MSBuild version 17.8.3+195e7f5a3 for .NET Framework

1>Checking Build System
  Building Custom Rule C:/GPT/TensorRT-LLM-0.7.1/cpp/tensorrt_llm/runtime/CMakeLists.txt
  common_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\common\common_src.dir\Release\common_src.lib
  runtime_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\runtime\runtime_src.dir\Release\runtime_src.lib
  layers_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\layers\layers_src.dir\Release\layers_src.lib
  kernels_src.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\kernels\kernels_src.dir\Release\kernels_src.lib
  Building Custom Rule C:/GPT/TensorRT-LLM-0.7.1/cpp/tensorrt_llm/CMakeLists.txt
  tensorrt_llm_static.vcxproj -> C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\Release\tensorrt_llm_static.lib
  Building Custom Rule C:/GPT/TensorRT-LLM-0.7.1/cpp/tensorrt_llm/CMakeLists.txt
  Auto build dll exports
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2005: "public: static class tensorrt_llm::common::Logger * __cdecl tensorrt_llm::common::Logger::getLogger(void)" already defined in logger.obj [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(microBatchScheduler.obj) : error LNK2005: "tensorrt_llm::common::Logger::getLogger(void)" already defined in logger.obj [tensorrt_llm.vcxproj]
tensorrt_llm_batch_manager_static.lib(batchScheduler.obj) : error LNK2005: "tensorrt_llm::common::Logger::getLogger(void)" already defined in logger.obj [tensorrt_llm.vcxproj]
Creating library C:/GPT/TensorRT-LLM-0.7.1/cpp/build/tensorrt_llm/Release/tensorrt_llm.lib and object C:/GPT/TensorRT-LLM-0.7.1/cpp/build/tensorrt_llm/Release/tensorrt_llm.exp
LINK : warning LNK4098: default library "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library [tensorrt_llm.vcxproj]
gptSession.obj : error LNK2019: unresolved external symbol "tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,enum nvinfer1::DataType,std::shared_ptr<tensorrt_llm::runtime::CudaStream>,bool)" referenced in function "tensorrt_llm::runtime::GptSession::createKvCacheManager(int,int,int,int,KvCacheConfig const &)"
gptSession.obj : error LNK2019: unresolved external symbol "KVCacheManager::addSequence(int,int,int,std::shared_ptr<LlmRequest> const &)" referenced in function "GptSession::kvCacheAddSequences(int,int,int)"
gptSession.obj : error LNK2019: unresolved external symbol "KVCacheManager::removeSequence(int,std::shared_ptr<LlmRequest> const &)" referenced in function "GptSession::generateBatched(...)"
gptSession.obj : error LNK2019: unresolved external symbol "static int KVCacheManager::getMaxNumTokens(KvCacheConfig const &,nvinfer1::DataType,GptModelConfig const &,WorldConfig const &,BufferManager const &)" referenced in function "GptSession::createKvCacheManager(...)"
runtimeBuffers.obj : error LNK2019: unresolved external symbol "KVCacheManager::getBlockPointersOfBatch(ITensor &,int,int,int) const" referenced in function "RuntimeBuffers::prepareContextStep(...)"
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2019: unresolved external symbol "static std::shared_ptr<NcclCommunicator> NcclCommunicator::createPipelineComm(WorldConfig const &,nvinfer1::ILogger &)" referenced in function "TrtGptModelInflightBatching::TrtGptModelInflightBatching(...)"
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2019: unresolved external symbol "tensorrt_llm::runtime::kernels::invokeCopyBatch<int>(IBuffer const &,IBuffer &,IBuffer const &,IBuffer const &,unsigned __int64,CudaStream const &)" referenced in function "TrtGptModelInflightBatching::RuntimeBuffers::setFromInputs(...)"
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.obj) : error LNK2019: unresolved external symbols for the int and unsigned char overloads of "NcclCommunicator::send(...) const" and "NcclCommunicator::receive(...) const", referenced in functions "NcclCommunicator::send(IBuffer const &,int,CudaStream const &,nvinfer1::ILogger &) const", "NcclCommunicator::receive(IBuffer &,int,CudaStream const &,nvinfer1::ILogger &) const" and "TrtGptModelInflightBatching::decoderSync(...)"
C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\Release\tensorrt_llm.dll : fatal error LNK1120: 11 unresolved externals [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\tensorrt_llm.vcxproj]

Tlntin commented 10 months ago

Maybe you can install trt-llm directly: link

Sonicwanjie commented 10 months ago

Thank you. The link works. But how do I debug 0.7.1?

Tlntin commented 10 months ago

If you want to debug the C++ code, you need to build with type=Debug. I think Linux and Docker may work well!
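For reference, a Debug build might be invoked roughly like this; the --build_type flag and the architecture/path values below are assumptions based on the commands shown later in this thread, so check python ./scripts/build_wheel.py --help for the exact option names:

    python ./scripts/build_wheel.py --build_type Debug --cuda_architectures "89-real" --trt_root "C:/Users/Sonic/inference/TensorRT"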

Tlntin commented 10 months ago

If you only want to upgrade to 0.7.1, just install tensorrt_llm==0.7.1.
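For example, a pinned install might look like the following; the extra index URL is an assumption about where NVIDIA publishes the wheels, so follow the official Windows installation instructions for the exact command:

    pip install tensorrt_llm==0.7.1 --extra-index-url https://pypi.nvidia.com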

Sonicwanjie commented 10 months ago

On Linux with Python 3.12 it passes. I tried 0.7.1 on Win11, but batch_manager does not work. Moving the 0.7.0 batch_manager to 0.7.1 also fails.

Sonicwanjie commented 10 months ago

gptManager.obj : error LNK2001: unresolved external symbol "public: enum tensorrt_llm::batch_manager::BatchManagerErrorCode_t __cdecl tensorrt_llm::batch_manager::GptManager::shutdown(void)" [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]
gptManager.obj : error LNK2001: unresolved external symbol "tensorrt_llm::batch_manager::GptManager::GptManager(std::filesystem::path const &,TrtGptModelType,int,batch_scheduler::SchedulerPolicy,std::function<...>,std::function<...>,std::function<...>,std::function<...>,TrtGptModelOptionalParams const &,std::optional<unsigned __int64>,std::optional<int>,bool)" [bindings.vcxproj]
tensorrt_llm_static.lib(gptSession.obj) : error LNK2001: unresolved external symbol "KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,enum nvinfer1::DataType,std::shared_ptr<CudaStream>,bool)" [bindings.vcxproj]
tensorrt_llm_static.lib(gptSession.obj) : error LNK2001: unresolved external symbol "KVCacheManager::addSequence(int,int,int,std::shared_ptr<LlmRequest> const &)" [bindings.vcxproj]
tensorrt_llm_static.lib(gptSession.obj) : error LNK2001: unresolved external symbol "KVCacheManager::removeSequence(int,std::shared_ptr<LlmRequest> const &)" [bindings.vcxproj]
tensorrt_llm_batch_manager_static.lib(GptManager.cpp.obj) : error LNK2001: unresolved external symbol "static tensorrt_llm::runtime::WorldConfig WorldConfig::mpi(nvinfer1::ILogger &,int,std::optional<int>,std::optional<int>)" [bindings.vcxproj]
C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\Release\bindings.cp310-win_amd64.pyd : fatal error LNK1120: 6 unresolved externals [C:\GPT\TensorRT-LLM-0.7.1\cpp\build\tensorrt_llm\pybind\bindings.vcxproj]

tp5uiuc commented 10 months ago

Hi @Sonicwanjie. Thanks for reporting the issue.

1) We haven't released the official tensorrt_llm==0.7.1 and corresponding batch_manager lib file yet, so please be on the lookout for that. I will update the thread once it is released :)
2) Can you share the build commands you used for the original error log? From the signatures, I think the linking error is because the libs are being built from main while the rel branch batch_manager is being used, leading to these link issues.

In any case, let me ping you after (1) is completed, and then we can investigate if the issue persists. Thanks for your patience!

Sonicwanjie commented 10 months ago

@tp5uiuc Hi, I finished compiling 0.7.1, but when running the model it reports an error:

RuntimeError: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cubTempStorageSize, logProbs, (T*) nullptr, idVals, (int*) nullptr, vocabSize * batchSize, batchSize, beginOffsetBuf, offsetBuf + 1, 0, sizeof(T) * 8, stream): no kernel image is available for execution on the device (C:\GPT\TensorRT-LLM-0.7.1\cpp\tensorrt_llm\kernels\samplingTopPKernels.cu:326)

I modified build_wheel.py to cmake_generator = "-A X64 -T host=x64" and ran: python ./scripts/build_wheel.py --cuda_architectures "90-real" --trt_root "C:/Users/Sonic/inference/TensorRT"
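"no kernel image is available for execution on the device" generally means the kernels were compiled for a compute capability that does not match the local GPU, so --cuda_architectures needs to include the device's own architecture. One way to check it, assuming PyTorch is installed in the environment:

    python -c "import torch; print(torch.cuda.get_device_capability(0))"

A result of (8, 9), for example, corresponds to "89-real".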

Sonicwanjie commented 10 months ago

With cmake_generator = "-A x64": python ./scripts/build_wheel.py --cuda_architectures "89-real;90-real" --trt_root "C:/Users/Sonic/inference/TensorRT"

On Win11, 0.7.1 now works.

Sonicwanjie commented 10 months ago

If I choose the BUILD TESTS option, it still fails.

1>Auto build dll exports
1>  Creating library C:/GPT/TensorRT-LLM/cpp/build/tensorrt_llm/Release/tensorrt_llm.lib and object C:/GPT/TensorRT-LLM/cpp/build/tensorrt_llm/Release/tensorrt_llm.exp
1>LINK : warning LNK4098: default library "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library
1>gptSession.obj : error LNK2019: unresolved external symbol "KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,int,bool,enum nvinfer1::DataType,std::shared_ptr<CudaStream>,bool)" referenced in function "GptSession::createKvCacheManager(int,int,int,int,int,KvCacheConfig const &)"
1>tensorrt_llm_batch_manager_static.lib(kvCacheManager.cpp.obj) : error LNK2019: unresolved external symbol "int __cdecl tensorrt_llm::mpi::getCommWorldSize(void)" referenced in function "static int KVCacheManager::getMaxNumTokens(KvCacheConfig const &,nvinfer1::DataType,GptModelConfig const &,WorldConfig const &,BufferManager const &)"
1>tensorrt_llm_batch_manager_static.lib(kvCacheManager.cpp.obj) : error LNK2019: unresolved external symbol "void __cdecl tensorrt_llm::mpi::allreduce(void const *,void *,int,MpiType,MpiOp,MpiComm)" referenced in function "static int KVCacheManager::getMaxNumTokens(...)"
1>C:\GPT\TensorRT-LLM\cpp\build\tensorrt_llm\Release\tensorrt_llm.dll : fatal error LNK1120: 3 unresolved externals

tp5uiuc commented 10 months ago

Thanks @Sonicwanjie. Two updates:
1) We have released wheel 0.7.1 for tensorrt-llm on Windows, so please feel free to use that.
2) Can you comment on my original question?

From the signatures, I think the linking error is because the code/libs are being built from main and the rel branch batch_manager is being utilized, leading to these link issues.

From the latest error logs as well, I think you are building tensorrt-llm from the main branch. Can you instead build from the rel branch (git clone --branch rel https://github.com/NVIDIA/TensorRT-LLM.git) and run the build commands from there? The tensorrt_llm_batch_manager_static.lib is only intended to work with the release branch (the main branch has different signatures, so you will get linker errors).
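Sketching the suggested flow (the architecture value and trt_root path are placeholders taken from earlier in this thread; adjust them to the local setup):

    git clone --branch rel https://github.com/NVIDIA/TensorRT-LLM.git
    cd TensorRT-LLM
    python ./scripts/build_wheel.py --cuda_architectures "89-real" --trt_root "C:/Users/Sonic/inference/TensorRT"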

Sonicwanjie commented 10 months ago

@tp5uiuc Thanks.