Open sleepwalker2017 opened 10 months ago
@byshiue hello, I modified the issue; please have a look. Thank you.
Hi, we tried the latest code base and found no issue yet. Could you please try again?
Do you still have any further issue or question? If not, we'll close this soon.
System Info
CPU x86_64
GPU L40s
TensorRT-LLM branch: main
Commit ID: b57221b764bc579cbb2490154916a871f620e2c4
CUDA: NVIDIA-SMI 535.154.05, Driver Version 535.154.05, CUDA Version 12.3
Who can help?
No response
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
python build.py --model_dir /data/weilong.yu/vicuna-13b-v1.5/ \
    --dtype float16 \
    --use_gpt_attention_plugin float16 \
    --use_gemm_plugin float16 \
    --output_dir ./tmp/llama/13B/trt_engines/fp16/$2-gpu/ \
    --max_batch_size $1 \
    --tp_size $2 \
    --world_size $2 \
    --parallel_build \
    --use_inflight_batching \
    --remove_input_padding \
    --paged_kv_cache \
    --enable_context_fmha
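(Side note for anyone triaging this: the limits baked into the engine can be read back from the config.json that build.py writes into the engine directory and compared against the benchmark's --batch_size and --input_output_len. A minimal sketch; the builder_config field names are assumed from this era of TensorRT-LLM and may differ between versions:)

```python
# Hypothetical helper, not part of the repro: dump the build-time limits of the
# engine so they can be checked against the benchmark arguments.
import json

# Path from the build command above, assuming $2 = 2.
with open("./tmp/llama/13B/trt_engines/fp16/2-gpu/config.json") as f:
    cfg = json.load(f)["builder_config"]  # field name assumed for this version

print("max_batch_size:", cfg.get("max_batch_size"))
print("max_input_len: ", cfg.get("max_input_len"))
print("max_output_len:", cfg.get("max_output_len"))
# If the benchmark's --batch_size 32 exceeds max_batch_size (e.g. the engine
# was built with $1 < 32), TensorRT reports exactly the "does not satisfy any
# optimization profile" error seen below.
```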
mpirun -n 2 --allow-run-as-root benchmarks/gptSessionBenchmark --input_output_len "128;26" --batch_size 32 --model llama --engine_dir ../../examples/llama/tmp/llama/70B/trt_engines/fp8/2-gpu/
[TensorRT-LLM][ERROR] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[TensorRT-LLM][ERROR] 3: [executionContext.cpp::setInputShape::2309] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2309, condition: satisfyProfile Runtime dimension does not satisfy any optimization profile.)
[TensorRT-LLM][ERROR] [TensorRT-LLM][ERROR] Assertion failed: Tensor 'kv_cache_block_pointers_0' has invalid shape (32, 2, 257), expected (-1, 2, -1) (/data/TRT-LLM/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:149)
1 0x55753ad94005 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2 0x7fe6a1ca769f tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr > > > const&) + 1823
3 0x7fe6a1c6325a tensorrt_llm::runtime::GptSession::executeContextStep(std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager const*) + 874
4 0x7fe6a1c64582 tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3106
5 0x7fe6a1c65fb3 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 3107
6 0x55753ad99a26 benchmarks/gptSessionBenchmark(+0x1aa26) [0x55753ad99a26]
7 0x7fe6883fbd90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fe6883fbd90]
8 0x7fe6883fbe40 __libc_start_main + 128
9 0x55753ad9b975 benchmarks/gptSessionBenchmark(+0x1c975) [0x55753ad9b975]
[TensorRT-LLM][ERROR] [TensorRT-LLM][ERROR] Assertion failed: Tensor 'kv_cache_block_pointers_0' has invalid shape (32, 2, 256), expected (-1, 2, -1) (/data/TRT-LLM/TensorRT-LLM/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:149)
1 0x55f9fb709005 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2 0x7f400487869f tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr > > > const&) + 1823
3 0x7f400483425a tensorrt_llm::runtime::GptSession::executeContextStep(std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager const*) + 874
4 0x7f4004835582 tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3106
5 0x7f4004836fb3 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 3107
6 0x55f9fb70ea26 benchmarks/gptSessionBenchmark(+0x1aa26) [0x55f9fb70ea26]
7 0x7f3feafccd90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f3feafccd90]
8 0x7f3feafcce40 __libc_start_main + 128
9 0x55f9fb710975 benchmarks/gptSessionBenchmark(+0x1c975) [0x55f9fb710975]
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[57264,1],0] Exit code: 1
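For context, the "Runtime dimension does not satisfy any optimization profile" message is core TensorRT behavior rather than anything TensorRT-LLM-specific: every dynamic dimension of an input must fall inside the [min, max] range of one of the optimization profiles fixed at build time, and (32, 2, 257) apparently falls outside all of them here. A minimal standalone sketch that reproduces the same class of error (illustrative network and bounds, not the TensorRT-LLM graph):

```python
# Standalone sketch, not the TensorRT-LLM code path: shows TensorRT rejecting a
# runtime shape outside every optimization profile.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Dynamic input analogous to kv_cache_block_pointers_0, shape (-1, 2, -1).
inp = network.add_input("kv_cache_block_pointers_0", trt.int32, (-1, 2, -1))
network.mark_output(network.add_identity(inp).get_output(0))

# Illustrative bounds; in TensorRT-LLM they come from build.py arguments
# such as --max_batch_size.
config = builder.create_builder_config()
profile = builder.create_optimization_profile()
profile.set_shape("kv_cache_block_pointers_0",
                  min=(1, 2, 1), opt=(16, 2, 128), max=(16, 2, 256))
config.add_optimization_profile(profile)

plan = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(plan)
context = engine.create_execution_context()

# Inside the profile: accepted.
print(context.set_input_shape("kv_cache_block_pointers_0", (16, 2, 128)))  # True

# Outside the profile (32 > 16, 257 > 256): set_input_shape fails and TensorRT
# logs the same "Runtime dimension does not satisfy any optimization profile".
print(context.set_input_shape("kv_cache_block_pointers_0", (32, 2, 257)))  # False
```

If that is what is happening here, rebuilding the engine with limits that cover batch size 32 and the requested sequence lengths (or lowering the benchmark's --batch_size) should make the error go away.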