NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

fused_multihead_attention_v2 CUDA Error: CUDA_ERROR_INVALID_VALUE #1965

Open inkinworld opened 3 months ago

inkinworld commented 3 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Load a Llama model, then send requests; a rough sketch of the request-sending side is shown below for illustration.
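The exact serving setup is not included in the report (the traces below point at the C++ GptManager / in-flight batching path), so the following is only a minimal sketch of sending requests through the high-level Python LLM API; the model path is a placeholder and the API shape may differ across versions.

# Minimal sketch, assuming the high-level Python LLM API.
# The model path is a placeholder, not the deployment used in this report.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="/path/to/llama-checkpoint-or-engine")  # placeholder path
sampling = SamplingParams(temperature=0.8, top_p=0.95)

prompts = ["Hello, how are you?", "Summarize TensorRT-LLM in one sentence."]
for output in llm.generate(prompts, sampling):
    # each result carries the generated continuation for one prompt
    print(output.outputs[0].text)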

After serving some requests, the following error occurs:

CUDA Error: CUDA_ERROR_INVALID_VALUE

/app/tensorrt_llm/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_v2.h 481
https://github.com/NVIDIA/TensorRT-LLM/blob/250d9c293d5edbc2a45c20775b3150b1eb68b364/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_v2.h#L481

batch_manager log:

[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
1       0x7ff4b80c7305 void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 149
2       0x7ff41db12ae0 tensorrt_llm::runtime::GptDecoderBatch::forwardSync(tensorrt_llm::runtime::decoder_batch::Token const&) + 96
3       0x7ff41dc39d36 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forward(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 7622
4       0x7ff41dbea4d4 tensorrt_llm::batch_manager::GptManager::step(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&, std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >&) + 36
5       0x7ff41dbf252f tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 287
6       0x7ff4c05f1253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7ff4c05f1253]
7       0x7ff4c0380ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7ff4c0380ac3]
8       0x7ff4c0412850 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7ff4c0412850]
[TensorRT-LLM][WARNING] Step function failed, continuing.

Expected behavior

Requests should be processed normally and responses returned.

Actual behavior

An exception is thrown and requests can no longer be processed.

Additional notes

[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] Assertion failed: hasValues == configValue.has_value() (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/samplingConfig.h:46)
1       0x7faf940d9f31 tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 102
2       0x7faef9b2b008 tensorrt_llm::runtime::SamplingConfig::SamplingConfig(std::vector<tensorrt_llm::runtime::SamplingConfig, std::allocator<tensorrt_llm::runtime::SamplingConfig> > const&) + 1720
3       0x7faef9b1e514 tensorrt_llm::runtime::GptDecoderBatch::newRequests(std::vector<int, std::allocator<int> > const&, std::vector<tensorrt_llm::runtime::decoder_batch::Request, std::allocator<tensorrt_llm::runtime::decoder_batch::Request> > const&, std::vector<tensorrt_llm::runtime::SamplingConfig, std::allocator<tensorrt_llm::runtime::SamplingConfig> > const&) + 404
4       0x7faef9c37203 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::setupDecoderStep(std::map<unsigned long, std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > > >&, std::vector<unsigned long, std::allocator<unsigned long> > const&) + 851
5       0x7faef9c394e7 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forward(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 5495
6       0x7faef9bea4d4 tensorrt_llm::batch_manager::GptManager::step(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&, std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >&) + 36
7       0x7faef9bf252f tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 287
8       0x7faf9bdf1253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7faf9bdf1253]
9       0x7faf9bb80ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7faf9bb80ac3]
10      0x7faf9bc12850 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7faf9bc12850]
[TensorRT-LLM][WARNING] Step function failed, continuing.
CUDA Error: CUDA_ERROR_INVALID_VALUE /app/tensorrt_llm/cpp/tensorrt_llm/kernels/contextFusedMultiHeadAttention/fused_multihead_attention_v2.h 481
[TensorRT-LLM][ERROR] Encountered an error in forward function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
1       0x7ff8cc198305 void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 149
2       0x7ff83db12ae0 tensorrt_llm::runtime::GptDecoderBatch::forwardSync(tensorrt_llm::runtime::decoder_batch::Token const&) + 96
3       0x7ff83dc39d36 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forward(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 7622
4       0x7ff83dbea4d4 tensorrt_llm::batch_manager::GptManager::step(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&, std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >&) + 36
5       0x7ff83dbf252f tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 287
6       0x7ff8e09f1253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7ff8e09f1253]
7       0x7ff8e0780ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7ff8e0780ac3]
8       0x7ff8e0812850 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7ff8e0812850]
[TensorRT-LLM][ERROR] Encountered error for requestId 144103273: Encountered an error in forward function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
1       0x7ff8cc198305 void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 149
2       0x7ff83db12ae0 tensorrt_llm::runtime::GptDecoderBatch::forwardSync(tensorrt_llm::runtime::decoder_batch::Token const&) + 96
3       0x7ff83dc39d36 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forward(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 7622
4       0x7ff83dbea4d4 tensorrt_llm::batch_manager::GptManager::step(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&, std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >&) + 36
5       0x7ff83dbf252f tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 287
6       0x7ff8e09f1253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7ff8e09f1253]
7       0x7ff8e0780ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7ff8e0780ac3]
8       0x7ff8e0812850 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7ff8e0812850]
[TensorRT-LLM][ERROR] Encountered error for requestId 1997879650: Encountered an error in forward function: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaEventSynchronize(get()): an illegal memory access was encountered (/app/tensorrt_llm/cpp/include/tensorrt_llm/runtime/cudaEvent.h:66)
1       0x7ff8cc198305 void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 149
2       0x7ff83db12ae0 tensorrt_llm::runtime::GptDecoderBatch::forwardSync(tensorrt_llm::runtime::decoder_batch::Token const&) + 96
3       0x7ff83dc39d36 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::forward(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&) + 7622
4       0x7ff83dbea4d4 tensorrt_llm::batch_manager::GptManager::step(std::list<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, std::allocator<std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> > >&, std::set<unsigned long, std::less<unsigned long>, std::allocator<unsigned long> >&) + 36
5       0x7ff83dbf252f tensorrt_llm::batch_manager::GptManager::decoupled_execution_loop() + 287
6       0x7ff8e09f1253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7ff8e09f1253]
7       0x7ff8e0780ac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7ff8e0780ac3]
8       0x7ff8e0812850 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x126850) [0x7ff8e0812850]
juney-nvidia commented 3 months ago

@inkinworld Hi, have you tried the latest main branch to see whether the issue still exists? Thanks, June
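As a hedged aside, one quick way to confirm which TensorRT-LLM version is actually loaded before retesting (the exact upgrade path depends on how it was installed; building from the main branch follows the repository's docker/build instructions rather than pip):

# Sketch only: check the installed version before re-running the reproduction.
# A newer prebuilt wheel could be pulled with, for example:
#   pip3 install --upgrade tensorrt_llm --extra-index-url https://pypi.nvidia.com
import tensorrt_llm
print(tensorrt_llm.__version__)  # confirm the version the runtime actually imports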

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.