NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

host_runtime_perf_knobs usage issue: [TRT] [E] IExecutionContext::enqueueV3: Error Code 3: API Usage Error #4186

Open yixue-qq opened 1 week ago

yixue-qq commented 1 week ago

I'm trying to write a unit test for flash attention using version 0.14.0.dev2024100100.

I noticed that host_runtime_perf_knobs is a new feature in recent versions. Here is how I use it, followed by the error I get:

```python
with tensorrt_llm.net_guard(net):

    input_dim_range = OrderedDict([
        ('num_tokens', [batch_size * 1, batch_size * max_seq_len]),
        ('hidden_size', [hidden_size, hidden_size]),
    ])
    trt_hidden_states = Tensor(
        name='hidden_states',
        shape=[-1, hidden_size],
        dtype=tensorrt_llm.str_dtype_to_trt(dtype),
        dim_range=input_dim_range)
    runtime_perf_knobs = Tensor(
        name='host_runtime_perf_knobs',
        shape=[max_seq_len],
        dtype=tensorrt_llm.str_dtype_to_trt('int64'),
        dim_range=OrderedDict([('perf_knob_size', [max_seq_len, max_seq_len])]))
```

```python
attention_params = AttentionParams(
    sequence_length=sequence_length_tensor,
    context_lengths=context_lengths_tensor,
    host_request_types=host_request_types_tensor,
    max_context_length=context_length,
    host_context_lengths=host_context_lengths_tensor,
    host_runtime_perf_knobs=runtime_perf_knobs)
```

The error is: [10/09/2024-06:22:33] [TRT] [E] IExecutionContext::enqueueV3: Error Code 3: API Usage Error (Parameter check failed, condition: mContext.profileObliviousBindings.at(profileObliviousIndex) != nullptr. Address is not set for input tensor host_runtime_perf_knobs. Call setInputTensorAddress or setTensorAddress before enqueue/execute.)

Any ideas why?

lix19937 commented 1 week ago

Address is not set for input tensor host_runtime_perf_knobs. Call setInputTensorAddress or setTensorAddress before enqueue/execute.
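
In other words, the engine has an input named host_runtime_perf_knobs, but no buffer was bound to it before enqueueV3 ran. One possible cause, assuming the test feeds the engine from a name-to-buffer dictionary, is that the new tensor was declared on the network but never added to that dictionary. The following is a minimal sketch of a pre-flight check for that situation; `find_unbound_inputs`, `declared`, and `feed` are hypothetical names used for illustration and are not part of TensorRT or TensorRT-LLM.

```python
from typing import Iterable, List, Mapping


def find_unbound_inputs(declared_inputs: Iterable[str],
                        feed: Mapping[str, object]) -> List[str]:
    """Return the declared input names that have no buffer in `feed`.

    A name counts as unbound if it is absent from `feed` or mapped to None.
    """
    return [name for name in declared_inputs if feed.get(name) is None]


# Input names taken from the snippet in the question.
declared = ["hidden_states", "host_runtime_perf_knobs"]

# Stand-in values; in a real test these would be the actual input buffers.
feed = {"hidden_states": object()}

missing = find_unbound_inputs(declared, feed)
print(missing)
```

If the list is non-empty, each name in it still needs a buffer bound to it (for example by adding it to the dictionary the test passes to the runtime) before the engine is executed.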