akhoroshev opened this issue 10 months ago
You could mark it as an output when you build the engine. Using GPT as an example, you could mark the hidden_states here as an output.
Hi @byshiue!
As an experiment, I added this line here:
hidden_states.mark_output('hidden_states_output_test', self.dtype)
After that I built the engine and ran gptManagerBenchmark.
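To double-check that the extra output actually made it into the engine, the engine's I/O tensors can be listed with the plain TensorRT API. A minimal sketch (the engine filename is a placeholder):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open('gpt_float16_tp1_rank0.engine', 'rb') as f:  # placeholder path
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())

# Every tensor registered via mark_output() should show up here as an
# OUTPUT binding next to the default ones.
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    print(name, engine.get_tensor_mode(name))

If 'hidden_states_output_test' is listed here but missing from the benchmark's response tensors, the engine itself is fine and the tensor is being dropped by the runtime, not by the builder.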
I also modified gptManagerBenchmark to print the output tensors:
// Dump the name, shape, and memory type of every tensor in the response.
for (auto& tensor: response_tensors) {
    TLLM_LOG_INFO(tensor.name);
    auto shape = tensor.tensor->getShape();
    TLLM_LOG_INFO("shape");
    std::cout << "[";
    for (auto i = 0; i < shape.nbDims; i++)
        std::cout << shape.d[i] << ", ";
    std::cout << "]" << std::endl;
    TLLM_LOG_INFO("type");
    // getMemoryType() returns an enum; print its raw integer value.
    auto type = tensor.tensor->getMemoryType();
    std::cout << static_cast<int32_t>(type) << std::endl;
}
But I can only see the "default" output tensors:
[TensorRT-LLM][INFO] output_ids
[TensorRT-LLM][INFO] shape
[1, 1, 1213, ]
[TensorRT-LLM][INFO] type
1
[TensorRT-LLM][INFO] sequence_length
[TensorRT-LLM][INFO] shape
[1, 1, ]
[TensorRT-LLM][INFO] type
1
[TensorRT-LLM][INFO] output_log_probs
[TensorRT-LLM][INFO] shape
[1, 1, 1024, ]
[TensorRT-LLM][INFO] type
1
[TensorRT-LLM][INFO] cum_log_probs
[TensorRT-LLM][INFO] shape
[1, 1, ]
[TensorRT-LLM][INFO] type
1
Is it possible to forward a "custom" output tensor through GptManager?
It might be hard to add in the C++ runtime; you could try adding it in the Python runtime first.
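For what it's worth, a rough sketch of that Python-runtime route, assuming an engine built with the extra mark_output call. The debug_mode/debug_buffer names come from my reading of tensorrt_llm.runtime.GenerationSession and may differ between versions; model_config, engine_buffer, mapping, input_ids, and input_lengths are placeholders you would already have when driving the session:

from tensorrt_llm.runtime import GenerationSession, SamplingConfig

# debug_mode asks the session to keep every engine output around by name
# instead of discarding everything except the sampled token ids.
session = GenerationSession(model_config, engine_buffer, mapping,
                            debug_mode=True)
session.setup(batch_size=input_ids.shape[0],
              max_context_length=int(input_lengths.max()),
              max_new_tokens=32)
session.decode(input_ids, input_lengths, SamplingConfig(end_id=2, pad_id=2))
# If the custom output survived the build, it should be retrievable here:
hidden = session.debug_buffer['hidden_states_output_test']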
I have the same question. If the Python code is modified as you suggest above and I then build the GPT model with TensorRT-LLM, can the hidden_states be passed into the postprocessing part?
Is there a way to access the context hidden states, i.e. a tensor with shape
[batch_size, max_input_token_num, hidden_size]
? In FasterTransformer this was easy: at this point (after the context decoding phase) I simply accessed the tensor (decoder_output_tensors["decoder_output"]).
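For illustration, here is the kind of post-processing I have in mind, under the assumption that the engine is built with remove_input_padding, in which case (as I understand it) a marked hidden_states output comes back packed as [total_num_tokens, hidden_size] rather than padded. This is only a sketch under that assumption:

import torch

def unpack_context_hidden(packed: torch.Tensor,
                          input_lengths: list,
                          max_input_len: int) -> torch.Tensor:
    """Scatter packed per-token hidden states back into a zero-padded
    [batch_size, max_input_len, hidden_size] tensor."""
    hidden_size = packed.shape[-1]
    out = packed.new_zeros(len(input_lengths), max_input_len, hidden_size)
    offset = 0
    for b, n in enumerate(input_lengths):
        out[b, :n] = packed[offset:offset + n]
        offset += n
    return out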