**RobotGF** opened this issue 1 year ago (status: Open)
```cpp
if (token_generated_cb_ && step + 1 < (int)max_output_seq_len) {
    setOutputTensors(output_tensors, input_tensors, max_input_length, max_output_seq_len);
    sendTensorsToFirstPipelineNode(output_tensors, input_tensors);
    if (pipeline_para_.rank_ == 0 && tensor_para_.rank_ == 0) {
        token_generated_cb_(output_tensors, token_generated_ctx_);
    }
}
```
In the function `setOutputTensors(output_tensors, input_tensors, max_input_length, max_output_seq_len)`:

```cpp
// add 1 to sequence_length here because the sequence_length at time step t is t - 1
param.max_sequence_length_final_step = 1;
```
This code is correct only for non-decoupled mode; when decoupled mode is True, it produces the wrong output length. When running fastertransformer_backend with decoupled mode set to True, the output differs from the output with decoupled mode set to False, and the reported output length is wrong.
Branch/Tag/Commit: main
Docker Image Version: triton-py3-22.12
GPU name: A40/3090
CUDA Driver: 525.105.17
Reproduced Steps