When the input has batch_size=4 and per-sequence lengths of 4, 5, 6, and 3, I find that the GPT example's output is wrong.
I want to know whether the GPT implementation in FasterTransformer supports this kind of variable-length input.
When I modified the code and changed remove_padding in parallelGpt.h from true to false, the result became correct.
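To make the input layout concrete, here is a minimal sketch of the batch described above: 4 prompts with lengths 4, 5, 6, and 3, right-padded to the longest sequence. The token IDs and the pad ID are placeholders, not the values from my actual run.

```python
import numpy as np

# Hypothetical sketch: a batch of 4 prompts with lengths 4, 5, 6, and 3,
# right-padded to the longest length. Token IDs and the pad ID (50256,
# GPT-2's end-of-text token) are illustrative placeholders.
pad_id = 50256
prompts = [
    [11, 12, 13, 14],
    [21, 22, 23, 24, 25],
    [31, 32, 33, 34, 35, 36],
    [41, 42, 43],
]
lengths = np.array([len(p) for p in prompts], dtype=np.int32)  # [4 5 6 3]
max_len = int(lengths.max())                                   # 6
input_ids = np.full((len(prompts), max_len), pad_id, dtype=np.int32)
for i, p in enumerate(prompts):
    input_ids[i, : len(p)] = p
# input_ids has shape (4, 6); lengths marks where each real prompt ends,
# which is the information a remove_padding pass would rely on.
```

With remove_padding enabled, the runtime is expected to strip these pad tokens internally, so outputs for this batch should match the unpadded (remove_padding=false) path.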
Branch/Tag/Commit
main
Docker Image Version
default
GPU name
V100
CUDA Driver
11.6
Reproduced Steps