NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

bs=4, seq_len=4,5,6,3: the output of the GPT example is wrong #428

Open dearowen opened 1 year ago

dearowen commented 1 year ago

Branch/Tag/Commit

main

Docker Image Version

default

GPU name

v100

CUDA Driver

11.6

Reproduced Steps

When the input has batch_size=4 and seq_len=4,5,6,3, I find the GPT example's output is wrong.
I want to know whether the GPT in FasterTransformer supports this kind of input.
When I modified the code, changing `remove_padding` in `parallelGpt.h` from `true` to `false`, the result is correct.
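To illustrate the setup being reported, here is a minimal sketch (plain Python, not FasterTransformer code) of how a batch with seq_len = 4,5,6,3 is right-padded to the max length, and what a "remove padding" pass conceptually does: pack only the valid tokens into one contiguous buffer. The token values and `pad_id` are made up for the example.

```python
# Hypothetical example: padded batch vs. packed (padding-removed) tokens.
pad_id = 0
lengths = [4, 5, 6, 3]
max_len = max(lengths)  # 6

# Right-pad each sequence to max_len; tokens here are just 1..n.
batch = [[i + 1 for i in range(n)] + [pad_id] * (max_len - n) for n in lengths]

# Packing step: keep only the first lengths[b] tokens of each row,
# which is roughly what enabling remove_padding does internally.
packed = [tok for row, n in zip(batch, lengths) for tok in row[:n]]

print(batch)
print(len(packed))  # 4 + 5 + 6 + 3 = 18 valid tokens
```

If the kernel that restores or skips padding mishandles uneven lengths, outputs for the shorter sequences would be corrupted, which would match the observation that disabling `remove_padding` gives correct results.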
byshiue commented 1 year ago

Please provide the end-to-end reproduction steps to help us reproduce the issue, and show what incorrect output you observe.