bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

How large is the inference error? #103

Open wxyhv opened 3 years ago

wxyhv commented 3 years ago

Hello, I want to know: if I use LightSeq for transformer model inference, what is the error between LightSeq and the PyTorch transformer in FP16? I can't find any documentation that explains this. Thank you! Looking forward to your reply~

Taka152 commented 3 years ago

What is the error value? Could you show me some cases?

wxyhv commented 3 years ago

> What is the error value? Could you show me some cases?

By "the error value" I mean the difference between the LightSeq inference result and the PyTorch inference result for the transformer. I'm aware that FasterTransformer's inference results differ from PyTorch's and TensorFlow's, so I want to know how LightSeq's inference results differ from PyTorch's, especially in FP16 mode.

Thank you!

neopro12 commented 3 years ago

The absolute difference between the LightSeq result and the PyTorch result is smaller than 1e-2, and the relative difference is smaller than 1e-3.
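To make the stated tolerances concrete, here is a minimal sketch of how such a comparison can be checked with numpy. The output vectors below are hypothetical placeholders, not real model outputs; in practice they would come from running the same input through LightSeq and PyTorch.

```python
import numpy as np

# Hypothetical FP16-precision decoder logits standing in for real outputs.
pytorch_out = np.array([3.1415, -0.2718, 7.3890], dtype=np.float32)
# Simulate a LightSeq result that deviates by a small relative amount.
lightseq_out = pytorch_out * (1.0 + 5e-4)

abs_diff = np.abs(lightseq_out - pytorch_out)
rel_diff = abs_diff / np.maximum(np.abs(pytorch_out), 1e-8)

print("max abs diff:", abs_diff.max())  # should be below 1e-2
print("max rel diff:", rel_diff.max())  # should be below 1e-3
```

The same check against real outputs only requires replacing the two placeholder arrays with the flattened logits from each backend.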

wxyhv commented 3 years ago

> The absolute difference between lightseq result and pytorch result is smaller than 1e-2, the relative difference is smaller than 1e-3.

Wow, thank you! BTW, is the difference you mentioned for transformer FP32 mode or FP16 mode? And which ops cause this difference? Could you add a full table of differences to the README?

neopro12 commented 3 years ago

FP16. Ops like layer norm and matrix multiplication may introduce these differences. In general, because of the large gap between the top logits, the difference in logits between LightSeq and PyTorch produces the same results after softmax and top-k.
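The argument above can be illustrated with a small sketch: when the top logit is well separated from the rest, FP16-scale perturbations do not change the ranking after softmax. The logit values and the noise below are made-up illustrative numbers, not measurements from either library.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical logits with a large top-1 gap, as in the comment above.
logits_pt = np.array([9.0, 2.0, 1.0, -3.0])
# Perturbation on the order of FP16 rounding differences.
logits_ls = logits_pt + np.array([5e-3, -4e-3, 3e-3, -2e-3])

# The top-1 choice survives the perturbation.
print(np.argmax(softmax(logits_pt)) == np.argmax(softmax(logits_ls)))  # True
```

Only when two candidate logits are nearly tied could such small differences flip the sampled or top-k token.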