Open lkm2835 opened 1 year ago
Why is the ROUGE score of Hugging Face different from the ROUGE score of FasterTransformer even though the same weights are used?
https://github.com/NVIDIA/FasterTransformer/blob/main/docs/t5_guide.md#running-t5-v11
Hugging Face (total latency: 21.826529 sec)
rouge1    : 10.786476875527406
rouge2    : 1.8231246974441166
rougeL    : 8.652689713627165
rougeLsum : 10.326607305635523

FasterTransformer (total latency: 7.036808 sec)
rouge1    : 10.91735083630513
rouge2    : 1.8454654301092783
rougeL    : 8.76872604148143
rougeLsum : 10.453229536094794
I want to know why they are different.
Can I make the results 100% the same?
It is almost impossible. Different GEMM algorithms and different kernel fusions lead to different computation orders, so there are small numerical differences in the final outputs of the transformer. For a generation model, such small gaps accumulate across decoding steps and eventually lead to different output ids.
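To illustrate the point (my own sketch, not FasterTransformer code): floating-point addition is not associative, so a different reduction order inside a GEMM or fused kernel can change the last bits of a logit. In greedy decoding, a last-bit difference between two near-tied logits flips the argmax, and every later token depends on that choice, so outputs diverge instead of staying "almost equal". The logit values below are made up for the demonstration.

```python
# Same three terms, summed in two different orders -- as two different
# GEMM/kernel-fusion schedules might do:
logit_run1 = (0.1 + 0.2) + 0.3   # 0.6000000000000001
logit_run2 = 0.1 + (0.2 + 0.3)   # 0.6
print(logit_run1 == logit_run2)  # False

# Hypothetical near-tied competing logit for another token id:
other_token = 0.6000000000000001

# Greedy decoding picks the argmax; the 1-ulp gap flips the choice.
choice1 = max(range(2), key=[logit_run1, other_token].__getitem__)  # token 0
choice2 = max(range(2), key=[logit_run2, other_token].__getitem__)  # token 1
print(choice1, choice2)  # 0 1
```

Once the two runs emit different token ids at one step, all subsequent steps condition on different prefixes, which is why the generated summaries (and hence the ROUGE scores) differ slightly even with identical weights.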