Is fastT5 quantization slower than PyTorch dynamic quantization?

Ki6an / fastT5

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.

Apache License 2.0

564 stars, 72 forks

Is fastT5 quantization slower than PyTorch dynamic quantization? #72

Open parikshitsaikia1619 opened 1 year ago

parikshitsaikia1619 commented 1 year ago

Hello @Ki6an,

I am working on speeding up batch CPU inference for a fine-tuned t5-mini model.

With batch size = 10 and sequence length = 300 tokens:

t5-mini inference time: 3 sec
t5-mini after PyTorch built-in dynamic quantization: 2.3 sec
fastT5 after converting to ONNX and quantizing: 5.9 sec !!

Maybe I am doing something wrong, but the model was supposed to be faster after fastT5, right?
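For reference, a minimal sketch of how the three timings could be measured in a comparable way. It uses only the standard library. `benchmark` and `speedup` are hypothetical helper names, not part of fastT5 or PyTorch, and the lambda is only a stand-in workload. Warm-up runs are excluded so that one-time setup cost is not counted in the reported latency.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], warmup: int = 2, repeats: int = 5) -> Dict[str, float]:
    """Call fn() a few times untimed, then time `repeats` calls.

    Returns the median and minimum latency in seconds and the run count.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are not timed

    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "runs": len(timings),
    }


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Ratio above 1.0 means the candidate is faster than the baseline."""
    return baseline_s / candidate_s


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a call that runs
    # generation on the same batch for each model variant.
    print(benchmark(lambda: sum(i * i for i in range(100_000))))

    # Speedups implied by the timings reported in this issue.
    print(round(speedup(3.0, 2.3), 2))  # PyTorch dynamic quantization
    print(round(speedup(3.0, 5.9), 2))  # fastT5 (below 1.0, so slower)
```

To apply this to the models, `fn` would wrap a generation call on the same batch of 10 inputs for each variant, for example the original model, the result of `torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)`, and the model returned by `export_and_get_onnx_model` as described in the fastT5 README.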

pytorch:

fastT5

Colab notebook link: Link

Please let me know your thoughts.