Ki6an / fastT5

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Apache License 2.0
564 stars, 72 forks

Is fastT5 quantization slower than PyTorch dynamic quantization? #72

Open parikshitsaikia1619 opened 1 year ago

parikshitsaikia1619 commented 1 year ago

Hello @Ki6an,

I am working on speeding up batch CPU inference for a fine-tuned t5-mini model.

I benchmarked with batch size = 10 and sequence length = 300 tokens:

Maybe I am doing something wrong, but the fastT5 model was supposed to be faster, right?

PyTorch: image

fastT5: image

Colab notebook link: Link

Please let me know your thoughts.
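
For anyone who wants to reproduce the comparison without the screenshots, here is a minimal timing sketch using only the Python standard library. It is not the code from the notebook: `benchmark` and `compare` are hypothetical helper names, and the two lambdas are stand-in workloads. In a real run each callable would presumably wrap one `generate` call on the same batch (10 inputs of about 300 tokens), once for the PyTorch dynamically quantized model and once for the fastT5 model. Warm-up runs are excluded and the median is reported, so that one-time setup cost does not skew the result.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> Dict[str, float]:
    """Call fn() repeatedly and return median/min/max latency in seconds."""
    # Warm-up calls are not timed (lazy initialization, caches, thread pools).
    for _ in range(warmup):
        fn()
    timings: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return {
        "median": statistics.median(timings),
        "min": min(timings),
        "max": max(timings),
    }


def compare(candidates: Dict[str, Callable[[], object]], **kwargs) -> Dict[str, Dict[str, float]]:
    """Benchmark every named callable with the same warm-up and run settings."""
    return {name: benchmark(fn, **kwargs) for name, fn in candidates.items()}


if __name__ == "__main__":
    # Stand-in workloads (assumption): replace each lambda with a call that
    # runs generation on the same fixed batch for the model being measured.
    results = compare(
        {
            "pytorch_dynamic_quant": lambda: sum(range(200_000)),
            "fastT5": lambda: sum(range(100_000)),
        },
        warmup=1,
        runs=5,
    )
    for name, stats in results.items():
        print(f"{name}: median {stats['median'] * 1000:.2f} ms")
```

Both models should see identical inputs, generation settings, and thread counts; otherwise the numbers are not comparable.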