Tencent / TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

Turbo Inference Slower than FastAI #158

Open · Charul opened this issue 4 years ago

Charul commented 4 years ago

I am trying to use TurboTransformers for inference with a trained BERT model (fastai with Hugging Face Transformers). I followed the steps under the section 'How to customize your post-processing layers after BERT encoder' and customized bert_for_sequence_classification_example.py accordingly.
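For reference, here is a minimal sketch of that customization, assuming the `turbo_transformers.BertModel.from_torch` conversion used in the repo's BERT examples; the exact return layout of the Turbo encoder call is an assumption:

```python
# Sketch: swap the Hugging Face BERT encoder for a Turbo encoder while keeping
# the existing classification head. Assumes turbo_transformers.BertModel.from_torch
# as in the repo's examples; the (sequence_output, pooled_output) return layout
# of the Turbo call is an assumption.
import torch
import transformers
import turbo_transformers

class TurboClassifier(torch.nn.Module):
    def __init__(self, hf_model: transformers.BertForSequenceClassification):
        super().__init__()
        # Convert only the encoder; dropout and classifier stay in PyTorch.
        self.turbo_bert = turbo_transformers.BertModel.from_torch(hf_model.bert)
        self.dropout = hf_model.dropout
        self.classifier = hf_model.classifier

    def forward(self, input_ids):
        # Assumed: the Turbo encoder returns (sequence_output, pooled_output).
        pooled_output = self.turbo_bert(input_ids)[1]
        return self.classifier(self.dropout(pooled_output))
```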

It appears that the inference time for _Turbo is greater than for fastai!_

Here is a screenshot of the inference time for a simple sentiment-prediction task on the statement below: '@AmericanAir @united contact me, or do something to alleviate this terrible, terrible service. But no, your 22 year old social media guru'

(screenshot: Turbo inference timing, 2020-08-12)

In comparison to fastai:

(screenshot: fastai inference timing, 2020-08-13)

Has anyone experienced something similar? I might be missing something that is causing this result. Or would it only make sense to compare timings on a larger test set?

feifeibear commented 4 years ago

Let me make sure: are you using the CPU for inference, and is your turbo version 0.4.1? Generally, the first inference after the runtime launches is very slow; you need to warm up the engine with one initial dummy inference.
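For example, a minimal timing sketch of that advice (the warm-up call and the averaging over many runs are the point; the model and inputs are placeholders):

```python
# Sketch: exclude the one-time warm-up cost from the measurement, then report
# mean latency over many runs instead of timing a single call.
import time
import torch

def mean_latency(model, input_ids, n_runs=100):
    with torch.no_grad():
        model(input_ids)                  # warm-up: the first call is slow
        start = time.perf_counter()
        for _ in range(n_runs):
            model(input_ids)
    return (time.perf_counter() - start) / n_runs  # seconds per inference
```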