Closed: renaud closed this issue 4 years ago.
Hey @renaud, we are currently doing a lot of inference benchmarking for Question Answering, as described in deepset-ai/haystack/issues/39, where we also compare PyTorch vs. ONNX.
Concerning your throughput: I think it is pretty slow, but I am not sure how a T4 GPU performs compared to the V100s we used. One important parameter is the batch size. Did you test different batch size values?
Looking at Tanay's post, it takes 0.1621 seconds for a batch of size 64 to complete on a V100. That works out to about 395 samples per second. And this is for QA, where inference is much more complex (a lot of communication between GPU and CPU). Simple text classification should be faster; my intuitive guess would be by a factor of 2-5x.
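For reference, throughput follows directly from the batch size and the measured batch latency. A quick sanity check of the numbers above (purely illustrative):

```python
def samples_per_second(batch_size: int, batch_latency_s: float) -> float:
    """Throughput in samples/sec from one batch's wall-clock latency."""
    return batch_size / batch_latency_s

# V100 QA numbers from above: a batch of 64 completes in 0.1621 s
print(round(samples_per_second(64, 0.1621)))  # -> 395
```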
Happy to interact here and make text classification inference faster together with you!
I did some speed benchmarking on text classification inference. I used a V100 GPU and tested various batch size and max_seq_len values with inference on 1000 texts:
So dividing 1000/3.57 we get 280 samples/second for seq len=128 and batch size=30. I would suggest you try increasing the batch size. A T4 will still be slower than a V100, but 54 samples/s is really low. I also realized that I might be wrong about text classification inference being faster than QA inference; the numbers are comparable to a recent QA inference benchmark test.
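A minimal timing harness along these lines can reproduce such numbers. This is a sketch, not FARM's actual benchmark code: `predict` is a stand-in for whatever batched model call you are measuring (e.g. a FARM `Inferencer`), and the dummy predictor below only exists to show the bookkeeping:

```python
import time

def benchmark(predict, texts, batch_size):
    """Run batched inference over `texts` and return total wall-clock seconds.

    `predict` is any callable that takes a list of texts; here it is a
    placeholder for the real model call.
    """
    start = time.perf_counter()
    for i in range(0, len(texts), batch_size):
        predict(texts[i:i + batch_size])
    return time.perf_counter() - start

# Dummy predictor, just to exercise the loop:
texts = ["some text"] * 1000
for bs in (1, 3, 6, 10, 20, 30):
    elapsed = benchmark(lambda batch: [0] * len(batch), texts, bs)
    print(f"Batch size {bs}: {elapsed:.3f}s -> {1000 / elapsed:.0f} samples/s")
```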
Ok, why talk about intuition when one can just check?
I tested QA inference vs. text classification:
Text Classification on 1000 docs, max seq len 128:

| Batch size | Time (s) |
|-----------:|---------:|
| 1          | 14.947   |
| 3          | 5.801    |
| 6          | 3.904    |
| 10         | 3.771    |
| 20         | 3.758    |
| 30         | 3.667    |
Question Answering on 1000 questions, max seq len 128 (doc + question just below 128 tokens):

| Batch size | Time (s) |
|-----------:|---------:|
| 1          | 16.096   |
| 3          | 6.172    |
| 6          | 5.290    |
| 10         | 4.951    |
| 20         | 5.044    |
| 30         | 4.930    |
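Converted to throughput (1000 samples divided by the measured time), the gap is easier to read:

```python
# Measured times (seconds) for 1000 samples, keyed by batch size, from above.
tc = {1: 14.947, 3: 5.801, 6: 3.904, 10: 3.771, 20: 3.758, 30: 3.667}
qa = {1: 16.096, 3: 6.172, 6: 5.290, 10: 4.951, 20: 5.044, 30: 4.930}

for bs in tc:
    print(f"Batch size {bs:>2}: "
          f"TC {1000 / tc[bs]:.0f} samples/s, "
          f"QA {1000 / qa[bs]:.0f} samples/s")
```

At batch size 30 this works out to roughly 273 samples/s for text classification vs. about 203 samples/s for QA.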
So QA inference does seem a bit slower than text classification inside FARM (0.4.4).
Closing this now for inactivity; feel free to reopen.
**Question**
I am getting around 54 sentences/s on inference for text classification.
What do you think? Is that good? Does this compare with what you get?
**Additional context**