ELS-RD / transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
https://els-rd.github.io/transformer-deploy/
Apache License 2.0

Question-Answering example not working for batch_size > 1 #159

Open lakshaykc opened 1 year ago

I'm running demo/question-answering/triton_client.py from the examples directory. The script returns the expected result with batch_size=1. However, if you set batch_size > 1 in this line, it outputs only the result for the first element in the batch and ignores the other elements.
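
To make the expected behavior concrete, here is a minimal sketch of per-row decoding. It assumes the model returns start and end logits with one row per batch element. The helper name `decode_batch_spans` and the logit values are hypothetical illustrations, not code or output from the demo script.

```python
def decode_batch_spans(start_logits, end_logits):
    """Return one (start, end) token span per batch row.

    Hypothetical helper: assumes both inputs are lists of rows,
    one row of per-token logits for each element in the batch.
    """
    spans = []
    for row_start, row_end in zip(start_logits, end_logits):
        # Most likely start token for this row.
        start = max(range(len(row_start)), key=row_start.__getitem__)
        # Most likely end token, restricted to positions at or after start.
        end = max(range(start, len(row_end)), key=row_end.__getitem__)
        spans.append((start, end))
    return spans


# Made-up logits for a batch of 2 sequences of 4 tokens each.
start_logits = [[0.1, 2.0, 0.3, 0.0], [1.5, 0.2, 0.1, 0.0]]
end_logits = [[0.0, 0.1, 3.0, 0.2], [0.1, 0.3, 0.2, 2.5]]

spans = decode_batch_spans(start_logits, end_logits)
print(spans)
```

With batch_size=2 I would expect two spans back, one per input. What I observe instead matches what you would get if only row 0 were decoded.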

I saw #84 and #106 about the question-answering example and batch_size, but I don't think they are related to this. The Triton server does not report any errors.

Am I missing something here?