How many iterations are you using? The first few iterations take longer because of GPU warm-up and initialization. I would highly recommend using our trtexec tool to test the perf.
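For reference, trtexec handles warm-up and repeated timing for you (flags such as `--warmUp` and `--iterations`). If you time an inference call by hand in Python instead, a minimal sketch like the one below (the helper name and iteration counts are illustrative, not part of any library) discards the warm-up runs before measuring:

```python
import time
import numpy as np

def benchmark(run_once, warmup=20, iters=100):
    # Discard the first `warmup` calls: they pay for CUDA context
    # creation, lazy allocations, and kernel selection.
    for _ in range(warmup):
        run_once()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        times.append(time.perf_counter() - start)
    # Report mean and p99 latency in milliseconds.
    return np.mean(times) * 1e3, np.percentile(times, 99) * 1e3
```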
Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!
Hi, can sentence-transformers models, e.g. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2, already be used with TensorRT-LLM? My goal is to compile a sentence-transformers/all-MiniLM-L6-v2 model without quantization using TensorRT-LLM and serve it with Triton... Are there any docs on how to make the model ready for TensorRT as well as ONNX? cc @ttyio @zerollzeng
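For the ONNX half of that question, the usual first step is a plain torch.onnx.export of the Hugging Face model. A minimal sketch, assuming standard transformers/torch APIs (the output path, sequence length, and opset here are illustrative choices, not an official recipe):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(name)
# torchscript=True makes the model return plain tuples, which traces cleanly.
model = AutoModel.from_pretrained(name, torchscript=True).eval()

enc = tokenizer("example sentence", return_tensors="pt",
                padding="max_length", max_length=128, truncation=True)

torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"], enc["token_type_ids"]),
    "all-MiniLM-L6-v2.onnx",  # output path (illustrative)
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes={k: {0: "batch", 1: "sequence"}
                  for k in ["input_ids", "attention_mask", "token_type_ids"]},
    opset_version=17,
)
```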
I am using TensorRT to run BERT inference, but it is slower than ONNX Runtime: TensorRT takes 10 ms while ONNX Runtime takes 6 ms. The model is just a simple BERT classification model. Could someone help me? ONNX code:
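(The original snippet was not captured in the issue. Below is a minimal sketch of comparable ONNX Runtime inference for a BERT classifier; the model path, tokenizer choice, and sequence length are assumptions.)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # tokenizer choice is an assumption
sess = ort.InferenceSession(
    "bert_classifier.onnx",  # model path (illustrative)
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

enc = tokenizer("example input text", return_tensors="np",
                padding="max_length", max_length=128, truncation=True)
inputs = {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
    "token_type_ids": enc["token_type_ids"].astype(np.int64),
}
logits = sess.run(None, inputs)[0]  # [batch, num_classes]
pred = int(np.argmax(logits, axis=-1)[0])
```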
TensorRT code:
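(Also not captured. A rough equivalent using the TensorRT 8.x Python bindings plus pycuda, assuming a pre-built, static-shape engine at the hypothetical path `bert_classifier.engine`, could look like this. Timing `infer()` with the warm-up harness above makes the comparison with ONNX Runtime apples-to-apples.)

```python
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a pre-built engine (path is hypothetical).
with open("bert_classifier.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate pinned host and device buffers for every binding.
# Assumes a static-shape engine (no -1 dims in the binding shapes).
host_bufs, dev_bufs, bindings = [], [], []
for i in range(engine.num_bindings):
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()

def infer(input_arrays):
    # Copy inputs host->device, run the engine, copy outputs device->host.
    out, in_idx = [], 0
    for i in range(engine.num_bindings):
        if engine.binding_is_input(i):
            np.copyto(host_bufs[i], input_arrays[in_idx].ravel())
            cuda.memcpy_htod_async(dev_bufs[i], host_bufs[i], stream)
            in_idx += 1
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for i in range(engine.num_bindings):
        if not engine.binding_is_input(i):
            cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
            out.append(host_bufs[i])
    stream.synchronize()
    return out
```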