-
Hi,
I generated the model.plan engine file on the same server as Triton. I also built TensorRT OSS but I get the following error when loading the engine:
```
E0226 17:02:00.421746 1 logging.cc:43…
```
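Before digging into the Triton side, it may help to confirm the plan deserializes with the TensorRT build on the serving machine; here is a minimal sketch (the `model.plan` path is an assumption):

```python
# A minimal sketch for checking that model.plan deserializes with the local
# TensorRT build; a None result usually indicates the engine was serialized
# with a different TensorRT version than the one doing the loading.
import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
runtime = trt.Runtime(logger)

with open("model.plan", "rb") as f:  # path is an assumption
    engine = runtime.deserialize_cuda_engine(f.read())

print("TensorRT:", trt.__version__, "| engine ok:", engine is not None)
```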
-
**Is your feature request related to a problem? Please describe.**
Currently, the fastest way to execute Computer Vision models for inference is to run a TensorRT-optimised model. It is widely a…
-
While serving the code_llama model and requesting `/generate_stream` with `stream: true`, the `text_output` field in the response does not contain any spaces (`" "`). Is this the expected behavior (i.e. users hav…
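For reference, a minimal client sketch that concatenates the streamed chunks verbatim makes it easy to see whether spaces are missing from the server's `text_output` or lost in client-side handling. The field names follow Triton's generate extension; the model name and parameters are assumptions:

```python
# A minimal sketch: collect every streamed text_output chunk and join them
# unchanged, so any missing spaces must come from the server side.
import json
import requests

url = "http://localhost:8000/v2/models/code_llama/generate_stream"  # model name assumed
payload = {"text_input": "def fib(n):", "stream": True, "max_tokens": 64}

chunks = []
with requests.post(url, json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if line.startswith(b"data:"):
            chunks.append(json.loads(line[len(b"data:"):]).get("text_output", ""))

print(repr("".join(chunks)))  # repr makes missing spaces visible
```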
-
This is supported by Triton; we just need to add support for it to the proxy. I have written code to do this independently here: https://moyix.net/~moyix/batch_codegen_full.py; I just need to integra…
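For context, a minimal sketch of what the proxy-side batching boils down to (the model and tensor names here are assumptions; the linked script is the authoritative version):

```python
# A minimal sketch of one batched Triton request; the proxy would gather
# queued prompts into a single call like this instead of N separate ones.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompts = np.array([["def foo():"], ["def bar():"]], dtype=object)  # batch of 2
inp = httpclient.InferInput("INPUT_0", list(prompts.shape), "BYTES")
inp.set_data_from_numpy(prompts)

result = client.infer("codegen", inputs=[inp])  # model name is an assumption
print(result.as_numpy("OUTPUT_0"))
```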
-
**Description**
Hi all,
I have an IR model that I was trying to deploy on Triton server `v23.10`, but loading it failed with this error.
```
Warning: '--strict-model-config' has been deprecated…
```
-
Hi!
I have built the TensorRT model via ONNX. Can you tell me how to use C++ to call the "model.dali" generated for resnet50_trt to preprocess the data?
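For orientation, here is a minimal sketch in Python of feeding an encoded image to a DALI preprocessing model served by Triton; the tensor and model names are assumptions, and the C++ client exposes the same `InferInput`/`Infer` calls:

```python
# A minimal sketch: send raw encoded JPEG bytes to a DALI model and read
# back the preprocessed tensor. The C++ client mirrors these calls.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

with open("cat.jpg", "rb") as f:  # any encoded JPEG
    raw = np.frombuffer(f.read(), dtype=np.uint8)

inp = grpcclient.InferInput("DALI_INPUT_0", [1, raw.size], "UINT8")
inp.set_data_from_numpy(raw.reshape(1, -1))

result = client.infer("dali", inputs=[inp])  # model name is an assumption
print(result.as_numpy("DALI_OUTPUT_0").shape)  # preprocessed tensor
```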
-
Hi - I am trying to accelerate some T5 models and I get this error. How do I fix this?
Command to reproduce:
`convert_model -m "valhalla/t5-small-qa-qg-hl" --backend tensorrt onnx --seq-len 16 1…
-
**Description**
After upgrading from 22.10, ORT models consume significantly more memory and run out of VRAM (OOM).
**Triton Information**
What version of Triton are you using?
Upgraded to Triton 2.35.…
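For what it's worth, loading the same model in standalone onnxruntime under both ORT versions can help isolate whether the growth comes from ORT itself or from the backend integration; a minimal sketch (model path and provider options are assumptions):

```python
# A minimal sketch: create a standalone ORT session with the CUDA EP arena
# told not to over-allocate (a common source of apparent memory growth),
# then compare GPU memory across the old and new ORT versions.
import onnxruntime as ort

providers = [(
    "CUDAExecutionProvider",
    {"arena_extend_strategy": "kSameAsRequested"},
)]
sess = ort.InferenceSession("model.onnx", providers=providers)
print(ort.__version__, sess.get_providers())
```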
-
## Branch/Tag/Commit
Based on v5.3, with https://github.com/NVIDIA/FasterTransformer/commit/e2dd1641880840db76b8902b34106c85b026a0af merged to fix early_stop
## Docker Image Version
Refer to fas…
-
To begin, I would like to thank the Triton Inference Server team!
You provide us with a very convenient tool for deploying deep learning models :)
**Is your feature request related to a problem? Plea…