-
I would like to use features such as the Multi-instance Support provided by the tensorrt-llm backend. In the documentation, I can see that multiple models are served using modes like Leader mode and …
-
### System Info
A100
### Who can help?
@byshiue
@juney-nvidia
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Tasks
- [x] An officially supported task in th…
-
Hi guys
From: https://github.com/triton-inference-server/tensorrt_backend/blob/main/src/instance_state.cc#L1148
I noticed that when processing the state tensor, Triton will copy the state tensor…
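For context, here is a minimal client-side sketch of how a sequence of requests exercises implicit state, which is the path where that copy happens on the server; the endpoint, model name, tensor names, and shapes below are my own illustrative assumptions, not taken from the linked source.

```python
import numpy as np
import tritonclient.http as httpclient

# Hypothetical model and tensor names; adjust to your model's config.pbtxt.
MODEL_NAME = "stateful_model"
client = httpclient.InferenceServerClient(url="localhost:8000")

sequence_id = 42
values = [1.0, 2.0, 3.0]
for step, value in enumerate(values):
    data = np.array([[value]], dtype=np.float32)
    inp = httpclient.InferInput("INPUT", data.shape, "FP32")
    inp.set_data_from_numpy(data)

    # Triton keeps the implicit state tensor on the server between these
    # calls; the copy discussed above is part of updating that state.
    result = client.infer(
        MODEL_NAME,
        inputs=[inp],
        sequence_id=sequence_id,
        sequence_start=(step == 0),
        sequence_end=(step == len(values) - 1),
    )
    print(result.as_numpy("OUTPUT"))
```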
-
• Hardware Platform (Jetson / GPU) Jetson Nano Devkit
• DeepStream Version 6.0.0
• JetPack Version (valid for Jetson only) 4.6
• TensorRT Version 8.2.1.8
I have a script running on Jetson Xavie…
-
How do I save the model trained with the example as a .pt file (or another format), and how do I then convert it to an ONNX model?
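A minimal sketch of the usual PyTorch flow, assuming the trained model object from the example is available as `model` and takes a single input of shape (1, 3, 224, 224); the file names, input shape, and opset below are illustrative assumptions, not part of the example scripts.

```python
import torch

# 'model' is assumed to be the trained torch.nn.Module from the example script.
model.eval()

# Save the weights as a .pt file (saving the state_dict is the usual practice).
torch.save(model.state_dict(), "model.pt")

# Export to ONNX; the dummy input shape must match what the model expects.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```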
-
System Info
GPU: NVIDIA RTX 4090
TensorRT-LLM 0.13
Question 1: How can I use the OpenAI-compatible API to perform inference on a TensorRT engine model?
root@docker-desktop:/llm/tensorrt-llm-0.13.0/examples/apps# pyt…
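If the goal is the OpenAI-compatible server under examples/apps, a minimal client sketch could look like the following; the base URL, port, and model name are my assumptions, and I assume the server is already running against your TensorRT engine, so check the README in that directory for the exact launch command.

```python
from openai import OpenAI

# Assumes the TensorRT-LLM OpenAI-compatible server is already running locally;
# the base_url, port, api_key, and model name below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="TinyLlama-1.1B-Chat-v1.0",
    messages=[{"role": "user", "content": "Hello, what can you do?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```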
-
### Describe the issue
Below is the best configuration I could find to get the model running as fast as possible on Jetson Orin using the TensorRT + ONNX Runtime backend
```
session_options.SetIntraO…
-
Hello Everyone,
I wrote an inference pipeline with NVIDIA's TensorRT and got predictions from my model.
However, I don't know how to properly post-process the predictions to get and draw the right bounding boxes.
I …
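As a starting point, here is a hedged sketch of the usual detector post-processing steps (confidence filtering, NMS, drawing), assuming the network output has already been copied back to host memory as an (N, 6) array of [x1, y1, x2, y2, score, class_id] in pixel coordinates; your model's actual output layout, scaling, and thresholds will likely differ, so treat every name and value here as an assumption.

```python
import cv2
import numpy as np

def postprocess(detections: np.ndarray, image: np.ndarray,
                conf_thresh: float = 0.5, nms_thresh: float = 0.45) -> np.ndarray:
    """Filter raw detections, run NMS, and draw boxes on the image.

    `detections` is assumed to be an (N, 6) array of
    [x1, y1, x2, y2, score, class_id] in pixel coordinates.
    """
    # 1. Drop low-confidence detections.
    detections = detections[detections[:, 4] >= conf_thresh]

    # 2. Non-maximum suppression (OpenCV expects [x, y, w, h] boxes).
    boxes_xywh = detections[:, :4].copy()
    boxes_xywh[:, 2:] -= boxes_xywh[:, :2]
    indices = cv2.dnn.NMSBoxes(
        boxes_xywh.tolist(), detections[:, 4].tolist(), conf_thresh, nms_thresh
    )

    # 3. Draw the surviving boxes.
    for i in np.array(indices).flatten():
        x1, y1, x2, y2, score, cls = detections[int(i)]
        cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.putText(image, f"{int(cls)}: {score:.2f}", (int(x1), int(y1) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return image
```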
-
### System Info
infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
OS: Linux
model_base: PEG
nvidia-smi: CUDA version …
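For reference, a small sketch of querying the server once it is up, assuming the default infinity port 7997, an OpenAI-style /embeddings endpoint, and the served model name from the command above; the port and endpoint path are assumptions on my side, so check the startup log.

```python
import requests

# Assumes the infinity_emb server from the command above is listening on
# localhost:7997; port, path, and response layout are assumptions.
resp = requests.post(
    "http://localhost:7997/embeddings",
    json={"model": "embedding", "input": ["a quick test sentence"]},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))
```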
-
**Description**
Triton does not clear or release GPU memory when there is a pause in inference. In the attached diagrams, the same model is being used; it is served via ONNX.
![image (1)](https:…
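As one hedged illustration of a possible mitigation (not necessarily the fix for this issue): when the server is started with --model-control-mode=explicit, the model can be unloaded during the idle period and reloaded before the next burst, which releases the backend's GPU allocations; the URL and model name below are placeholders of mine.

```python
import tritonclient.http as httpclient

# Assumes Triton was started with --model-control-mode=explicit;
# the URL and model name are placeholders.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Unload during an idle period to release the ONNX backend's GPU memory ...
client.unload_model("my_onnx_model")

# ... and reload before traffic resumes.
client.load_model("my_onnx_model")
```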