-
Thanks for your amazing work. It seems the inference time of the ONNX model is better than that of the TensorRT model. Is there anything wrong with my testing? I got an inference time of 150 ms for the ONNX model and …
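One thing worth checking is whether the first call is being timed: both ONNX Runtime and TensorRT do lazy initialization and GPU warm-up on the first run, so a single-shot measurement can be misleading. Below is a minimal sketch of a warm-up-plus-averaging measurement with ONNX Runtime; the model path, input name, and input shape are assumptions, not taken from the issue.

```python
import time
import numpy as np
import onnxruntime as ort

# Hypothetical model path and input shape; replace with the real ones.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
feed = {sess.get_inputs()[0].name: np.random.rand(1, 3, 640, 640).astype(np.float32)}

# Warm-up runs so initialization cost is not counted.
for _ in range(10):
    sess.run(None, feed)

# Average over many iterations instead of timing a single call.
n = 100
start = time.perf_counter()
for _ in range(n):
    sess.run(None, feed)
print(f"mean latency: {(time.perf_counter() - start) / n * 1e3:.2f} ms")
```

The same warm-up and averaging should be applied to the TensorRT measurement before comparing the two numbers.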
-
### Motivation
In some industrial projects, we need multiple models to handle multiple different defect types. In this case, we need one GPU to run inference against different defects with the related mod…
-
Code file name "clip_txt.py":
_MODELS = [
'RN50::openai',
'RN50::yfcc15m',
'RN50::cc12m',
'RN101::openai',
'RN101::yfcc15m',
'RN50x4::openai',
'ViT-B-32::openai',
…
-
Can you help with this issue when I run ./start-triton-server.sh?
I'm using
**nvcr.io/nvidia/tritonserver:21.07-py3**
> root@bf5cff23afa2:/apps# bash ./start-triton-server.sh --models yolov9-e-qat …
-
Hi,
I am doing some research and I am looking for feature matching models that are accurate but can also be optimized for edge devices. For the optimization, TensorRT is used and deployed on a Jetson…
-
Outlines currently supports the vLLM inference engine; it would be great if it could also support the TensorRT-LLM inference engine.
-
The size of the model trained on my own data is about 165M, and the inference time, including post-processing, is approximately 237 ms.
-
Trying to run offline retinanet in a container with one Nvidia GPU:
```
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia …
```
-
@CarkusL Thanks for your great work. I merged pfe_sim.onnx and rpn.onnx into pointpillars_trt.onnx and used it with TensorRT for inference, but the result is wrong, as shown in the link. Could you help…
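For reference, two ONNX graphs can be joined with `onnx.compose.merge_models`; here is a minimal sketch, assuming the outputs of pfe_sim.onnx feed the inputs of rpn.onnx. The tensor names in `io_map` are placeholders, not the real ones from these models.

```python
import onnx
from onnx import compose

pfe = onnx.load("pfe_sim.onnx")
rpn = onnx.load("rpn.onnx")

# io_map wires outputs of the first graph to inputs of the second.
# The names below are placeholders; look up the real ones via
# [o.name for o in pfe.graph.output] and [i.name for i in rpn.graph.input].
merged = compose.merge_models(
    pfe, rpn,
    io_map=[("pfe_output", "rpn_input")],
)
onnx.save(merged, "pointpillars_trt.onnx")
```

If the merged graph runs but produces wrong results, it is worth checking that the io_map connections and any intermediate processing between the two stages match what the original two-model pipeline did.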
-
**Description**
I have converted an .onnx model file to a .plan (TensorRT) file using the 24.02-py3 docker image, with builder.max_batch_size = 16.
When I tried to deploy this model on Triton Infere…
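For context, recent TensorRT releases (including the one shipped in the 24.02 container) use the explicit-batch API, where the supported batch range is declared through an optimization profile rather than builder.max_batch_size. Below is a minimal sketch of building a plan that accepts batch sizes up to 16; the input tensor name, shape, and file paths are assumptions.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model (path is a placeholder).
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()

# Declare a dynamic batch dimension from 1 to 16.
# "input" and the 3x224x224 shape are placeholders for the real tensor.
profile = builder.create_optimization_profile()
profile.set_shape("input", (1, 3, 224, 224), (8, 3, 224, 224), (16, 3, 224, 224))
config.add_optimization_profile(profile)

plan = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(plan)
```

Note that the batch dimension must already be dynamic in the exported ONNX graph for the profile to take effect, and the model's Triton config then declares a matching max_batch_size.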