-
I would like to use features such as the Multi-instance Support provided by the tensorrt-llm backend. In the documentation, I can see that multiple models are served using modes like Leader mode and …
-
### System Info
A100
### Who can help?
@byshiue
@juney-nvidia
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Tasks
- [x] An officially supported task in th…
-
Hi guys
From: https://github.com/triton-inference-server/tensorrt_backend/blob/main/src/instance_state.cc#L1148
I noticed that when processing the state tensor, Triton will copy the state tensor…
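For context, here is a minimal client-side sketch of how a sequence of requests exercises implicit state, which is the path where that copy happens on the server; the endpoint, model name, tensor names, and shapes below are my own illustrative assumptions, not taken from the linked source.

```python
import numpy as np
import tritonclient.http as httpclient

# Hypothetical model and tensor names; adjust to your model's config.pbtxt.
MODEL_NAME = "stateful_model"
client = httpclient.InferenceServerClient(url="localhost:8000")

sequence_id = 42
values = [1.0, 2.0, 3.0]
for step, value in enumerate(values):
    data = np.array([[value]], dtype=np.float32)
    inp = httpclient.InferInput("INPUT", data.shape, "FP32")
    inp.set_data_from_numpy(data)

    # Triton keeps the implicit state tensor on the server between these
    # calls; the copy discussed above is part of updating that state.
    result = client.infer(
        MODEL_NAME,
        inputs=[inp],
        sequence_id=sequence_id,
        sequence_start=(step == 0),
        sequence_end=(step == len(values) - 1),
    )
    print(result.as_numpy("OUTPUT"))
```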
-
• Hardware Platform (Jetson / GPU) Jetson Nano Devkit
• DeepStream Version 6.0.0
• JetPack Version (valid for Jetson only) 4.6
• TensorRT Version 8.2.1.8
I have a script running on Jetson Xavie…
-
How do I save the model trained with the example as a .pt file (or another format), and how do I then convert it to an ONNX model?
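A minimal sketch of the usual PyTorch flow, assuming the trained model object from the example is available as `model` and takes a single input of shape (1, 3, 224, 224); the file names, input shape, and opset below are illustrative assumptions, not part of the example scripts.

```python
import torch

# 'model' is assumed to be the trained torch.nn.Module from the example script.
model.eval()

# Save the weights as a .pt file (saving the state_dict is the usual practice).
torch.save(model.state_dict(), "model.pt")

# Export to ONNX; the dummy input shape must match what the model expects.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=17,
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```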
-
System Info
GPU: NVIDIA RTX 4090
TensorRT-LLM 0.13
Question 1: How can I use the OpenAI-compatible API to perform inference on a TensorRT engine model?
root@docker-desktop:/llm/tensorrt-llm-0.13.0/examples/apps# pyt…
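If the goal is the OpenAI-compatible server under examples/apps, a minimal client sketch could look like the following; the base URL, port, and model name are my assumptions, and I assume the server is already running against your TensorRT engine, so check the README in that directory for the exact launch command.

```python
from openai import OpenAI

# Assumes the TensorRT-LLM OpenAI-compatible server is already running locally;
# the base_url, port, api_key, and model name below are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="TinyLlama-1.1B-Chat-v1.0",
    messages=[{"role": "user", "content": "Hello, what can you do?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```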
-
### Describe the issue
Below is the best configuration I could find to get the model running as fast as possible on Jetson Orin using the TensorRT + ONNX Runtime backend
```
session_options.SetIntraO…
-
Hello Everyone,
I wrote an inference pipeline with NVIDIA's TensorRT and got predictions from my model.
However, I don't know how to properly post-process the predictions to get and draw the right bounding boxes.
I …
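As a starting point, here is a hedged sketch of the usual detector post-processing steps (confidence filtering, NMS, drawing), assuming the network output has already been copied back to host memory as an (N, 6) array of [x1, y1, x2, y2, score, class_id] in pixel coordinates; your model's actual output layout, scaling, and thresholds will likely differ, so treat every name and value here as an assumption.

```python
import cv2
import numpy as np

def postprocess(detections: np.ndarray, image: np.ndarray,
                conf_thresh: float = 0.5, nms_thresh: float = 0.45) -> np.ndarray:
    """Filter raw detections, run NMS, and draw boxes on the image.

    `detections` is assumed to be an (N, 6) array of
    [x1, y1, x2, y2, score, class_id] in pixel coordinates.
    """
    # 1. Drop low-confidence detections.
    detections = detections[detections[:, 4] >= conf_thresh]

    # 2. Non-maximum suppression (OpenCV expects [x, y, w, h] boxes).
    boxes_xywh = detections[:, :4].copy()
    boxes_xywh[:, 2:] -= boxes_xywh[:, :2]
    indices = cv2.dnn.NMSBoxes(
        boxes_xywh.tolist(), detections[:, 4].tolist(), conf_thresh, nms_thresh
    )

    # 3. Draw the surviving boxes.
    for i in np.array(indices).flatten():
        x1, y1, x2, y2, score, cls = detections[int(i)]
        cv2.rectangle(image, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.putText(image, f"{int(cls)}: {score:.2f}", (int(x1), int(y1) - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return image
```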
-
### System Info
infinity_emb v2 --model_id /home/xxxx/peg_onnx --served-model-name embedding --engine optimum --device tensorrt --batch-size 32
OS: Linux
model_base: PEG
nvidia-smi: CUDA version …
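For reference, a small sketch of querying the server once it is up, assuming the default infinity port 7997, an OpenAI-style /embeddings endpoint, and the served model name from the command above; the port and endpoint path are assumptions on my side, so check the startup log.

```python
import requests

# Assumes the infinity_emb server from the command above is listening on
# localhost:7997; port, path, and response layout are assumptions.
resp = requests.post(
    "http://localhost:7997/embeddings",
    json={"model": "embedding", "input": ["a quick test sentence"]},
    timeout=30,
)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(len(embedding))
```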
-
**Description**
Triton does not clear or release GPU memory when there is a pause in inference. In the attached diagrams, the same model is being used; it is served via ONNX.
![image (1)](https:…
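As one hedged illustration of a possible mitigation (not necessarily the fix for this issue): when the server is started with --model-control-mode=explicit, the model can be unloaded during the idle period and reloaded before the next burst, which releases the backend's GPU allocations; the URL and model name below are placeholders of mine.

```python
import tritonclient.http as httpclient

# Assumes Triton was started with --model-control-mode=explicit;
# the URL and model name are placeholders.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Unload during an idle period to release the ONNX backend's GPU memory ...
client.unload_model("my_onnx_model")

# ... and reload before traffic resumes.
client.load_model("my_onnx_model")
```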