-
Hello,
I am currently experiencing an issue with the `triton-inference-server/tensorrt_backend` while trying to run a Baichuan model.
### Description
I have set `gpt_model_type=inflight_fused…
-
### System Info
GPU: NVIDIA T4 * 4
Driver Version: 550.54.15
CUDA: 12.4
Image: nvcr.io/nvidia/tritonserver:24.07-trtllm-python-py3
TensorRT-LLM version: 0.11.0
### Who can help?
No response…
-
### System Info
- CPU architecture: x86_64
- GPU: A100-80GB
- CUDA version: 11
- TensorRT-LLM version: 0.9.0
- Triton server version: 2.46.0
- Model: Llama3-7b
### Who can help?
_No respo…
-
### System Info
- Architecture: x86_64
- OS: Ubuntu 22.04
- GPU: NVIDIA GeForce RTX 4090
- GPU memory: 2x 24 GB
- CPU max MHz: 5000.0000
- Driver Version: 535.183.01
- CUDA Version: 12.2
- Conta…
-
Hello, I want to deploy a quantized llama-3-8b model using tritonserver. I followed the steps below to do this (a client sketch for querying the result follows the list):
1. Create a container with the nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 base image.
3.…
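Once the steps above are done and the server is running, querying the deployed model can look like the following minimal sketch. It assumes the default tensorrtllm_backend ensemble with `text_input`, `max_tokens`, and `text_output` tensors; the model name and tensor names may differ in your model repository.
```python
# Minimal sketch: query a TensorRT-LLM model served by Triton over HTTP.
# The model name "ensemble" and the tensor names are assumptions based on
# the default tensorrtllm_backend ensemble and may need to be adapted.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompt = np.array([["What is Triton Inference Server?"]], dtype=object)
max_tokens = np.array([[64]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", [1, 1], "BYTES"),
    httpclient.InferInput("max_tokens", [1, 1], "INT32"),
]
inputs[0].set_data_from_numpy(prompt)
inputs[1].set_data_from_numpy(max_tokens)

outputs = [httpclient.InferRequestedOutput("text_output")]

result = client.infer(model_name="ensemble", inputs=inputs, outputs=outputs)
print(result.as_numpy("text_output"))
```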
-
Can this be done by leveraging the onnxruntime work we already have as a backend?
As a preliminary step, learn to add a CUDA backend,
then change it to MIGraphX/ROCm.
See [https://github.com…
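To make the execution-provider swap concrete, here is a minimal sketch at the ONNX Runtime Python level (which is what the onnxruntime backend drives underneath). The model path and input name are placeholders, and the MIGraphX/ROCm providers are only available in an onnxruntime build that includes them.
```python
# Minimal sketch: run the same ONNX model on different execution providers.
# "model.onnx" and the input name "input" are placeholders; provider
# availability depends on how onnxruntime was built.
import numpy as np
import onnxruntime as ort

model_path = "model.onnx"
feed = {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}

# Step 1: run with the CUDA execution provider.
cuda_sess = ort.InferenceSession(model_path, providers=["CUDAExecutionProvider"])
print(cuda_sess.run(None, feed)[0].shape)

# Step 2: switch to MIGraphX/ROCm by changing only the provider list.
amd_sess = ort.InferenceSession(
    model_path,
    providers=["MIGraphXExecutionProvider", "ROCMExecutionProvider"],
)
print(amd_sess.run(None, feed)[0].shape)
```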
-
**Description**
A clear and concise description of what the bug is.
![output_image](https://github.com/user-attachments/assets/bed4e808-a3e0-4225-96c4-04ae69c65a15)
**Triton Information**
…
-
I have a BERT model that I am trying to deploy with Triton Inference Server using the TensorRT-LLM backend, but I am getting errors:
- Docker Image: 24.03
- TensorRT-LLM: v0.8.0
Error:
+-------+-…
-
@Rasantis hey!
Absolutely, YOLOv8 is designed with efficiency in mind and supports processing multiple video streams in real time, including RTSP streams. For handling 20+ cameras, implementation…
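For reference, a minimal per-camera streaming sketch with the ultralytics API could look like the following; the RTSP URLs are placeholders, and scaling to 20+ cameras typically also involves batching or multiple processes/GPUs, which this sketch does not cover.
```python
# Minimal sketch: one YOLOv8 inference loop per RTSP stream, each in its
# own thread. The RTSP URLs are placeholders.
from threading import Thread

from ultralytics import YOLO


def run_stream(url: str) -> None:
    model = YOLO("yolov8n.pt")  # one model instance per thread
    # stream=True yields results frame by frame instead of accumulating them
    for result in model(url, stream=True):
        print(f"{url}: {len(result.boxes)} objects in current frame")


urls = ["rtsp://camera-1/stream", "rtsp://camera-2/stream"]  # placeholders
threads = [Thread(target=run_stream, args=(u,), daemon=True) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```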
-
## Description
I have two different modules that I converted to TRT. When I run them serially, the inference-only cost time is:
```
//10 times
do_infer >> cost 400.60 msec. //warm-up
do_infer >> cost 42.22 …