-
Figure 1: the error message; Figure 2: the error location, which is in the official WhisperEncoding source code.
trt-llm version: 0.14.0.dev2024091700
Using the nvcr.io/nvidia/tritonserver:24.07-py3 image directly.
-
**Issue Description:**
During a graceful shutdown of Triton Server, we've observed the following behavior (see the sketch after this list):
- Triton Server is hosting both Model A and Model B.
- Model B can make calls to Model…
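For concreteness, inter-model calls like this are usually made through Python-backend BLS. Below is a minimal sketch of that pattern; the model and tensor names (`model_a`, `INPUT`, `OUTPUT`) are placeholders, not the actual deployment:

```python
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")
            # Model B forwards the request to Model A via a BLS call.
            bls_request = pb_utils.InferenceRequest(
                model_name="model_a",
                requested_output_names=["OUTPUT"],
                inputs=[in_tensor],
            )
            bls_response = bls_request.exec()  # blocks until Model A responds
            if bls_response.has_error():
                raise pb_utils.TritonModelException(bls_response.error().message())
            out = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT")
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

During a graceful shutdown, a blocking `exec()` like this can presumably still be in flight while the callee model is being unloaded, which appears to be the interaction the report describes.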
-
**Description**
Triton uses over 100% of physical memory and freezes the server when using a decoupled DALI model with a long video input (a minimal streaming-client sketch follows this snippet).
**Triton Information**
Docker `nvcr.io/nvidia/tritonserv…
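For reference, a decoupled model has to be driven through the gRPC streaming client API. Here is a minimal sketch; the model name `dali_video` and input tensor `INPUT` are assumptions, not the actual pipeline:

```python
import queue
import numpy as np
import tritonclient.grpc as grpcclient

results = queue.Queue()

def callback(result, error):
    # A decoupled model may emit many responses for a single request.
    results.put(error if error is not None else result)

client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=callback)

inp = grpcclient.InferInput("INPUT", [1], "BYTES")
inp.set_data_from_numpy(np.array([b"long_video.mp4"], dtype=np.object_))
client.async_stream_infer(model_name="dali_video", inputs=[inp])

client.stop_stream()  # flushes the stream; responses are queued in `results`
while not results.empty():
    print(results.get())
```

With a long video, every intermediate response buffered by the server adds to host memory use, which may be relevant to the blow-up described above.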
-
I'm trying to use Triton to deploy baichuan2-13B for inference at bf16 precision. The tritonserver starts successfully, but it crashes while processing a client request (an example request is sketched below).
- Use TensorRT-LLM v0…
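For reference, the request path that triggers the crash is an ordinary client inference call. A minimal sketch follows; the model and tensor names (`ensemble`, `text_input`, `max_tokens`, `text_output`) follow the usual tensorrtllm_backend ensemble layout and are assumptions here:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient("localhost:8000")

# String inputs are sent as BYTES tensors backed by numpy object arrays.
prompt = np.array([["What is machine learning?"]], dtype=np.object_)
inp = httpclient.InferInput("text_input", prompt.shape, "BYTES")
inp.set_data_from_numpy(prompt)

max_tokens = np.array([[64]], dtype=np.int32)
tok = httpclient.InferInput("max_tokens", max_tokens.shape, "INT32")
tok.set_data_from_numpy(max_tokens)

result = client.infer("ensemble", inputs=[inp, tok])
print(result.as_numpy("text_output"))
```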
-
There is a potential for the TRTIS detection component's async thread to wait forever for a response from the server. This is a [known issue](https://github.com/NVIDIA/triton-inference-server/pull/176…
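If the caller can be changed, one common mitigation is a client-side timeout so the thread fails fast instead of blocking indefinitely. A minimal sketch with the current tritonclient gRPC API (model and tensor names are placeholders):

```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

client = grpcclient.InferenceServerClient("localhost:8001")
inp = grpcclient.InferInput("INPUT", [1, 3], "FP32")
inp.set_data_from_numpy(np.zeros((1, 3), dtype=np.float32))
try:
    # client_timeout is in seconds; the call raises instead of waiting forever.
    result = client.infer("detector", inputs=[inp], client_timeout=5.0)
except InferenceServerException as e:
    print(f"request timed out or failed: {e}")
```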
-
### Proposal to improve performance
_No response_
### Report of performance regression
_No response_
### Misc discussion on performance
To reproduce vLLM's performance benchmark, please…
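While the official steps are cut off above, a self-contained offline throughput measurement with vLLM's Python API might look like the sketch below; the model and prompts are arbitrary placeholders, not the official benchmark setup:

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model
prompts = ["Hello, my name is"] * 32
params = SamplingParams(max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{total_tokens / elapsed:.1f} generated tokens/s")
```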
-
**Description**
According to the Framework matrix (https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html#framework-matrix-2024), 24.05 is supposed to support TensorRT 10.0.6.1. Th…
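A quick way to confirm what the container actually ships is to query the TensorRT Python bindings inside the 24.05 image:

```python
# Run inside the 24.05 container and compare the result against the
# support-matrix entry (10.0.6.1).
import tensorrt as trt
print(trt.__version__)
```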
-
I was recently deploying Hugging Face models on the Triton Inference Server, which helped me increase GPU utilization and serve multiple models on a single GPU.
I was not able to find good r…
-
From req doc:
**OOTB support for NVidia Triton Inference Server**
- We are going with OpenVINO for now, as Triton currently cannot be built due to maintenance concerns.
Acceptance criteria:
- Scope…