-
Is multi-node supported in Triton Inference Server?
I built LLaMA-7B with tensorrtllm_backend and launched Triton Inference Server.
I have 4 GPUs, but Triton Inference Server loads only 1 GPU.
imag…
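If the goal is to spread the model over the 4 GPUs with tensor parallelism, a minimal sketch of the usual flow looks like the following; paths and sizes are placeholders, and exact flags depend on the TensorRT-LLM / tensorrtllm_backend version in use:
```shell
# Convert the HF checkpoint with 4-way tensor parallelism (placeholder paths).
python3 examples/llama/convert_checkpoint.py \
    --model_dir ./llama-7b-hf \
    --output_dir ./ckpt_tp4 \
    --dtype float16 \
    --tp_size 4

# Build the engine (one rank per GPU).
trtllm-build --checkpoint_dir ./ckpt_tp4 \
    --output_dir ./engines/llama-7b-tp4 \
    --gemm_plugin float16

# Launch Triton with an MPI world size matching tp_size so all 4 GPUs are used.
python3 scripts/launch_triton_server.py \
    --world_size 4 \
    --model_repo ./triton_model_repo
```
With `--world_size 1`, the server will only ever place the model on a single GPU, which matches the behavior described above.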
-
### System Info
- tensorrtllm_backend built using Dockerfile.trt_llm_backend
- TensorRT-LLM main branch (0.13.0.dev20240813000)
- 8xH100 SXM
- Driver Version: 535.129.03
- CUDA Version: 12.5
…
-
The engine works fine when I run offline inference with TRT-LLM from Python.
But when I use Triton to run it, it complains as follows.
Why is this? The Triton server uses more memory than TRT-LLM of…
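One common reason the Triton deployment appears to use more memory than offline TRT-LLM is that the tensorrt_llm backend pre-allocates a large fraction of free GPU memory for the KV cache by default. A sketch of lowering that fraction with the bundled template tool (the path is a placeholder, and parameter availability depends on the backend version):
```shell
# Reduce how much of the remaining free GPU memory the backend reserves for
# the KV cache (the default is close to 0.9 in recent versions).
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    "kv_cache_free_gpu_mem_fraction:0.5"
```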
-
Hi there,
I have been fine-tuning Whisper models using Hugging Face. To convert the model to TensorRT-LLM format, I use an HF script that converts the models from their HF format to the original …
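For reference, the overall flow is usually two steps: map the fine-tuned HF weights back to the original OpenAI-style checkpoint layout, then build the TensorRT-LLM encoder/decoder engines from it. The sketch below is illustrative only; the script name and flags are assumptions and vary across TensorRT-LLM releases:
```shell
# Hypothetical script name and paths, for illustration only.
# 1) Convert the fine-tuned HF Whisper checkpoint to the original OpenAI layout.
python3 convert_hf_to_openai.py \
    --model_dir ./my-finetuned-whisper-hf \
    --output_path ./assets/my-finetuned-whisper.pt

# 2) Build the encoder/decoder engines following examples/whisper in the
#    TensorRT-LLM repo for the release you are using.
```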
-
**Description**
I was using Triton Server nvcr.io/nvidia/tritonserver:24.04-py3 on my local machine with Windows 10 via a Docker container. I installed the latest NVIDIA driver 555.85, and the Docker containe…
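For context, the usual way to start that container looks roughly like the following; the model-repository path and ports are placeholders, and GPU passthrough requires Docker Desktop's WSL2 backend with GPU support enabled:
```shell
# Start Triton with GPU access and a local model repository mounted in.
docker run --rm --gpus=all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /c/models:/models \
  nvcr.io/nvidia/tritonserver:24.04-py3 \
  tritonserver --model-repository=/models
```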
-
**Routine checks**
[//]: # (Delete the space inside the brackets and fill in an x)
+ [ ] I have confirmed there is no similar existing issue
+ [ ] I have confirmed I have upgraded to the latest version
+ [ ] I have read the project README in full and confirmed that the current version cannot meet my needs
+ [ ] I understand and am willing to follow up on this issue, helping with testing and providing feedback
+ [ ] I understand and accept the above, and understand that the maintainers have limited time; **issues that do not follow the rules may be…
-
### Branch/Tag/Commit
v5.2
### Docker Image Version
22.08-py3
### GPU name
V100
### CUDA Driver
none
### Reproduced Steps
```shell
use the fastertransformer triton backend …
-
### System Info
Hi,
I generated the TensorRT-LLM engine for a LLaMA-based model and see that the performance is much worse than vLLM.
I did the following:
- compile the model with TensorRT-LLM c…
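When comparing against vLLM, the build flags matter a lot: in-flight batching needs the attention plugin and paged KV cache enabled at build time, otherwise throughput typically falls well short. A sketch of a throughput-oriented build (paths and sizes are placeholders; flag names differ slightly across TensorRT-LLM versions):
```shell
# Enable the plugins and paged KV cache that in-flight batching relies on.
trtllm-build --checkpoint_dir ./ckpt \
    --output_dir ./engines/llama \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --paged_kv_cache enable \
    --max_batch_size 64
```
It is also worth confirming that the benchmark drives the server with concurrent requests, since a single-stream comparison will not exercise in-flight batching at all.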
-
[MLIR LSP server](https://mlir.llvm.org/docs/Tools/MLIRLSP/) is a tool that lets IDEs understand `.mlir` files of various dialects. By integrating with `mlir-lsp`-related tooling, we can make the IDE aware of t…
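As a rough sketch (build directories are placeholders), the server is built as part of an MLIR-enabled LLVM checkout and the editor is then pointed at the resulting binary:
```shell
# Build the LSP server target from an MLIR-enabled LLVM build.
cmake -G Ninja -S llvm -B build -DLLVM_ENABLE_PROJECTS=mlir -DCMAKE_BUILD_TYPE=Release
ninja -C build mlir-lsp-server

# Example integration: the VS Code MLIR extension reads the server location
# from its mlir.server_path setting, pointed at build/bin/mlir-lsp-server.
```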
-
**Description**
I'm using a simple client inference class based on the client example. My TensorRT inference with batch size 10 takes 150 ms, while my Triton setup with the TensorRT backend took 1100 ms. This is my client:…
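Without the full client it is hard to say where the 1100 ms goes; a common first step is to take the custom client out of the loop with Triton's perf_analyzer (the model name and batch size below are placeholders):
```shell
# Measure server-side latency/throughput directly, bypassing the custom client,
# to see whether the overhead comes from the server or from the client code.
perf_analyzer -m my_trt_model -b 10 --concurrency-range 1:4
```
If perf_analyzer reports latency close to the standalone TensorRT numbers, the gap is most likely in the client (e.g. per-request connection setup or unbatched requests) rather than in the backend.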