-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [ ] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…
-
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
Feature request: BF16 support for inte…
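For context on the format being requested: BF16 keeps float32's sign bit and 8 exponent bits but truncates the mantissa to 7 bits. As a hedged illustration of the format itself (not of any particular backend's implementation), here is the standard round-to-nearest-even float32→bfloat16 bit conversion in pure Python:

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for a float32 value
    (round-to-nearest-even; NaN handling omitted for brevity)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round the low 16 bits up when they exceed half, or tie to even.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by zero-padding the mantissa."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

Values whose mantissa already fits in 7 bits (e.g. 1.0, 3.140625) round-trip exactly.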
-
**Description**
We're seeing significant latency, on the order of 300-600 milliseconds, between COMPUTE_END and REQUEST_END on a TensorRT-LLM model. See the OTEL trace image below.
![image](http…
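For anyone reproducing the measurement, the gap can be computed directly from the exported span timestamps. A minimal sketch assuming ISO-8601 timestamps (the values below are hypothetical; OTEL exporters often emit epoch nanoseconds instead, in which case subtract and divide by 1e6):

```python
from datetime import datetime

def span_gap_ms(compute_end: str, request_end: str) -> float:
    """Milliseconds between two ISO-8601 span event timestamps."""
    t0 = datetime.fromisoformat(compute_end)
    t1 = datetime.fromisoformat(request_end)
    return (t1 - t0).total_seconds() * 1000.0

# Hypothetical timestamps; real values come from the exported trace.
gap = span_gap_ms("2024-01-01T00:00:01.000+00:00",
                  "2024-01-01T00:00:01.450+00:00")
```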
-
Inference failed when running ab with a concurrency higher than 3, but was fine with a concurrency of 1 or 2. Using an A10G GPU, with Driver Version: 545.23.06, CUDA Version: 12.3, TRT version: 9.1, vicuna 13b-1.5-…
-
trtllm crashes when I send long-context requests that are within the `max-input-length` limit.
I believe it happens when the total number of pending request tokens reaches the `max-num-tokens` limit. But why is it not queuing re…
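On the queuing question: one would expect an admission controller to hold requests whose tokens would overflow the budget rather than crash. A toy sketch of that expected behavior (illustrative only, not TensorRT-LLM's actual scheduler; `max_num_tokens` here is a plain constructor argument):

```python
from collections import deque

class TokenBudgetQueue:
    """Admit requests while in-flight tokens fit under max_num_tokens;
    queue the rest instead of rejecting them."""

    def __init__(self, max_num_tokens: int):
        self.max_num_tokens = max_num_tokens
        self.in_flight = 0          # tokens currently being processed
        self.waiting = deque()      # FIFO of (req_id, num_tokens)
        self.running = []           # admitted (req_id, num_tokens)

    def submit(self, req_id: str, num_tokens: int) -> None:
        if num_tokens > self.max_num_tokens:
            raise ValueError(f"{req_id} exceeds max_num_tokens on its own")
        self.waiting.append((req_id, num_tokens))
        self._schedule()

    def finish(self, req_id: str) -> None:
        for i, (rid, tok) in enumerate(self.running):
            if rid == req_id:
                self.in_flight -= tok
                del self.running[i]
                break
        self._schedule()  # freed budget may admit queued requests

    def _schedule(self) -> None:
        while (self.waiting
               and self.in_flight + self.waiting[0][1] <= self.max_num_tokens):
            rid, tok = self.waiting.popleft()
            self.in_flight += tok
            self.running.append((rid, tok))
```

With a budget of 100 tokens, a second 60-token request waits until the first finishes instead of failing.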
-
I followed the tutorial provided [here](https://github.com/triton-inference-server/fastertransformer_backend/blob/22dba92dc1cbd367d119520013ec365b313a63ba/docs/gptj_guide.md). I am able to run GPTJ-B …
-
General catch-all ticket for installation issues in this early stage of development.
-
I am trying to deploy a custom model on tritonserver (23.08) with the onnxruntime_backend (onnxruntime version 1.15.1), but while doing so, we are facing this issue:
```
onnx runtime error 6: …
```
-
### Describe the question.
I want to crop each frame of a video. I currently have a pipeline that takes JPEG images of each frame and crops them, and I am trying to convert it to take the whole video…
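Independent of the video reader used, the crop itself is the same slice applied to every frame. A minimal pure-Python sketch with frames as nested lists (no DALI or Triton API implied; just the per-frame logic being converted):

```python
def crop_frame(frame, top, left, height, width):
    """Crop one frame (a list of pixel rows) to the given region."""
    return [row[left:left + width] for row in frame[top:top + height]]

def crop_video(frames, top, left, height, width):
    """Apply the same crop to every frame, mirroring the per-image pipeline."""
    return [crop_frame(f, top, left, height, width) for f in frames]
```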
-
**Description**
We use tritonserver with python backend to deploy a customized stable-diffusion model running on pytorch.
We have discovered that our tritonserver has a non-determinism bug:
1. …
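When chasing this kind of non-determinism, a common first step is pinning every RNG seed and re-running. A minimal sketch of the pattern using Python's `random` module (in a PyTorch backend the analogous calls are `torch.manual_seed` and `torch.cuda.manual_seed_all`):

```python
import random

def generate(seed: int, n: int = 5):
    """Produce n pseudo-random values from an explicitly seeded,
    isolated RNG, so repeated runs are reproducible."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# With the seed pinned, two runs produce identical outputs;
# if outputs still diverge, the non-determinism lies elsewhere.
assert generate(42) == generate(42)
```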