-
**Description**
I implemented multi-instance inference across 4 A100 GPUs by following [this](https://triton-inference-server.github.io/pytriton/latest/binding_models/#multi-instance-model-inferenc…
-
When I launch the multi-GPU Triton server with
`python scripts/launch_triton_server.py --world_size 4 --model_repo /path/to/model/repo`
I get a port-in-use error:
21 09:27:15.346696872 166 chttp2_s…
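A port-in-use error usually means another process (often a previous server instance that did not shut down cleanly) is still bound to one of Triton's ports. A minimal Python sketch to check which ports are already taken, assuming Triton's default ports (8000 HTTP, 8001 gRPC, 8002 metrics; adjust for your deployment):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

# Triton's default ports: 8000 (HTTP), 8001 (gRPC), 8002 (metrics)
for p in (8000, 8001, 8002):
    print(p, "in use" if port_in_use(p) else "free")
```

If a port shows up as in use, killing the stale process (or passing different ports to the launch script) typically clears the error.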
-
```
root@ttogpu:~# kubectl describe pod triton-inference-server-5b6c7f889c-f54c6
Name: triton-inference-server-5b6c7f889c-f54c6
Namespace: default
Priority: 0
Service …
-
# Summary of your issue
I want to convert an OpenCvSharp Mat object to a byte[] that preserves its size. To clarify: if I have an image of width and height 640, I want to receive a by…
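OpenCvSharp is a C# binding, but the size-preserving idea can be sketched in Python with NumPy as an analogy (not the OpenCvSharp API itself): the raw pixel buffer of a 640×640 3-channel image is exactly width × height × channels bytes, and it round-trips back to the same shape.

```python
import numpy as np

# Stand-in for a 640x640 3-channel (BGR) image, as cv2.imread would return
img = np.zeros((640, 640, 3), dtype=np.uint8)

# Raw bytes preserve the full pixel buffer: width * height * channels
raw = img.tobytes()

# Round-trip back to an image of the same shape
restored = np.frombuffer(raw, dtype=np.uint8).reshape(640, 640, 3)
```

In OpenCvSharp the equivalent would be reading the Mat's data buffer rather than re-encoding it (e.g. to PNG/JPEG), since encoding changes the byte count.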
-
**Description**
I am trying to deploy the GLIP transformer model using the Python backend with a custom conda environment in Triton on GPU. My inference time is as expected, but the output computatio…
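For reference, the usual way to point the Python backend at a custom conda environment is the `EXECUTION_ENV_PATH` parameter in the model's `config.pbtxt`; the archive name `glip_env.tar.gz` below is a hypothetical example (such archives are typically produced with `conda-pack`):

```
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/glip_env.tar.gz"}
}
```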
-
Hi team, QQ: does `lightseq` support the following:
- Convert HuggingFace BERT/RoBERTa models to `int8` precision directly
- If yes, can the converted model be exported to ONNX format directly?
- …
-
Models stored on the ClearML servers (created by `Task.init(..., output_uri=True)`) run perfectly, while models stored on Azure Blob Storage produce different problems in different…
-
I used a fine-tuned Llama 2 model and built it with AWQ int4, tp_size=4, max_input_length=8000, and max_output_length=8000 with TensorRT-LLM.
The model runs perfectly under tensorrt-llm.
When I use Trito…
-
## Running SAM in the Modelzoo Universe
We have started some efforts on integrating SAM with the bioengine / imjoy / bioimageio-colab.
I want to summarize here the overall goals, the current …
-
Hello everyone,
I encountered an error message (as shown below) while trying to run the Mamba model (code below).
Experimental environment:
CUDA 11.8 + PyTorch 2.0.0 + Triton 2.2.0
What should…