-
Hi team, QQ: does `lightseq` support the following:
- Convert HuggingFace BERT/RoBERTa models to `int8` precision directly
- If yes, can the converted model be exported to ONNX format directly?
- …
-
I'm trying to run inference with a Mistral 7B model on Triton; however, I run into issues when I try to launch the server from my image. I suspect it's an issue with some MPI and Triton shared libr…
-
**Description**
I am deploying a YOLOv8 model for object detection using Triton with the ONNX backend on Kubernetes. I have experienced significant CPU throttling in the sidecar container ("queue-proxy")…
-
When I launch a multi-GPU Triton server with
`python scripts/launch_triton_server.py --world_size 4 --model_repo /path/to/model/repo`
I get a "port in use" error:
21 09:27:15.346696872 166 chttp2_s…
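Since the chttp2 message comes from gRPC's transport, the conflicting port is likely the gRPC one. A quick, hypothetical way to check whether Triton's default ports are already bound before relaunching is a small socket probe (8000/8001/8002 are Triton's default HTTP/gRPC/metrics ports; adjust if you override them):

```python
# Hedged sketch: probe Triton's default ports to see which are already bound.
import socket

def port_in_use(port, host="0.0.0.0"):
    """Return True if something is already listening on the given port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR avoids false positives from sockets lingering in TIME_WAIT.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return False
        except OSError:
            return True

for port in (8000, 8001, 8002):  # default HTTP, gRPC, metrics ports
    print(port, "in use" if port_in_use(port) else "free")
```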
-
### Your current environment
```text
podman --version
podman version 5.2.3
uname -a
Linux noelo-work 6.10.12-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 30 21:38:25 UTC 2024 x86_64 GNU/L…
```
-
Triton provides an extension to the standard gRPC inference API for streaming (`inference.GRPCInferenceService/ModelStreamInfer`); this extension is required to use the vLLM backend with Triton.
However …
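For reference, a minimal sketch of driving that streaming endpoint from the Python gRPC client is below. The model name and the `text_input`/`text_output`/`stream` tensor names are assumptions based on the vLLM backend's sample model and may differ in your repository:

```python
# Hedged sketch: use the Python gRPC client's streaming API, which is carried
# over ModelStreamInfer. Tensor and model names are assumptions.
import queue
import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def on_response(result, error):
    # Each streamed response (or error) is delivered here as it arrives.
    responses.put(error if error is not None else result)

client = grpcclient.InferenceServerClient(url="localhost:8001")

text = grpcclient.InferInput("text_input", [1], "BYTES")
text.set_data_from_numpy(np.array(["What is Triton?"], dtype=object))
stream_flag = grpcclient.InferInput("stream", [1], "BOOL")
stream_flag.set_data_from_numpy(np.array([True]))

client.start_stream(callback=on_response)  # opens the ModelStreamInfer stream
client.async_stream_infer(
    model_name="vllm_model",
    inputs=[text, stream_flag],
    outputs=[grpcclient.InferRequestedOutput("text_output")],
)
client.stop_stream()  # waits for pending responses, then closes the stream

while not responses.empty():
    item = responses.get()
    if isinstance(item, Exception):
        raise item
    print(item.as_numpy("text_output"))
```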
-
### System Info
- Ubuntu 20.04
- NVIDIA A100
### Who can help?
@kaiyux
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported …
-
**Description**
There is abnormal system memory usage when GPU metrics are enabled.
With GPU metrics enabled:
command: `tritonserver --model-repository=/models`
**after a long time waiting**
![185854](…
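To quantify the growth over time, one possible approach is to sample the server's resident memory alongside the metrics endpoint. This is a rough sketch only: the metrics port 8002 is Triton's default, and the psutil-based PID lookup and `nv_gpu` metric prefix are assumptions:

```python
# Hedged sketch: periodically record tritonserver's RSS while GPU metrics are
# enabled, to confirm whether memory keeps growing over a long run.
import time
import psutil
import requests

def find_tritonserver():
    for proc in psutil.process_iter(["name"]):
        if proc.info["name"] == "tritonserver":
            return proc
    raise RuntimeError("tritonserver process not found")

server = find_tritonserver()
for _ in range(60):  # sample once a minute for an hour
    rss_mb = server.memory_info().rss / 1e6
    metrics = requests.get("http://localhost:8002/metrics", timeout=5).text
    gpu_lines = [l for l in metrics.splitlines() if l.startswith("nv_gpu")]
    print(f"rss={rss_mb:.0f} MB, gpu metric lines={len(gpu_lines)}")
    time.sleep(60)
```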
-
I used a fine-tuned llama2 model and built it with tensorrt-llm using awq-int4, tp_size=4, max_input_length=8000, max_output_length=8000.
The model runs perfectly under tensorrt-llm.
When I use Trito…
-
**Description**
I deployed Triton Inference Server on Kubernetes (GKE). To balance the load, I created a Load Balancer Service. As a client, I'm using the Python HTTP client. I was expecting all the …