-
Here's the overall architecture of Triton:
![image](https://user-images.githubusercontent.com/166481/82379259-74854500-99db-11ea-9928-99370fb74d34.png)
In scope:
- Triton server
- Client SDKs …
-
Hi there! I'm serving TensorRT-LLM models from Python and I'm wondering what the recommended approach is for serving multiple models at once. I've tried / considered:
- `GenerationS…
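A minimal client-side sketch of one of the approaches under consideration: a single tritonserver hosting several TensorRT-LLM models in one model repository, addressed by name over gRPC. The model names are hypothetical, and the tensor names (`text_input`, `max_tokens`, `text_output`) follow the tensorrtllm_backend ensemble convention; all of these must match each model's config.pbtxt.
```python
# Sketch, not a confirmed recipe: two TensorRT-LLM models ("model_a" and
# "model_b" are hypothetical names) deployed side by side in one model
# repository and queried by model name from one gRPC client.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

def generate(model_name: str, prompt: str, max_tokens: int = 64) -> np.ndarray:
    # Tensor names and shapes depend on your config.pbtxt; verify before use.
    text = grpcclient.InferInput("text_input", [1], "BYTES")
    text.set_data_from_numpy(np.array([prompt.encode()], dtype=np.object_))
    tokens = grpcclient.InferInput("max_tokens", [1], "INT32")
    tokens.set_data_from_numpy(np.array([max_tokens], dtype=np.int32))
    result = client.infer(model_name=model_name, inputs=[text, tokens])
    return result.as_numpy("text_output")

print(generate("model_a", "Hello"))
print(generate("model_b", "Hello"))
```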
-
Thank you very much for the incredible project!
First of all, it would be very helpful if you added documentation on how to manage GPU memory while using Triton.
I was doing several tests but …
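For what it's worth until such documentation exists: tritonserver exposes flags that bound its own memory pools. A minimal sketch (the byte values are illustrative, not recommendations, and these pools only cover Triton's own allocations; backend allocations such as TensorRT workspaces are configured per backend):
```sh
# Cap the pinned host memory pool at 256 MiB and the CUDA memory pool on
# GPU 0 at 256 MiB; the CUDA pool flag takes the format <gpu-id>:<bytes>.
tritonserver --model-repository=/models \
    --pinned-memory-pool-byte-size=268435456 \
    --cuda-memory-pool-byte-size=0:268435456
```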
-
Neither find_package() nor FetchContent works out of the box for a standalone C++ CMake app.
### find_package
Compile tritonclient manually and set CMAKE_PREFIX_PATH to the install folder. Altern…
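A sketch of the find_package route described above, assuming the client libraries were built and installed from the triton-inference-server/client repository. The package and target names below are assumptions; check the lib/cmake directory of your install for the exact names it exports.
```cmake
cmake_minimum_required(VERSION 3.17)
project(triton-client-app LANGUAGES CXX)

# Configure with the manual install prefix, e.g.:
#   cmake -DCMAKE_PREFIX_PATH=/opt/tritonclient ..
# TritonClient and the grpcclient target are assumed names; verify them
# against the CMake config files shipped in your install.
find_package(TritonClient REQUIRED)

add_executable(app main.cc)
target_link_libraries(app PRIVATE TritonClient::grpcclient)
```
The only essential point is that CMAKE_PREFIX_PATH contains the manual install prefix, so find_package can locate the config files installed there.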
-
The IngressRoutes (https://github.com/triton-inference-server/server/blob/main/deploy/k8s-onprem/templates/ingressroute.yaml) are deployed as a load balancer to spread requests across all Triton pods. H…
-
**Description**
When I use two clients to send `/v2/repository/models/MODEL/load` requests to the same server at the same time, the model is loaded twice.
**Triton Information**
What version of Tr…
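A minimal reproduction sketch for this report, assuming a server started with `--model-control-mode=explicit` on localhost:8000 and a model named `MODEL` (the placeholder from the issue) in the repository; `load_model` issues the same `/v2/repository/models/MODEL/load` POST described above.
```python
# Two concurrent load requests against the same server; with the race
# described in the issue, the model ends up loaded twice.
import threading
import tritonclient.http as httpclient

def load() -> None:
    # One client per thread; each call POSTs /v2/repository/models/MODEL/load.
    client = httpclient.InferenceServerClient(url="localhost:8000")
    client.load_model("MODEL")

threads = [threading.Thread(target=load) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```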
-
I would like to use techniques such as the multi-instance support provided by the tensorrt-llm backend. In the documentation, I can see that multiple models are served using modes like Leader mode and …
-
I'm using the nvcr.io/nvidia/tritonserver:23.10-py3 container for my inferencing, using the C++ GRPC API. There are several models in the container: a YOLOv8-like architecture in TensorRT plus a few TorchScript model…
-
Currently, I am trying to implement a custom k2 tritonserver backend, but I get this compilation error:
```
In file included from /usr/local/cuda/include/builtin_types.h:59,
                 from /…
-
I'm trying to run inference with the Mistral 7B model on Triton, but I am running into issues when I try to launch the server from my image. I suspect it's an issue with some MPI and Triton shared libr…