-
When using perf_analyzer to analyze a Python decoupled model such as [triton-decoupled](https://github.com/Jackiexiao/triton-decoupled-cache) with the command below:
```
perf_analyzer -i grpc --streami…
```
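For context, decoupled models can only be exercised over the streaming gRPC interface, which is what `-i grpc --streaming` selects in perf_analyzer. The sketch below shows the equivalent interaction with `tritonclient`; the model name `decoupled_cache` and the tensor name `INPUT0` are assumptions (the original command is truncated), so substitute your own.

```python
# Minimal sketch: streaming gRPC inference against a decoupled model.
# Model and tensor names are hypothetical placeholders.
import queue
import numpy as np
import tritonclient.grpc as grpcclient

results = queue.Queue()

def callback(result, error):
    # A decoupled model may send zero, one, or many responses per request.
    results.put(error if error else result)

client = grpcclient.InferenceServerClient("localhost:8001")
client.start_stream(callback=callback)

text = np.array([b"hello"], dtype=np.object_)
inp = grpcclient.InferInput("INPUT0", text.shape, "BYTES")
inp.set_data_from_numpy(text)
client.async_stream_infer(model_name="decoupled_cache", inputs=[inp])

client.stop_stream()  # blocks until pending responses are flushed
client.close()

while not results.empty():
    print(results.get())
```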
-
Hi everyone,
I'm running Triton Server with the vLLM backend and want to use dynamic batching, but I encountered an error. It seems to have something to do with my input.
Inference with curl:
```
curl -X POST loca…
```
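Since the command above is cut off, here is a hedged Python sketch of the general shape of a generate request to the Triton vLLM backend; the model name `vllm_model`, the port, and the sampling parameters are assumptions, not taken from the original report.

```python
# Sketch of a generate request to a Triton vLLM model (names assumed).
import json
import requests

payload = {
    "text_input": "What is dynamic batching?",  # input name used by the vLLM backend
    "parameters": {"stream": False, "temperature": 0.7, "max_tokens": 64},
}
resp = requests.post(
    "http://localhost:8000/v2/models/vllm_model/generate",  # model name is an assumption
    data=json.dumps(payload),
)
resp.raise_for_status()
print(resp.json()["text_output"])
```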
-
**Is your feature request related to a problem? Please describe.**
When writing the `model.py` file for a Python backend model, it is very difficult to correctly use `triton_python_backend_utils` (ak…
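For reference, a minimal `model.py` exercising the commonly used parts of `triton_python_backend_utils` might look like the sketch below; the tensor names `INPUT0`/`OUTPUT0` are placeholders and must match your `config.pbtxt`.

```python
# Minimal Python backend model.py sketch; tensor names are placeholders.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args is a dict of strings, e.g. args["model_config"] is a JSON string.
        pass

    def execute(self, requests):
        responses = []
        for request in requests:
            in0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            result = in0.as_numpy() * 2  # placeholder computation
            out0 = pb_utils.Tensor("OUTPUT0", result.astype(np.float32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out0]))
        return responses

    def finalize(self):
        pass
```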
-
**Is your feature request related to a problem? Please describe.**
I aim to deploy my ASR model on a server that will receive audio packet bytes with each request. The server will then transcribe the…
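A common way to ship raw audio packet bytes to such a server is a BYTES input tensor. The following is a sketch using `tritonclient` over HTTP; the model name `asr_model` and the tensor names `AUDIO_BYTES`/`TRANSCRIPT` are hypothetical.

```python
# Sketch: sending raw audio bytes to a hypothetical ASR model over HTTP.
import numpy as np
import tritonclient.http as httpclient

with open("sample.wav", "rb") as f:
    audio_bytes = f.read()

client = httpclient.InferenceServerClient("localhost:8000")
data = np.array([audio_bytes], dtype=np.object_)
inp = httpclient.InferInput("AUDIO_BYTES", data.shape, "BYTES")  # name assumed
inp.set_data_from_numpy(data)
result = client.infer(model_name="asr_model", inputs=[inp])      # name assumed
print(result.as_numpy("TRANSCRIPT"))                             # output name assumed
```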
-
Hello, I wanted to ask whether it is possible to create in-place operations. I have a pretty big DALI pipeline (in terms of image size) and I have to preprocess data, but each operation creates a copy…
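For context, DALI operators are functional: each `fn.*` stage in a pipeline produces a new output buffer rather than modifying its input, which is where the copies come from. A minimal sketch of such a chained pipeline (file paths and sizes are placeholders):

```python
# Sketch: a chained DALI preprocessing pipeline; every fn.* stage
# writes to a new buffer (no in-place variants). Paths/sizes are placeholders.
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types


@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def preprocess():
    encoded, labels = fn.readers.file(file_root="/data/images")
    images = fn.decoders.image(encoded, device="mixed")           # new buffer
    images = fn.resize(images, resize_x=4096, resize_y=4096)      # new buffer
    images = fn.crop_mirror_normalize(images, dtype=types.FLOAT)  # new buffer
    return images, labels


pipe = preprocess()
pipe.build()
images, labels = pipe.run()
```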
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', bu…
```
-
I just followed the steps of commands shown in the README:
1. apt install git-lfs
   git lfs install --skip-repo
   git clone https://github.com/NVIDIA-AI-IOT/deepstream_parallel_inference_app.git
2. apt-get…
-
[RFD27/Container Monitor](https://github.com/joyent/rfd/blob/master/rfd/0027/README.md) integration requires two things:
1. TLS certs based on a user's SSH key
2. Discovery of RFD27 endpoints
### Auth…
-
Can I use LightSeq to speed up a fairseq Transformer decoder model?
I have already exported the Transformer decoder language model trained with fairseq, and now I want to speed up the model with LightSeq …
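If the export produced a LightSeq-compatible protobuf, inference usually goes through `lightseq.inference`, following the pattern from the LightSeq examples. The file name and token IDs below are placeholders, and for a decoder-only language model the wrapper class may differ from the encoder-decoder `Transformer` shown here.

```python
# Sketch: running an exported Transformer model with LightSeq.
# File name and token IDs are placeholders.
import lightseq.inference as lsi

model = lsi.Transformer("lightseq_transformer.pb", 8)  # max batch size 8
tokens = [[4, 17, 23, 2]]  # placeholder token IDs
output = model.infer(tokens)
print(output)
```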
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
GPU name: NVIDIA GeForce RTX 4090
GPU mem size: 24Gb…