-
## Bug Description
I'm trying to serve a Torch-TensorRT optimized model with the NVIDIA Triton Inference Server, based on the provided tutorial:
https://pytorch.org/TensorRT/tutorials/serving_torch_tensorrt_with_t…
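For reference, the export step from that tutorial roughly follows this pattern (a minimal sketch, assuming the TorchScript frontend of Torch-TensorRT; the model, input shape, and repository path are illustrative, not taken from my actual setup):
```python
import torch
import torch_tensorrt
import torchvision.models as models

# Compile an eager-mode model with Torch-TensorRT (TorchScript frontend,
# so the result can be serialized with torch.jit.save).
model = models.resnet50(weights=None).eval().cuda()
trt_model = torch_tensorrt.compile(
    model,
    ir="ts",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.float32, torch.float16},
)

# Triton's PyTorch backend expects <repo>/<model_name>/<version>/model.pt.
torch.jit.save(trt_model, "model_repository/resnet50/1/model.pt")
```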
-
## User story
As a customer,
I want to launch an app implementing Triton Inference Server,
In order to deploy my models in production with optimisation and high availability.
## Acceptance …
-
The idea here is to use the Triton Inference Server to perform inference via MIGraphX.
The first issue to tackle is to enable it without the official Docker image, using a ROCm-based one instead.
The next would be…
-
We have an encoder-based model that is currently deployed in production in FP16 mode, and we want to reduce the latency further.
Does Triton support FP8? In the datatypes documentation here: …
-
### Describe the feature request
PyTorch / HF (previously branded as BetterTransformer) now have some support for NJT representation:
- https://github.com/onnx/onnx/issues/6525
This allows one to have e…
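For context, NJT here is PyTorch's nested (jagged) tensor layout, which packs variable-length sequences without padding; a minimal illustration, assuming a recent PyTorch build with jagged-layout nested tensors (not tied to any Triton API):
```python
import torch

# Two sequences of different lengths, packed without padding.
a = torch.randn(3, 8)
b = torch.randn(5, 8)

njt = torch.nested.nested_tensor([a, b], layout=torch.jagged)
print(njt.is_nested)  # True
print(njt.size(0))    # 2 -- batch dim; the sequence dim stays ragged
```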
-
### System Info
When using Qwen2, running inference on the engine through the run.py script produces normal output. However, when using Triton for inference, some characters appear garbled, and the out…
-
**Description**
The Triton Inference Server is deployed on a CPU-only device.
There are about 32 models (onnxruntime).
The Triton Inference Server goes down during long load testing. It stops …
-
**Description**
If I load 2 models, a transformer and an inference model, GPU memory usage is about 3Gi.
```
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
2207044 coreai 0 C…
-
When I used model-analyzer, I got "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte".
I have the same problem with the latest tag, 24.05-py3-sdk.
Why do I …
-
**Description**
I've loaded a model via the `v2/repository/models/simple/load` endpoint.
But when querying the `v2/repository/index` endpoint, I get `[]` as the response.
**Triton Information**
What ver…
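For what it's worth, the sequence described above can be reproduced with plain HTTP calls; a minimal sketch, assuming Triton's HTTP endpoint on the default port 8000, explicit model control mode, and a model named `simple` in the repository:
```python
import requests

base = "http://localhost:8000"

# Ask Triton to load the model (explicit model control mode).
r = requests.post(f"{base}/v2/repository/models/simple/load")
r.raise_for_status()

# Query the repository index; each entry should report name/version/state.
r = requests.post(f"{base}/v2/repository/index")
print(r.json())  # expected: a non-empty list, not []
```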