-
**Is your feature request related to a problem? Please describe.**
The goal of this feature is to simplify Feast integration for model serving platforms. Feast feature servers have custom HTTP/gRPC i…
-
I would like to use this as a Python backend within `triton-inference-server` in order to bring my production parameters into better alignment with training / validation.
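To make this concrete, here is a rough sketch of the kind of `model.py` I have in mind for the Python backend; the Feast repo path, feature refs, entity key, and tensor names below are made up for illustration:
```
# model.py -- sketch of a Triton Python backend wrapping a Feast feature store.
import numpy as np
from feast import FeatureStore
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Assumes the Feast repo is shipped alongside the model (hypothetical path).
        self.store = FeatureStore(repo_path="/models/feast_repo")
        self.features = [
            "driver_hourly_stats:conv_rate",   # hypothetical feature refs
            "driver_hourly_stats:acc_rate",
        ]

    def execute(self, requests):
        responses = []
        for request in requests:
            # Hypothetical entity-key input tensor named "driver_id".
            ids = pb_utils.get_input_tensor_by_name(request, "driver_id").as_numpy()
            rows = [{"driver_id": int(i)} for i in ids.flatten()]
            feats = self.store.get_online_features(
                features=self.features, entity_rows=rows
            ).to_dict()
            # Stack the returned feature columns into one (batch, n_features) tensor.
            out = np.array(
                [feats[f.split(":")[-1]] for f in self.features], dtype=np.float32
            ).T
            responses.append(
                pb_utils.InferenceResponse(
                    output_tensors=[pb_utils.Tensor("features", out)]
                )
            )
        return responses
```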
Are there plans…
-
whisper.cpp ships with a [server](https://github.com/ggerganov/whisper.cpp/tree/master/examples/server). Isn't using that faster than loading the model again for each request?
Doing this should be …
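For reference, posting audio to the bundled server from Python could look roughly like this; the `/inference` endpoint and form fields follow the server example's README, so adjust them if your build differs:
```
import requests

# Send one request to an already-running whisper.cpp server instead of
# reloading the model for every transcription.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8080/inference",
        files={"file": f},
        data={"temperature": "0.0", "response_format": "json"},
    )
resp.raise_for_status()
print(resp.json())
```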
-
### Problem Statement
I can see why Gab was confused here: #1329
@vansangpfiev
Can we use the groupings @dan-homebrew / I originally suggested? 🙏
Current
- Not super accurate bc `chat` shou…
-
### System Info
When using Qwen2, running inference with the engine through the run.py script produces normal output. However, when using Triton for inference, some characters appear garbled, and the out…
-
Hi,
I noticed there is no Slack, Discord, or IRC channel for TensorRT - a channel could offload some future tickets by letting people discuss things there - so I created one.
I hope it's ok to advertise …
-
Timeouts on the inference service should not result in a 503. The REST suppressed logger reported this.
Interestingly, the timeout is 10s; isn't the default 30s?
```
"error.stack_trace": "o…
-
**Description**
We have an ensemble of 2 models chained together (description of models below).
Calling only the "preprocessing" model yields a max throughput of 21500 QPS @ 6 CPU cores usage
Cal…
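For context, the comparison boils down to calling the "preprocessing" model alone versus calling the ensemble that chains both models. A rough client sketch of the two calls (tensor names, shapes, dtypes, and the ensemble's model name are placeholders):
```
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input tensor; the real name/shape/dtype come from the model config.
payload = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("INPUT__0", list(payload.shape), "FP32")
inp.set_data_from_numpy(payload)

# Preprocessing model alone (~21500 QPS @ 6 CPU cores in our test).
client.infer("preprocessing", inputs=[inp])

# Full ensemble: preprocessing chained into the second model.
client.infer("ensemble", inputs=[inp])
```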
-
**Description**
The Triton Inference Server is deployed on a CPU-only device.
There are about 32 models (onnxruntime).
The Triton Inference Server goes down during long load testing. It stops …
-
### System Info
- Ubuntu 20.04
- NVIDIA A100
### Who can help?
@kaiyux
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supported …