-
Running inference with /TensorRT-LLM/examples/run.py works fine:
mpirun -n 4 -allow-run-as-root python3 /load/trt_llm/TensorRT-LLM/examples/run.py \
--input_text "hello,who are you?" \
…
-
[ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
```
>>> generator.adapt(language, evolutions=[simple])
Trac…
```
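For context, this call comes from the ragas 0.1.x test-set generation workflow; here is a minimal sketch of the surrounding setup, assuming that API (the generator construction and the language value are illustrative, not taken from the truncated traceback):

```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple

# Hypothetical setup wrapping OpenAI models for generation and critique.
generator = TestsetGenerator.with_openai()

# Translate the evolution prompts into the target language, then persist them.
language = "hindi"  # illustrative target language
generator.adapt(language, evolutions=[simple])
generator.save(evolutions=[simple])
```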
-
Currently I'm using an LLM to generate streaming responses, and I found that Triton only supports streaming output through the gRPC protocol. [https://docs.nvidia.com/deeplearning/triton-inference-server/…
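For reference, a minimal sketch of consuming a streamed response over gRPC with `tritonclient.grpc`; the server URL, the model name `ensemble`, and the input tensor name `text_input` are assumptions following common tensorrtllm_backend setups, not details from the issue:

```python
import queue
from functools import partial

import numpy as np
import tritonclient.grpc as grpcclient

def on_response(results, result, error):
    # Invoked once per streamed response (or error) from the server.
    results.put(error if error is not None else result)

results = queue.Queue()
client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=partial(on_response, results))  # open the bidirectional stream

text = np.array([["hello, who are you?"]], dtype=object)
inp = grpcclient.InferInput("text_input", list(text.shape), "BYTES")
inp.set_data_from_numpy(text)

# Partial results arrive via the callback as the model generates tokens.
client.async_stream_infer(model_name="ensemble", inputs=[inp])

first_chunk = results.get()  # blocks until the first streamed chunk lands
client.stop_stream()
```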
-
### Your current environment
I have a server with only one NVLink connection, so I need to use pipeline parallelism and tensor parallelism within a single node to improve its performance. I would lik…
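For what it's worth, a sketch of combining the two on one node, assuming a vLLM version whose offline `LLM` entry point accepts `pipeline_parallel_size` alongside `tensor_parallel_size`; the model name and the 2x2 split are illustrative:

```python
from vllm import LLM, SamplingParams

# Hypothetical 4-GPU node: 2-way tensor parallel x 2-way pipeline parallel.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # illustrative model
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
)

outputs = llm.generate(
    ["Hello, who are you?"],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)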
-
### Feature request type
sample request
### Is your feature request related to a problem? Please describe
In the documentation there is always a reference to `Mkldnn` usage but, apparently, the…
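The truncated request doesn't identify the project, but as a hedged guess at what `Mkldnn` usage typically looks like, here is a sketch assuming PyTorch's oneDNN (formerly MKL-DNN) CPU backend:

```python
import torch

# True if this PyTorch build ships the oneDNN (MKL-DNN) backend.
print(torch.backends.mkldnn.is_available())

x = torch.randn(1, 3, 224, 224)
y = x.to_mkldnn()   # reorder into the MKL-DNN blocked layout
z = y.to_dense()    # convert back to a regular strided tensor
```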
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 32 GB
- GPU name: L4 (GCP g2-standard-8)
- GPU memory size: 24 GB
- TensorRT-LLM branch or tag (e.g., main, v0.10.0)
- Nvi…
-
I would like to use features such as the Multi-instance Support provided by the tensorrt-llm backend. In the documentation, I can see that multiple models are served using modes such as Leader mode and …
-
**Is your feature request related to a problem? Please describe.**
1. We would like to try parallel model execution on iGPU+DLA devices. Is it possible to run triton-inference-server on a V3NP or Ori…
-
**What would you like to be added**:
ollama provides an [SDK](https://github.com/ollama/ollama-python) for integrations, so we can integrate with it easily. One of the benefits I can think of is olla…
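As a sketch of what such an integration could look like through the `ollama` package; the model tag and prompt are illustrative:

```python
# pip install ollama
import ollama

# One-shot chat call against a locally running ollama server.
response = ollama.chat(
    model="llama3",  # illustrative model tag
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```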
-
The current `Cluster` deployment only allows inference servers to be deployed on GPUs [see here](https://github.com/fmperf-project/fmperf/blob/b7ae68125724d3c63563fd84eebba7eee347e27f/fmperf/Cluster.py#L13…