-
/kind feature
**Describe the solution you'd like**
There are a few different directions this could take:
- extend existing API for referencing multiple …
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### Where…
-
### Your current environment
The startup command is as follows: it launches both a standard 7B model and an n-gram speculative model. Speed tests show that the speculative model performs more slowl…
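Since the actual startup command is cut off above, here is a minimal illustrative sketch (not the reporter's command) of launching a 7B model with n-gram prompt-lookup speculation through vLLM's offline Python API. The model name is a placeholder, and the keyword arguments (`speculative_model="[ngram]"`, `num_speculative_tokens`, `ngram_prompt_lookup_max`, `use_v2_block_manager`) are assumed from older vLLM releases; newer releases group these under a `speculative_config` instead.
```python
# Minimal sketch, assuming an older vLLM release that exposes speculative
# decoding options directly on LLM(); newer releases use speculative_config.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # hypothetical 7B base model
    speculative_model="[ngram]",            # n-gram prompt-lookup speculation
    num_speculative_tokens=5,               # draft tokens proposed per step
    ngram_prompt_lookup_max=4,              # max n-gram length matched in the prompt
    use_v2_block_manager=True,              # required by some older releases
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```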
-
I have an ensemble model.
Model 1 outputs 66 cropped images; model 1 is Python. I manually resized/padded them into 3 batches with shapes
(30, 3, 48, 320), (30, 3, 48, 976), (6, 3, 48, 1280)
(I …
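For reference, a minimal sketch of the resize/pad step described above, assuming NumPy and zero-padding each (3, 48, w) crop on the right to one of the fixed widths; the helper name and padding scheme are illustrative, not taken from the reporter's pipeline.
```python
import numpy as np

def pad_and_batch(crops, target_w):
    # Pad each (3, 48, w) crop with zeros on the right to width target_w,
    # then stack into a single (N, 3, 48, target_w) batch.
    batch = []
    for crop in crops:                      # crop: (3, 48, w), w <= target_w
        pad = target_w - crop.shape[-1]
        batch.append(np.pad(crop, ((0, 0), (0, 0), (0, pad))))
    return np.stack(batch)

# Example: 30 crops of varying widths padded to 320, matching (30, 3, 48, 320).
crops = [np.zeros((3, 48, np.random.randint(100, 321)), dtype=np.float32)
         for _ in range(30)]
batch = pad_and_batch(crops, 320)
assert batch.shape == (30, 3, 48, 320)
```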
-
### Description
**Description:**
When submitting any message in the chat, the request never completes and you get the errors "**failed to pipe response**" and "**ECONNRESET**".
**Environment:**
…
-
I was recently deploying Hugging Face models on the Triton Inference Server, which helped me increase my GPU utilization and serve multiple models on a single GPU.
I was not able to find good r…
-
**Describe the bug**
**Environment**
- GPUStack version: v0.2.0
- OS: Ubuntu 22.04
- GPU: Nvidia P40, T4, H800 (all can reproduce this issue)
**Steps to reproduce**
1. Install GP…
-
I'm trying to use Triton to deploy baichuan2-13B inference under bf16 precision. The tritonserver starts successfully, but it crashes when processing client requests.
- Use TensorRT-LLM v0…
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### System Info
I am working on the benchmarking suite on the vLLM team, and am now trying to run TensorRT-LLM for comparison. I am relying on this GitHub repo (https://github.com/neuralmagic/tensorrt-demo)…