-
**Problem Description**
I have different Ollama endpoints and I would like to choose from them. Right now I can only configure one. I run smaller models locally and larger models on an inference server.…
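Until a multi-endpoint setting exists, a minimal workaround is to keep a small map of named endpoints and pick one per request. This is only a sketch against Ollama's `/api/generate` REST endpoint; the endpoint names, hostnames, and models below are placeholders, not part of the request above.

```python
import requests

# Hypothetical endpoint map: small models locally, large models on an inference server.
OLLAMA_ENDPOINTS = {
    "local": "http://localhost:11434",
    "server": "http://inference-box:11434",  # placeholder hostname
}

def generate(endpoint: str, model: str, prompt: str) -> str:
    """Send a non-streaming generate request to the chosen Ollama endpoint."""
    base_url = OLLAMA_ENDPOINTS[endpoint]
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Small model locally, large model on the remote server.
print(generate("local", "llama3.2:3b", "Say hello."))
print(generate("server", "llama3.1:70b", "Say hello."))
```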
-
### Motivation
Is there any endpoint within the API server where we are able to pull metrics like Running Requests, Waiting Requests, Swapped Requests, GPU Cache Usage, CPU Cache Usage, Latency…
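For reference, vLLM's OpenAI-compatible server already exposes Prometheus-format metrics at `/metrics`, including gauges for running/waiting/swapped requests and GPU/CPU KV-cache usage (exact metric names vary across versions). A rough scraping sketch, assuming the server listens on localhost:8000:

```python
import requests
from prometheus_client.parser import text_string_to_metric_families

# Assumed server address; adjust to your deployment.
METRICS_URL = "http://localhost:8000/metrics"

# Metric names as seen in recent vLLM releases; they may differ in yours.
WANTED = {
    "vllm:num_requests_running",
    "vllm:num_requests_waiting",
    "vllm:num_requests_swapped",
    "vllm:gpu_cache_usage_perc",
    "vllm:cpu_cache_usage_perc",
}

text = requests.get(METRICS_URL, timeout=10).text
for family in text_string_to_metric_families(text):
    if family.name in WANTED:
        for sample in family.samples:
            print(sample.name, sample.labels, sample.value)
```

Latency histograms (e.g. time to first token, end-to-end request latency) are exported from the same endpoint in recent versions.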
-
### Misc discussion on performance
Hi all, I'm having trouble maximizing the performance of batch inference for big models on vLLM 0.6.3
(Llama 3.1 70B, 405B, Mistral Large)
My command…
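Since the command above is cut off, the following is only a generic offline batch-inference sketch with vLLM's `LLM` API, showing the knobs that usually matter for throughput on large models (tensor parallelism, batched-token budget, memory utilization); the model name and sizes are placeholders, not the poster's actual setup.

```python
from vllm import LLM, SamplingParams

# Placeholder settings: set tensor_parallel_size to your GPU count and adjust
# max_model_len / max_num_batched_tokens to your workload.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    tensor_parallel_size=4,
    gpu_memory_utilization=0.90,
    max_model_len=8192,
    max_num_batched_tokens=8192,  # larger budgets generally raise batch throughput
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)

prompts = [f"Summarize document {i}." for i in range(1000)]
# vLLM schedules and batches these internally; no manual batching is needed.
outputs = llm.generate(prompts, sampling)
for out in outputs[:3]:
    print(out.outputs[0].text)
```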
-
## Willingness to contribute
- [ ] Yes. I can contribute this feature independently.
- [x] Yes. I would be willing to contribute this feature with guidance from the MLflow community.
- [ ] No. I ca…
-
**Describe the bug**
When we start `serve_reward_model.py` and run annotation, the server goes down during processing. It crashes on specific samples that have a long context.
[error.lo…
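Without knowing the server internals, one client-side workaround is to pre-truncate long samples before sending them for annotation so they stay under the reward model's context window. A generic sketch with a Hugging Face tokenizer; the model name and token limit are placeholders:

```python
from transformers import AutoTokenizer

# Placeholder model name and context limit; use the reward model's actual values.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
MAX_TOKENS = 4096

def truncate_sample(text: str) -> str:
    """Keep only the first MAX_TOKENS tokens of a sample."""
    ids = tokenizer(text, truncation=True, max_length=MAX_TOKENS)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)

samples = ["...very long annotation context...", "short sample"]
safe_samples = [truncate_sample(s) for s in samples]
```

This only works around the crash; the long-context samples themselves should still be investigated.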
-
The same video works fine on Windows, but errors out on Ubuntu:
ERROR:root:An error occurred: choose a window size 400 that is [2, 160] | 0/24 [00:00
-
I get the following error when starting the inference server:
TypeError: Invalid type for device_requests param: expected list but found
Any pointers would be appreciated.
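This message typically comes from the Docker SDK for Python: `device_requests` must be a list of `docker.types.DeviceRequest` objects, not a single object or a dict. A minimal sketch of the expected call shape (the image name is a placeholder):

```python
import docker
from docker.types import DeviceRequest

client = docker.from_env()

# device_requests must be a *list*; passing a bare DeviceRequest (or a dict)
# triggers the "expected list but found ..." TypeError.
container = client.containers.run(
    "my-inference-server:latest",  # placeholder image
    device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])],
    detach=True,
)
print(container.id)
```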
-
### What happened?
Within the worker, we map to the predibase base URL. It says https://api.app.predibase.com but it should be https://serving.app.predibase.com
Also, the model and usage are retur…
-
spec-infer works well for batch sizes (1, 2, 4, 8, 16), but when I change the batch size to 32 it fails with "stack smashing detected".
```
+ ncpus=16
+ ngpus=1
+ fsize=30000
+ zsize=60000
+ max_se…
-
I was trying to run the DLRMv2 benchmark of MLPerf Inference on an ARM server using the instructions [here](https://docs.mlcommons.org/inference/benchmarks/recommendation/dlrm-v2/#__tabbed_15_1).
…