-
https://github.com/PaddlePaddle/PaddleOCR/issues/7456
Please provide the following complete information so the problem can be located quickly:
- System Environment: Windows 11
- Version: P…
-
```
Server -> Receiving message of size: 24883378
Server -> 24883378 bytes read
Server -> Message parsed
Server -> Received inference request
Server -> Requesting inference on model: densepose
Server…
```
-
A few options to explore:
1. NVIDIA NeMo, TensorRT-LLM, Triton
- NeMo
Run [this Generative AI example](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/models/Gemma) to build LoRA wi…
-
### Motivation
I found that input token logprobs are supported by the Offline Inference Pipeline, as mentioned in the [doc](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#calculate-lo…
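
For context, a minimal sketch of requesting logprobs through lmdeploy's offline pipeline; the model name and logprobs count are placeholders, and whether input tokens are covered (as this issue asks) depends on the installed lmdeploy version:

```python
# Sketch only: lmdeploy offline pipeline with logprobs requested.
# Model name and logprobs count are placeholders, not from the issue.
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2-chat-7b')
gen_config = GenerationConfig(logprobs=10, max_new_tokens=32)
resp = pipe(['Describe the picture.'], gen_config=gen_config)
print(resp[0].logprobs)  # per-token log probabilities of the generated tokens
```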
-
[slack conversation](https://seldondev.slack.com/archives/C03DQFTFXMX/p1692295520100029)
What is the behavior of Seldon Core v2 in the following scenario?
- A single server with HPA based on 50%…
-
**Description**
r23.04
```
I0718 11:39:24.385839 1 server.cc:653]
| Model | Version | Status …
```
-
**LocalAI version:**
Using Docker image:
`localai/localai:latest-aio-gpu-hipblas`
**Environment, CPU architecture, OS, and Version:**
- Ubuntu 22.04
- Xeon X5570 [Specs](https://ark.intel.c…
-
**Description**
I used the latest image, version 24.06, because the corresponding latest version of TensorRT supports BF16. But when I deployed the model with the TensorRT backend, I used perf_analyzer to pressu…
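
For reference, a pressure test of this kind is typically driven with perf_analyzer; a sketch of such an invocation, where the model name, endpoint, and concurrency range are placeholders:

```
perf_analyzer -m my_trt_model -i grpc -u localhost:8001 --concurrency-range 1:8
```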
-
My server has 8 GPUs, and when running
```
python inference.py
```
it can load all the models, but when given an image and a question as input it raises:
`RuntimeError: Expected all tensors to b…`
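
This error typically means the model's weights and the input tensors sit on different devices. A minimal, generic PyTorch sketch of the usual fix (the linear layer stands in for the real model; all names are placeholders):

```python
# Sketch only: pin the model and its inputs to the same device.
import torch
import torch.nn as nn

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(16, 4).to(device)  # stand-in for the real multimodal model
x = torch.randn(1, 16)               # inputs are created on the CPU by default

# Moving every input to the model's device avoids
# "Expected all tensors to be on the same device".
x = x.to(device)

with torch.no_grad():
    y = model(x)
print(y.device)
```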
-
**Description**
I run the model on Triton Inference Server and also on ONNX Runtime (ORT) directly. Inference time on Triton Inference Server is 3 ms, but it is 1 ms on ORT. In addition, there isn't any communicati…
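
For comparison, a minimal sketch of how the bare-ORT side of such a measurement might be taken, assuming a CUDA build of onnxruntime; the model path and input shape are placeholders:

```python
# Sketch only: time a bare onnxruntime session to compare against Triton.
import time
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('model.onnx', providers=['CUDAExecutionProvider'])
name = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape

# Warm up, then measure steady-state latency.
for _ in range(10):
    sess.run(None, {name: x})
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {name: x})
print((time.perf_counter() - t0) / 100 * 1000, 'ms per inference')
```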