-
### Priority
Undecided
### OS type
Ubuntu
### Hardware type
Gaudi2
### Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
### Deploy method
…
-
## Motivation
WasmEdge is a lightweight inference runtime for AI and LLM applications. The [LlamaEdge project](https://github.com/LlamaEdge) has developed an [OpenAI-compatible API server](https://gi…
-
Hi, can someone please help me build and use the ONNX Runtime Server on Windows with gRPC and HTTP support?
I have made a C++ API which takes an image as input and uses an ONNX model for i…
-
Model launch command: python -m vllm.entrypoints.openai.api_server --served-model-name qwen2-7b-instruct --model /app/Qwen2-7B-Instruct --gpu-memory-utilization 0.9
Evaluation command: swift eval --eval_url http://127.0.0.1:8000/…
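Since the server above exposes an OpenAI-compatible API, a client can talk to it with a plain HTTP POST. A minimal sketch of the request body (the model name matches `--served-model-name` from the launch command; the prompt and sampling parameters are illustrative assumptions):

```python
import json

def build_chat_request(prompt: str, model: str = "qwen2-7b-instruct") -> str:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call.

    This only constructs the payload; POST it to
    http://127.0.0.1:8000/v1/chat/completions with any HTTP client.
    """
    payload = {
        "model": model,  # must match --served-model-name on the server
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,      # illustrative values
        "temperature": 0.0,
    }
    return json.dumps(payload)

body = build_chat_request("Hello!")
```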
-
### Feature request
I would like to be able to use Guidance or other libraries that support constrained output with HF endpoints.
Reference: [A guidance language for controlling large language m…
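The core idea behind such constrained-decoding libraries can be illustrated with a toy, self-contained sketch (the function and names below are hypothetical, not part of any library's API): at each step, only vocabulary tokens that keep the partial output on track toward an allowed completion are permitted, similar to a guidance-style `select` constraint.

```python
def allowed_next_tokens(prefix: str, vocab: list[str], choices: list[str]) -> list[str]:
    """Toy constrained decoding: return the vocab tokens that, appended to
    `prefix`, still leave it a prefix of at least one allowed completion."""
    ok = []
    for tok in vocab:
        candidate = prefix + tok
        if any(choice.startswith(candidate) for choice in choices):
            ok.append(tok)
    return ok

# With choices restricted to "yes"/"no", tokens leading elsewhere are masked out.
vocab = ["yes", "no", "maybe", "y", "n"]
result = allowed_next_tokens("", vocab, choices=["yes", "no"])
```

A real implementation applies this mask to the model's logits before sampling; this sketch only shows the filtering logic.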
-
## Is your feature request related to a problem? Please describe.
NVIDIA's Triton inference server provides a feature that lets the user load models on multiple GPUs for inference (NVIDIA…
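For reference, the multi-GPU placement mentioned here is configured in Triton through the `instance_group` field of a model's `config.pbtxt`. A minimal sketch (instance count and GPU indices are illustrative):

```
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```

This asks Triton to run two instances of the model, placed on GPUs 0 and 1.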
-
Hello! First of all, great job with this inference engine! Thanks a lot for your work!
Here's my issue: I have run vLLM with both a Mistral instruct model and its AWQ-quantized version. I've quant…
-
Using FlashInfer to start vLLM reported an error when enabling --quantization gptq --kv-cache-dtype fp8_e5m2.
Start command:
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 78…
-
I have followed the instructions at https://github.com/ELS-RD/transformer-deploy/#feature-extraction--dense-embeddings to convert a sentence-transformers model (https://huggingface.co/sentence-transfo…
-
Hey all, I have a quick question, is onnxruntime-genai ([https://onnxruntime.ai/docs/genai/api/python.html](https://onnxruntime.ai/docs/genai/api/python.html)) supported in Triton Inference Server's O…