-
Hi, your DistServe paper is really good and insightful; thanks a lot for the ideas and implementations! Recently I got an idea for exploring further into the domain of prefill-decode disaggregation, b…
-
### Motivation
This library, https://github.com/mit-han-lab/qserve, introduces the W4A8KV4 quantization method, called QoQ in the paper (https://arxiv.org/abs/2405.04532), which **delivers performance g…
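For intuition, here is a toy sketch of the W4 part of a W4A8KV4 scheme (per-group symmetric 4-bit weight quantization). This is only my own illustration, not QServe's kernel: the actual QoQ additionally uses progressive group quantization with int8 intermediates and SmoothAttention, which this omits.

```python
import torch

def quantize_w4_per_group(weight: torch.Tensor, group_size: int = 128):
    """Toy per-group symmetric 4-bit weight quantization (the W4 in W4A8KV4).

    Illustration only: QServe's QoQ additionally uses progressive group
    quantization (int8 intermediates) and SmoothAttention, omitted here.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # One scale per group so values map onto the signed 4-bit range [-8, 7].
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q.to(torch.int8), scale  # int4 values stored in int8 containers

# Dequantize with q.float() * scale to recover an approximation of `weight`.
```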
-
Here are some suggestions for frameworks for self-hosted serving of LLMs and related tasks.
# Embeddings from OpenAI CLIP
Jina
https://github.com/jina-ai/clip-as-service (Apache)
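For reference, a minimal usage sketch with the `clip_client` package that clip-as-service ships; the server address here is just a placeholder:

```python
from clip_client import Client

# Connect to a running clip-as-service server (address is a placeholder).
c = Client('grpc://0.0.0.0:51000')

# Texts and image URLs are encoded into the same CLIP embedding space.
vectors = c.encode(['a photo of a cat', 'https://picsum.photos/200'])
print(vectors.shape)  # (2, embedding_dim), returned as a numpy array
```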
# Text-embeddings:
My o…
-
### System Info
## Description
I am building the DJL-Serving TensorRT-LLM LMI inference container from scratch and deploying it on SageMaker endpoints for the Zephyr-7B model. Unfortunately, I run i…
-
In the guide it says:
> Building from source code is necessary if you want the best performance
https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html
I have a custom s…
-
### Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
### Branch name
main
### Commit ID
83803a7
### Other environment information
```Markdown…
-
### Required prerequisites
- [X] I have searched the [Issue Tracker](https://github.com/camel-ai/camel/issues) and [Discussions](https://github.com/camel-ai/camel/discussions) that this hasn't alre…
-
Hello,
I've encountered an issue where the request launcher does not allow the next requests to be sent until all requests specified by `num_concurrent_requests` have finished.
This behavior see…
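For comparison, here is a minimal sketch of the dispatch behavior I would expect instead: a semaphore frees a slot as soon as any single request finishes, so `num_concurrent_requests` stays in flight continuously. This is my own illustration, not the launcher's actual code; `send_fn` is a placeholder for whatever coroutine issues a request.

```python
import asyncio

async def run_with_constant_concurrency(requests, num_concurrent_requests, send_fn):
    """Keep num_concurrent_requests in flight at all times.

    A semaphore releases a slot as soon as any single request finishes,
    instead of waiting for the whole batch to drain before dispatching more.
    `send_fn` is a placeholder for the coroutine that actually issues a request.
    """
    sem = asyncio.Semaphore(num_concurrent_requests)

    async def worker(req):
        async with sem:
            return await send_fn(req)

    return await asyncio.gather(*(worker(r) for r in requests))
```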
-
Hi,
Could you please provide a guide on integrating the DeepSpeed approach for multi-GPU Intel Flex 140 model inference, serving the model with a FastAPI and uvicorn setup?
model id: 'meta-llama/Llama-2-7…
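In case it helps frame the question, below is a minimal sketch of what I have in mind, assuming DeepSpeed's `init_inference` tensor-parallel path and a Hugging Face model. The full model id, the Intel XPU specifics (e.g. intel_extension_for_pytorch), and the multi-rank serving details are assumptions this sketch glosses over.

```python
# Minimal sketch, not a verified Intel Flex 140 recipe. Launch with e.g.
# `deepspeed --num_gpus 2 app.py`; note every rank would run uvicorn, so a
# real setup must bind the port on rank 0 only and broadcast requests.
import os
import torch
import deepspeed
from fastapi import FastAPI
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # assumed full id (truncated above)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

# Shard the model across the visible devices (2 for a dual-GPU Flex 140 card).
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": int(os.getenv("WORLD_SIZE", "2"))},
    dtype=torch.float16,
)

app = FastAPI()

@app.post("/generate")
def generate(prompt: str, max_new_tokens: int = 64):
    inputs = tokenizer(prompt, return_tensors="pt").to(engine.module.device)
    output = engine.module.generate(**inputs, max_new_tokens=max_new_tokens)
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}
```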
-
Hi there,
I am wondering what hardware Ray uses for serving in this llmperf leaderboard. Is it CPU or GPU? If it is GPU, which model is it?
Thanks,
Fizzbb