-
### Proposal to improve performance
Improve bitsandbytes quantization inference speed
### Report of performance regression
I'm testing llama-3.2-1b on a toy dataset. For offline inference using the…
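These two threads point at the same measurement. A minimal sketch of timing offline inference with bitsandbytes quantization enabled in vLLM, assuming a Llama 3.2 1B checkpoint; the `quantization`/`load_format` flag spellings follow the vLLM docs but vary by version:
```python
# Hypothetical sketch: timing offline inference with bitsandbytes
# in-flight quantization in vLLM. Model name and flags are assumptions.
import time
from vllm import LLM, SamplingParams

prompts = ["Explain KV caching in one sentence."] * 32  # toy dataset
params = SamplingParams(temperature=0.0, max_tokens=64)

llm = LLM(
    model="meta-llama/Llama-3.2-1B-Instruct",
    quantization="bitsandbytes",   # 4-bit in-flight quantization
    load_format="bitsandbytes",
)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

total_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{total_tokens / elapsed:.1f} generated tokens/s")
```
Running the same script with the two quantization arguments removed gives the unquantized baseline for comparison.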
-
## 🐛 Bug Report
**🔎 Describe the Bug**
I have a FastAPI/uvicorn server which serves multiple concurrent requests. In each call, I am using …
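A minimal sketch of the setup being described, assuming the backend is an OpenAI-style completions endpoint; the route, payload, and model name are illustrative:
```python
# Hypothetical sketch of a FastAPI app serving concurrent requests by
# forwarding each call to a vLLM OpenAI-compatible backend.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# One shared async client (not per-request) so connections are pooled.
client = httpx.AsyncClient(base_url="http://localhost:8000")

class Query(BaseModel):
    prompt: str

@app.post("/generate")
async def generate(query: Query):
    # awaiting keeps the event loop free for other concurrent calls
    resp = await client.post(
        "/v1/completions",
        json={"model": "llama-3.2-1b", "prompt": query.prompt, "max_tokens": 64},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()
```
Run with `uvicorn app:app`; uvicorn multiplexes the concurrent requests over the single event loop.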
-
Validating the glm4-9b-chat model's output with the following request, the serving side throws an error:
curl --request POST \
--url http://127.0.0.1:8000/v1/chat/completions \
--header 'content-type: application/json' \
--data '{
"model": "glm-4-9…
-
Hi there,
I am wondering what hardware Ray uses for serving in this llmperf leaderboard. Is it CPU or GPU? If it is GPU, which model?
Thanks,
Fizzbb
-
### 🚀 The feature, motivation and pitch
This library https://github.com/mit-han-lab/qserve introduces a number of innovations. Most important is the W4A8KV4 quantization described in the paper (htt…
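For intuition about the naming, W4A8KV4 means 4-bit weights, 8-bit activations, and a 4-bit KV cache. A toy NumPy sketch of symmetric round-to-nearest quantization at those bit-widths (illustrative arithmetic only, not QServe's fused kernels):
```python
# Illustrative only: symmetric quantization at the W4A8KV4 bit-widths.
import numpy as np

def quantize(x: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1            # 7 for int4, 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale       # int8 container even for 4-bit values

w = np.random.randn(128, 128).astype(np.float32)
qw, sw = quantize(w, bits=4)              # W4: 4-bit weights
a = np.random.randn(16, 128).astype(np.float32)
qa, sa = quantize(a, bits=8)              # A8: 8-bit activations

# Integer matmul with a single dequantization at the end
y = (qa.astype(np.int32) @ qw.T.astype(np.int32)) * (sa * sw)
print(np.abs(y - a @ w.T).max())          # quantization error vs. fp32
```
The paper's contribution is doing this with fused low-precision GPU kernels (including the 4-bit KV cache), not the arithmetic itself.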
-
Hi there,
Thank you for bringing the elegant RAG Assessment framework to the community.
I am an AI engineer from Alibaba Cloud, and our team has been fine-tuning LLM-as-a-Judge models based on t…
-
Hi @hadley, thanks for sharing this, really exciting.
Very nice to see support for open models via Ollama. I wonder if you would consider adding support for vLLM-hosted models as well, e.g. see ht…
-
**Is your feature request related to a problem? Please describe.**
Hello.
I tried to use Letta with vLLM serving the Qwen2.5 72B model. The model returned 2 tool calls, and Letta doesn't support this:
```
Response …
```
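A hedged sketch of what this looks like against an OpenAI-compatible vLLM endpoint, plus one possible client-side guard; the model name, tool schema, and keep-first-call workaround are all assumptions:
```python
# Sketch: detecting parallel tool calls in an OpenAI-compatible
# response and guarding a client that only supports one call.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="qwen2.5-72b-instruct",  # served model name assumed
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
)

tool_calls = resp.choices[0].message.tool_calls or []
if len(tool_calls) > 1:
    # Parallel tool calls unsupported downstream: keep only the first.
    tool_calls = tool_calls[:1]
print([c.function.name for c in tool_calls])
```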
-
### Your current environment
This is the 0.5.0 environment.
### 🐛 Describe the bug
**1. The log files:**
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-package…
-
1. How many LLMs are needed for `setting`? Your paper [PaperQA: Retrieval-Augmented Generative Agent for Scientific Research](https://arxiv.org/pdf/2312.07559.pdf) seems to have employi…
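For what it's worth, in recent paper-qa releases the different LLM roles are configured on a single `Settings` object rather than passed separately; a hedged sketch, with field names taken from the paper-qa docs but possibly differing across versions:
```python
# Hedged sketch: configuring the distinct LLM roles in paper-qa.
# Model choices and the question are illustrative.
from paperqa import Settings, ask

settings = Settings(
    llm="gpt-4o",               # answer-generation model
    summary_llm="gpt-4o-mini",  # per-chunk evidence summarization model
)
answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=settings,
)
print(answer)
```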