-
### Your current environment
Using official Docker image.
### 🐛 Describe the bug
Using Docker image: vllm/vllm-openai:latest
Params:
```
--model=mistralai/Mistral-7B-Instruct-v0.3
--gpu-memo…
-
This paper describes a method similar to speculative sampling: it samples the lower-quality model to identify tokens to avoid, thereby increasing the quality of the output of the higher-quality mod…
-
This might be of interest:
https://huggingface.co/papers/2402.11131
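For context, the classic speculative-sampling acceptance rule keeps a draft token with probability min(1, p/q), where p and q are the target and draft model probabilities for that token. A minimal sketch of that rule (the function name and interface are illustrative, not taken from the linked paper):

```python
import random

def accept_draft_token(p_target: float, q_draft: float, rng=random.random) -> bool:
    """Speculative-sampling accept step: keep a token proposed by the
    draft model with probability min(1, p_target / q_draft).

    p_target: probability the target (higher-quality) model assigns the token.
    q_draft:  probability the draft (lower-quality) model assigned it.
    rng:      callable returning a uniform sample in [0, 1); injectable for tests.
    """
    if q_draft <= 0.0:
        # Draft model could not have proposed this token; reject defensively.
        return False
    return rng() < min(1.0, p_target / q_draft)
```

On rejection, the standard scheme resamples from the normalized difference of the two distributions, which the sketch above omits.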
-
## 🚀 Feature
Add Attention Sinks (https://arxiv.org/pdf/2309.17453.pdf, https://github.com/tomaarsen/attention_sinks/) to MLC.
## Motivation
mlc_chat_cli gets noticeably slower as the conversatio…
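For reference, the attention-sinks policy in the linked paper keeps the first few positions (the "sinks") plus a sliding window of the most recent positions in the KV cache, which is why generation speed stays flat as the conversation grows. A minimal sketch of which cache positions survive (function name and defaults are illustrative; the paper uses 4 sink tokens):

```python
def kept_positions(seq_len: int, num_sinks: int = 4, window: int = 1020) -> list:
    """Return the KV-cache positions retained under a StreamingLLM-style
    policy: the first `num_sinks` positions plus a sliding window of the
    `window` most recent positions."""
    if seq_len <= num_sinks + window:
        # Cache still fits; nothing is evicted.
        return list(range(seq_len))
    # Attention sinks at the front, sliding window at the back.
    return list(range(num_sinks)) + list(range(seq_len - window, seq_len))
```

Everything between the sinks and the window is evicted, so per-token cost is bounded regardless of conversation length.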
-
I ran into a series of issues trying to get vLLM stood up on a system with multiple MI210s. I figured I'd document my issues and workarounds so that someone could pick up the baton later, or at least …
-
### The following items must be checked before submission
- [X] Make sure you are using the latest code from the repository (git pull); some issue…
-
### Before submitting a bug report
- [X] I updated to the latest version of Multi-Account Container and tested if I can reproduce the issue
- [X] I searched for existing reports to see if it hasn't a…
-
### Your current environment
vllm version: 0.4.2
```
CUDA_VISIBLE_DEVICES=6 python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-7B-Instruct-v0.3 \
    --dtype au…
-
### System Info
CUDA: 12.5
Python: 3.9
Ubuntu 22.04
### Running Xinference with Docker?
- [ ] docker
- [X] pip install
- [ ] instal…
-
Run with:
```
python -m vllm.entrypoints.openai.api_server --model vicuna-7b-v1.5 --trust-remote-code
curl http://localhost:8000/generate -d '{"prompt": "Below is an instruction that describes a
ta…
```
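The truncated curl above posts a JSON body to the legacy `/generate` endpoint. A minimal sketch of assembling a well-formed payload in Python (the field values are illustrative; `prompt`, `max_tokens`, and `temperature` are standard vLLM sampling fields):

```python
import json

# Illustrative request body for vLLM's legacy /generate endpoint.
payload = {
    "prompt": "Below is an instruction that describes a task.",
    "max_tokens": 64,
    "temperature": 0.7,
}

# Serialize once so the JSON sent over the wire is guaranteed well-formed,
# avoiding the quoting pitfalls of hand-writing JSON inside a curl -d string.
body = json.dumps(payload)
```

Note that the OpenAI-compatible server (`vllm.entrypoints.openai.api_server`) serves `/v1/completions` rather than `/generate`, so the endpoint must match the entrypoint being launched.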