-
Apoorv, do you have plans for a paper or a technical report on prompt lookup decoding?
I know you've indicated that people should cite your GitHub repo, but it would be nice to have something out …
-
### Your current environment
I don't know how to run it inside a Docker container.
### 🐛 Describe the bug
Simply run the following command:
`docker run --runtime nvidia --gpus all -v ~/.cache/huggingface:/root/.ca…
-
### 🚀 The feature, motivation and pitch
Speculative decoding allows emitting multiple tokens per sequence by speculating future tokens, scoring their likelihood using the LLM, and then accepting each…
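For context, here is a minimal greedy sketch of that score-and-accept step; the model call, shapes, and function name are illustrative assumptions, not any particular framework's API:

```python
import torch

def verify_draft(target_model, input_ids, draft_tokens):
    """Score draft tokens with the target model in one forward pass and keep
    the longest prefix that matches the target model's greedy choice."""
    # Run the target model over the prompt plus all draft tokens at once.
    candidate = torch.cat([input_ids, draft_tokens], dim=-1)
    logits = target_model(candidate).logits  # [1, seq_len, vocab]

    accepted = []
    for i, tok in enumerate(draft_tokens[0]):
        # Logits at position (prompt_len + i - 1) predict the token at (prompt_len + i).
        pos = input_ids.shape[-1] + i - 1
        target_tok = logits[0, pos].argmax()
        if target_tok.item() == tok.item():
            accepted.append(tok.item())          # draft token agrees: accept it
        else:
            accepted.append(target_tok.item())   # mismatch: take the target token and stop
            break
    else:
        # All draft tokens accepted; emit one extra token from the final position.
        accepted.append(logits[0, -1].argmax().item())
    return accepted
```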
-
## Overview
Speculative decoding allows a speedup for memory-bound LLMs by using a fast proposal method to propose tokens that are verified in a single forward pass by the larger LLM. Papers report 2…
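As an illustration of one such fast proposal method, here is a minimal prompt-lookup-style drafting sketch; `input_ids` is assumed to be a `[1, seq_len]` token tensor and the function name is hypothetical:

```python
def prompt_lookup_propose(input_ids, ngram_size=3, num_draft=5):
    """Propose draft tokens by matching the trailing n-gram of the sequence
    against an earlier occurrence and copying the tokens that followed it."""
    ids = input_ids[0].tolist()          # assume a [1, seq_len] tensor of token ids
    pattern = ids[-ngram_size:]
    # Scan backwards from the most recent position, excluding the trailing n-gram itself.
    for start in range(len(ids) - ngram_size - 1, -1, -1):
        if ids[start:start + ngram_size] == pattern:
            draft = ids[start + ngram_size : start + ngram_size + num_draft]
            if draft:
                return draft             # speculative tokens to hand to the verifier
    return []                            # no match: fall back to ordinary decoding
```

The drafted tokens would then be verified by the larger LLM in a single forward pass, as described above.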
-
I have some questions about the structure of custom mask for lookahead and verify branches [as described in the blog](https://lmsys.org/blog/2023-11-21-lookahead-decoding/#lookahead-and-verify-in-the…
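For reference, here is a generic sketch of how a per-branch mask can be laid out, where each candidate branch attends to the committed prefix and causally to itself but never to other branches; this is only an assumption-laden illustration of branch masking in general, not the exact layout from the lookahead decoding blog:

```python
import torch

def branch_verify_mask(prefix_len, branch_lens):
    """Boolean attention mask (True = may attend) for a committed prefix
    followed by several independent candidate branches."""
    total = prefix_len + sum(branch_lens)
    mask = torch.zeros(total, total, dtype=torch.bool)
    # The prefix uses ordinary causal attention.
    mask[:prefix_len, :prefix_len] = torch.tril(
        torch.ones(prefix_len, prefix_len, dtype=torch.bool))
    pos = prefix_len
    for blen in branch_lens:
        rows = slice(pos, pos + blen)
        mask[rows, :prefix_len] = True   # every branch token sees the whole prefix
        mask[rows, rows] = torch.tril(   # causal attention inside the branch only
            torch.ones(blen, blen, dtype=torch.bool))
        pos += blen
    return mask
```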
-
## 🚀 Feature
Please add Medusa decoding to mlc-llm in C++; we urgently need it to speed up LLM decoding on mobile devices.
See: https://github.com/FasterDecoding/Medusa/tree/main
Medusa adds …
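As a rough sketch of the idea (in Python rather than C++, with illustrative layer sizes and no residual connections, so not the exact architecture from the Medusa repo): Medusa attaches extra decoding heads to the base model's last hidden state, and head k drafts the token k+1 positions ahead:

```python
import torch.nn as nn

class MedusaHeads(nn.Module):
    """Extra decoding heads over the base model's final hidden state.
    Head k predicts the token k+1 positions ahead, so several future tokens
    can be drafted from a single forward pass of the base model."""
    def __init__(self, hidden_size, vocab_size, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.SiLU(),
                nn.Linear(hidden_size, vocab_size, bias=False),
            )
            for _ in range(num_heads)
        ])

    def forward(self, last_hidden_state):
        # last_hidden_state: [batch, hidden_size] for the final position.
        # Returns one logits tensor per lookahead offset.
        return [head(last_hidden_state) for head in self.heads]
```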
-
### Feature request
I would like to request [llama.cpp](https://github.com/ggerganov/llama.cpp) as a new model backend in the transformers library.
### Motivation
llama.cpp offers:
1) Exce…
-
Hi FlexFlow team,
I used the methods mentioned in #1099 to test the latency (GPU: RTX-4090), but I got a confusing result:
1) LLaMA-7B + 1 SSM (llama-160M), latency: 25.1 s
2) LLaMA-7B (without SSMs), la…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS…
-
### Your current environment
**But this is my host environment; the Engine is running on the latest official Docker image.**
```text
Collecting environment information...
PyTorch version: N/A
Is debug …