-
Thanks for the great work, team. I wonder if there are any plans to add new improvements to speculative decoding such as [Eagle](https://sites.google.com/view/eagle-llm), [Medusa](https://sites.google.co…
-
## Background
Speculative decoding leverages the ability to cheaply generate proposals and to cheaply verify them, achieving speedups for memory-bound inference. Different methods of speculative decodin…
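The propose-then-verify loop can be sketched in a few lines. This is a hypothetical greedy-only toy, not any particular implementation: `draft_next` and `target_next` stand in for a small draft model and a large target model, operating on lists of integer token ids.

```python
# Minimal sketch of speculative decoding (greedy case, toy models).
# Real systems verify all k positions in ONE batched target forward pass;
# the loop below just makes the acceptance rule explicit.

def speculative_step(prefix, draft_next, target_next, k=4):
    """Propose k draft tokens, then keep the longest prefix the target agrees with."""
    # 1. Cheap proposal: the draft model autoregressively guesses k tokens.
    proposal = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2. Verification: accept draft tokens until the first disagreement,
    #    substituting the target's own token at the mismatch.
    accepted = []
    ctx = list(prefix)
    for t in proposal:
        expect = target_next(ctx)
        if expect != t:
            accepted.append(expect)  # target's correction, then stop
            break
        accepted.append(t)
        ctx.append(t)
    else:
        accepted.append(target_next(ctx))  # bonus token when all k accepted

    return accepted

# Toy models: the target counts up by 1; the draft agrees except after
# tokens divisible by 3, where it overshoots.
target_next = lambda ctx: ctx[-1] + 1
draft_next = lambda ctx: ctx[-1] + 1 if ctx[-1] % 3 else ctx[-1] + 2
```

Every call to `speculative_step` emits at least one correct token (the target's correction), so the output always matches plain autoregressive decoding; the win comes when several draft tokens are accepted per target pass.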
-
### 🚀 The feature, motivation and pitch
Hi,
Do you have any workaround for the `Speculative decoding not yet supported for RayGPU backend.` error, or an idea of when the RayGPU backend will support …
-
Recently a project called Medusa was released. It trains additional `lm_head`s that, instead of predicting the next token, predict tokens n+2, n+3, and n+4 before generatin…
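The core idea can be sketched as follows. This is a hedged toy, not the Medusa implementation: the tiny linear "heads" below stand in for the small trained projections Medusa adds on top of the last hidden state, so one forward pass yields a multi-token draft to verify.

```python
# Hypothetical sketch of the Medusa idea: extra heads read the SAME hidden
# state as the normal lm_head but predict tokens further ahead, producing a
# multi-token proposal from a single forward pass.

def argmax_head(hidden, weights):
    """Greedy pick over a toy linear head: logits[v] = dot(hidden, weights[v])."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in weights]
    return max(range(len(logits)), key=logits.__getitem__)

def medusa_propose(hidden, heads):
    """Head i predicts the token i+1 positions ahead of the current one."""
    return [argmax_head(hidden, W) for W in heads]
```

In the real method the proposals from several heads are combined (e.g. as a tree of candidates) and verified by the base model in one pass, much like other speculative schemes.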
-
### Your current environment
vllm-0.4.3
### 🐛 Describe the bug
When I use speculative mode and `prompt_length + output_length > 2048`, the error occurs.
When I use speculative mode, I use th…
-
Transformers 4.35 only supports speculative decoding for batch size == 1. To use speculative decoding with batch size > 1, please make sure to use this branch: https://github.com/huggingface/t…
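Part of why batch size > 1 is harder: each sequence in the batch may accept a different number of draft tokens, so the accepted lengths are ragged and the batch falls out of lockstep. A minimal, hypothetical sketch of that per-row bookkeeping (names are illustrative, not the transformers API):

```python
# For each sequence in the batch, count how many greedy draft tokens the
# target model agrees with; rows typically accept DIFFERENT amounts, which
# is what complicates batched speculative decoding.

def accepted_lengths(proposals, target_preds):
    """Length of the agreeing prefix per sequence in the batch."""
    out = []
    for prop, tgt in zip(proposals, target_preds):
        n = 0
        while n < len(prop) and prop[n] == tgt[n]:
            n += 1
        out.append(n)
    return out
```

After this step an implementation must re-pad or re-pack the batch so all sequences line up again for the next iteration.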
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
Hi, thank you so much for your awesome work!
I noticed that when running `equal.py` to compare the decoded tokens of speculative decoding methods (pld/eagle/hydra) with vanilla decoding tokens, the …
-
### 🚀 The feature, motivation and pitch
[Parallel/Jacobi decoding](https://arxiv.org/abs/2305.10427) improves inference efficiency by breaking the sequential nature of conventional auto-regressive …
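The fixed-point view behind Jacobi decoding can be shown on a toy greedy "model". This is a hedged sketch, not the paper's algorithm: start from an arbitrary guess for all n output tokens, refine every position in parallel from the current guess, and stop at a fixed point, which by construction equals the autoregressive answer.

```python
# Jacobi (parallel) decoding sketch: each sweep updates ALL positions from
# the previous guess at once. Position i becomes correct by sweep i+1, so
# at most n sweeps are needed, often far fewer.

def jacobi_decode(prompt, next_token, n):
    tokens = [0] * n  # arbitrary initial guess for all n outputs
    for _ in range(n):
        # One parallel sweep: position i is re-predicted from the prompt
        # plus the CURRENT guess for positions 0..i-1.
        new = [next_token(prompt + tokens[:i]) for i in range(n)]
        if new == tokens:  # fixed point reached: matches greedy decoding
            break
        tokens = new
    return tokens

# Toy "model": next token is the sum of the context, mod 7.
next_token = lambda ctx: sum(ctx) % 7
```

In a real LLM one sweep is a single forward pass over all n positions, so early convergence turns n sequential passes into far fewer parallel ones.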
-
This paper might be of interest: https://arxiv.org/pdf/2402.12374.pdf