-
I've tested the speculative decoding feature using llama3 models: I converted the draft/target models to TRT engines and launched the Triton server with the BLS model, but there seems to be no performance gain.
environment s…
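For reference, a minimal timing-harness sketch to quantify the (missing) gain, assuming the server exposes Triton's HTTP generate endpoint for a model named `tensorrt_llm_bls`; the endpoint path, model name, and payload fields (including `num_draft_tokens`) are assumptions based on a typical Triton + TensorRT-LLM BLS deployment and may not match this setup.

```python
import time
import requests

# Assumed Triton generate endpoint for the BLS model; adjust host and model name as needed.
URL = "http://localhost:8000/v2/models/tensorrt_llm_bls/generate"
PROMPT = "Explain speculative decoding in one paragraph."

def mean_latency(payload, n_runs=5):
    """Send the same request n_runs times and return the mean latency in seconds."""
    latencies = []
    for _ in range(n_runs):
        start = time.perf_counter()
        resp = requests.post(URL, json=payload, timeout=300)
        resp.raise_for_status()
        latencies.append(time.perf_counter() - start)
    return sum(latencies) / len(latencies)

# Field names are assumptions; they depend on the deployment's config.pbtxt.
baseline = {"text_input": PROMPT, "max_tokens": 256}
speculative = {"text_input": PROMPT, "max_tokens": 256, "num_draft_tokens": 5}

print("baseline   :", mean_latency(baseline))
print("speculative:", mean_latency(speculative))
```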
-
### Motivation
Speculative decoding can speed up generation by more than 2x. This degree of speedup is an important feature for a production-grade LM deployment library, and it seems the methods are s…
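To make the mechanism concrete, here is a minimal greedy draft-and-verify sketch (not the rejection-sampling variant production systems use). The checkpoint paths are placeholders, and the loop omits KV caching, which is where real implementations recover most of the speedup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoints: any draft/target pair that shares a tokenizer.
DRAFT_PATH = "path/to/draft-model"
TARGET_PATH = "path/to/target-model"

tok = AutoTokenizer.from_pretrained(TARGET_PATH)
draft = AutoModelForCausalLM.from_pretrained(DRAFT_PATH).eval()
target = AutoModelForCausalLM.from_pretrained(TARGET_PATH).eval()

@torch.no_grad()
def speculative_step(input_ids, k=5):
    """Propose k tokens with the draft model, then verify them in one target forward pass."""
    # 1) Draft k tokens greedily (no KV cache here, for brevity).
    proposal = input_ids
    for _ in range(k):
        next_tok = draft(proposal).logits[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, next_tok], dim=-1)

    # 2) Verify: the target scores every proposed position in a single forward pass.
    target_logits = target(proposal).logits
    n_prompt = input_ids.shape[1]
    target_choice = target_logits[:, n_prompt - 1:-1].argmax(-1)  # target's greedy picks
    drafted = proposal[:, n_prompt:]

    # 3) Accept the longest prefix where draft and target agree.
    agree = (target_choice == drafted)[0].long()
    n_accept = int(agree.cumprod(0).sum())
    accepted = drafted[:, :n_accept]

    # 4) One "free" token from the target: its prediction right after the accepted prefix.
    bonus = target_logits[:, n_prompt - 1 + n_accept].argmax(-1, keepdim=True)
    return torch.cat([input_ids, accepted, bonus], dim=-1)

prompt = tok("Speculative decoding works by", return_tensors="pt").input_ids
out = prompt
for _ in range(8):  # a few decoding steps
    out = speculative_step(out)
print(tok.decode(out[0], skip_special_tokens=True))
```

With greedy verification as above, the output matches what the target model alone would produce; the speedup comes from the target only needing one forward pass per k-token proposal.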
-
### Proposal to improve performance
With the end-to-end correctness tests merged in https://github.com/vllm-project/vllm/pull/3951, we can now optimize the implementation to get a ~50% speedup on 70…
-
### System Info
Transformers Version: 4.42.0
Python environment: 3.10.14
### Who can help?
@sanchit-gandhi
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### …
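Not the reporter's script, but for context: speculative (assisted) generation in Transformers is typically invoked via the `assistant_model` argument of `generate`. A minimal sketch with placeholder checkpoints:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoints: replace with the actual target and assistant (draft) models.
TARGET = "path/to/target-model"
ASSISTANT = "path/to/assistant-model"

tok = AutoTokenizer.from_pretrained(TARGET)
model = AutoModelForCausalLM.from_pretrained(TARGET, torch_dtype=torch.float16, device_map="auto")
assistant = AutoModelForCausalLM.from_pretrained(ASSISTANT, torch_dtype=torch.float16, device_map="auto")

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
# Assisted generation: the assistant drafts tokens, the main model verifies them.
out = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```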
-
### Motivation.
Speculative Decoding is a crucial feature for reducing latency, currently supported by vLLM (credit to @cadedaniel!). However, when deploying Speculative Decoding in real online LL…
-
-
Good morning (or afternoon/evening)!
Among the techniques for speeding up LLM inference, there is a method called **self speculative decoding**. Would it be possible to implement this …
-
### Your current environment
docker with vllm/vllm-openai:v0.4.3 (latest)
### 🐛 Describe the bug
python3 -m vllm.entrypoints.openai.api_server --model ./Qwen1.5-72B-Chat/ --max-model-len 2400…
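If it helps with reproduction, a minimal client against vLLM's OpenAI-compatible endpoint might look like the sketch below; the served model name is assumed to mirror the `--model` path from the launch command, and the port is the default.

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server; the default local port is 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="./Qwen1.5-72B-Chat/",   # assumed to match the --model argument above
    prompt="Write a haiku about autumn.",
    max_tokens=128,
)
print(resp.choices[0].text)
```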
-
Hi, I just tried to use this custom all-reduce kernel for speculative decoding. I set ENABLE_INTRA_NODE_COMM=1, but I found that the code gets stuck after several iterations. Are there any bugs in this kern…
-
### Proposal to improve performance
In https://github.com/vllm-project/vllm/pull/3951 we disabled bonus tokens (the token sampled from the verifier model assuming all proposal tokens are accepted) because i…
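For readers unfamiliar with the term: when every proposed token is accepted, the verifier's single forward pass has already produced a distribution for the next position, so one extra ("bonus") token can be emitted for free. A toy illustration, independent of vLLM's actual implementation:

```python
import torch

k = 3                                   # number of proposal tokens per step
proposal = torch.tensor([11, 12, 13])   # tokens suggested by the draft model
# Verifier's greedy choices for positions 1..k+1, obtained in a single forward pass.
verifier_choice = torch.tensor([11, 12, 13, 14])

# Accept the longest prefix on which the verifier agrees with the proposal.
agree = (verifier_choice[:k] == proposal).long()
n_accept = int(agree.cumprod(0).sum())

emitted = proposal[:n_accept].tolist()
if n_accept == k:
    # All proposals accepted: the verifier's prediction for position k+1 is the bonus token.
    emitted.append(int(verifier_choice[k]))
print(emitted)   # [11, 12, 13, 14] -> four tokens emitted from one verifier pass
```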