-
### What behavior of the library made you think about the improvement?
As of now, Medusa is generating hallucinations because the speculative multi-head does not respect the Outlines (guided) decoding grammar.
…
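A minimal repro sketch, assuming a vLLM build where Medusa is enabled through `speculative_model` and grammar constraints through `GuidedDecodingParams`; argument names vary across vLLM versions, and the model paths below are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Placeholder paths; substitute your target model and its trained Medusa heads.
llm = LLM(
    model="<target-model>",
    speculative_model="<medusa-heads-checkpoint>",
    num_speculative_tokens=5,
)

# Grammar-constrained request: with the Medusa path active, the proposed
# draft tokens are reportedly not filtered by this grammar, which is where
# the hallucinated (out-of-grammar) output shows up.
params = SamplingParams(
    temperature=0.0,
    guided_decoding=GuidedDecodingParams(json={"type": "object"}),
)
print(llm.generate(["Return an empty JSON object."], params)[0].outputs[0].text)
```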
-
### 🚀 The feature, motivation and pitch
Speculative decoding can achieve 50%+ latency reduction, but in vLLM it can suffer from the throughput-optimized default scheduling strategy where prefills are…
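One existing scheduler knob that mitigates this is chunked prefill, which lets in-flight decode steps share a batch with partial prefills instead of waiting behind them; a sketch, treating the engine-argument names as assumptions about the current vLLM API:

```python
from vllm import LLM

# With chunked prefill, long prompts are split into chunks so that decode
# steps (including speculative ones) are not stalled behind full prefills.
llm = LLM(
    model="facebook/opt-6.7b",
    enable_chunked_prefill=True,
    max_num_batched_tokens=512,  # token budget shared by prefill chunks and decodes
)
```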
-
How do I use speculative decoding?
Is there documentation for understanding it better?
Support was added in a recent update for both TensorRT-LLM and the TensorRT-LLM backend.
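Not a TensorRT-LLM example, but if it helps build intuition while the docs catch up: Hugging Face transformers exposes the same draft/verify idea as "assisted generation" behind a single `generate` argument. A minimal sketch (the model choices are just examples):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
target = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
draft = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tok("Speculative decoding works by", return_tensors="pt")
# The draft model proposes a few tokens per step; the target model verifies
# them in a single forward pass and keeps the longest accepted prefix.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```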
-
### 🚀 The feature, motivation and pitch
I want to implement tree attention for vLLM, as mentioned in the [RoadMap](https://github.com/vllm-project/vllm/issues/3861). But I don’t know whether I should imple…
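For reference, the central data structure is compact: in tree attention, each candidate token attends only to itself and its ancestors in the draft tree. A self-contained sketch of building that mask from parent indices (illustrative only, not vLLM internals):

```python
import torch

def tree_attention_mask(parents: list[int]) -> torch.Tensor:
    """Boolean [n, n] mask: mask[i, j] is True iff node j is node i
    itself or an ancestor of node i. parents[i] is -1 for a root."""
    n = len(parents)
    mask = torch.zeros(n, n, dtype=torch.bool)
    for i in range(n):
        j = i
        while j != -1:          # walk up the tree to the root
            mask[i, j] = True
            j = parents[j]
    return mask

# Two candidate branches after a shared root token:
#   0 -> 1 -> 3   and   0 -> 2 -> 4
print(tree_attention_mask([-1, 0, 0, 1, 2]).int())
```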
-
Hello there,
I was wondering whether it would be possible to have self-speculative decoding operate using IQ2 as the draft model and FP8 as the core model (as it has been shown that FP8 is very rarely …
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
- [ ] [self-speculative-decoding/README.md at main · dilab-zju/self-speculative-decoding](https://github.com/dilab-zju/self-speculative-decoding/blob/main/README.md?plain=1)
# Self-Speculative Decod…
-
https://github.com/dust-tt/llama-ssp
Any plans to implement speculative decoding? It would probably improve latency by at least 2x and seems not too difficult to implement.
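For context, the heart of the method (as in the speculative sampling papers this repo draws on) is a per-token accept/reject test that leaves the target model's output distribution unchanged; a sketch of that single step:

```python
import numpy as np

rng = np.random.default_rng(0)

def verify(token: int, p_target: np.ndarray, q_draft: np.ndarray):
    """Accept a draft token with probability min(1, p/q); on rejection,
    resample from the residual distribution max(p - q, 0), normalized.
    Returns (accepted, token_to_emit)."""
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return True, token
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return False, rng.choice(len(residual), p=residual)
```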
-
This could be used for LLMs and, hopefully, for encoder-decoder models as well, e.g., a smaller NLLB model coupled with a bigger NLLB model; see the sketch below.
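A sketch of that NLLB pairing via transformers' assisted generation, assuming the feature accepts this encoder-decoder pair (the two checkpoints share a tokenizer, which the draft/target setup requires):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/nllb-200-3.3B", src_lang="eng_Latn")
target = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-3.3B")
draft = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M")

inputs = tok("Speculative decoding may speed up translation too.", return_tensors="pt")
out = target.generate(
    **inputs,
    assistant_model=draft,  # small NLLB proposes, big NLLB verifies
    forced_bos_token_id=tok.convert_tokens_to_ids("fra_Latn"),  # target language
)
print(tok.batch_decode(out, skip_special_tokens=True)[0])
```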
-
"We tested the speculative inference using the first 100 inputs from alpaca test dataset as prompts. When model=gpt2-xl, draft_model=gpt2".
I want to test speedup for my own model and draft_model. …
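Not this repo's script, but as an independent cross-check, one way to measure the speedup on your own pair is to time `generate` with and without the draft model; a sketch using transformers' assisted generation with the same gpt2-xl/gpt2 pairing (swap in your checkpoints):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL, DRAFT = "gpt2-xl", "gpt2"   # replace with your model / draft_model
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()
draft = AutoModelForCausalLM.from_pretrained(DRAFT).eval()
inputs = tok("Below is an instruction that describes a task.", return_tensors="pt")

def bench(**kw):
    """Greedy-decode 128 new tokens and return tokens per second."""
    start = time.perf_counter()
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128, do_sample=False, **kw)
    return (out.shape[1] - inputs.input_ids.shape[1]) / (time.perf_counter() - start)

print(f"baseline:    {bench():.1f} tok/s")
print(f"speculative: {bench(assistant_model=draft):.1f} tok/s")
```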