-
Hi,
Is there any way we can change the decoding logic?
For example: support for speculative sampling or other methods.
-
### Feature request
Hello, our models are deployed with TGI (v1.4.3), and we also want to use lorax. But I find that the TGI version lorax is based on is very different from TGI v1.4.3.
We …
-
So far, the .pre-commit-config.yml file only calls black and isort. I unintentionally used https://github.pie.apple.com/aiml-oss/ml-recurrent-drafter/blob/main/.pre-commit-config.yaml and discovered th…
-
They claim lookahead decoding provides a 1.5~2x decoding speedup without a speculative model.
Blog post: https://lmsys.org/blog/2023-11-21-lookahead-decoding/
Twitter thread: https://twitter.com/l…
-
Hi, I am trying to implement speculative decoding from [Accelerating Large Language Model Decoding with Speculative Sampling](https://arxiv.org/abs/2302.01318), and below is the code snippet:
`…
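For context, a minimal sketch of the per-token accept/reject step the paper describes, assuming the drafted token ids, the draft-model probabilities `q_probs`, and the target-model probabilities `p_probs` are already available (all names here are illustrative, not taken from the snippet above):

```python
import torch

def speculative_accept(draft_tokens, q_probs, p_probs):
    """Accept/reject drafted tokens as in Chen et al. (2023).

    draft_tokens: (k,) token ids proposed by the draft model
    q_probs:      (k, vocab) draft-model probabilities at each drafted position
    p_probs:      (k + 1, vocab) target-model probabilities (one extra row for the bonus token)
    Returns the accepted token ids plus one resampled or bonus token.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p_i, q_i = p_probs[i, tok], q_probs[i, tok]
        # Accept the drafted token with probability min(1, p(x)/q(x)).
        if torch.rand(()) < torch.clamp(p_i / q_i, max=1.0):
            accepted.append(int(tok))
            continue
        # On the first rejection, resample from the residual max(0, p - q),
        # renormalized, and stop: this keeps the overall output distribution
        # identical to sampling from the target model alone.
        residual = torch.clamp(p_probs[i] - q_probs[i], min=0.0)
        accepted.append(int(torch.multinomial(residual / residual.sum(), 1)))
        return accepted
    # Every draft was accepted: take one extra (bonus) token from the target model.
    accepted.append(int(torch.multinomial(p_probs[len(draft_tokens)], 1)))
    return accepted
```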
-
### Motivation
**PD disaggregation** is said to provide a 2-4x throughput improvement. See [DistServe](https://hao-ai-lab.github.io/blogs/distserve/) for reference. They are planning to integrate th…
-
### Report of performance regression
Hi, I use this:
```
server_vllm.py \
--model "/data/models_temp/functionary-small-v2.4/" \
--served-model-name "functionary" \
--dtype=bfloat16 \
-…
-
### 🚀 The feature, motivation and pitch
`MLPSpeculator`-based speculative decoding was recently added in https://github.com/vllm-project/vllm/pull/4947, but the initial integration only covers sing…
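For readers unfamiliar with the feature, a rough sketch of how an `MLPSpeculator` draft is wired into the offline `LLM` API is shown below. The engine-argument names and checkpoint ids are assumptions based on vLLM releases from around that PR, not something taken from the issue itself; verify them against the version you run.

```python
from vllm import LLM, SamplingParams

# Hedged sketch only: speculative_model, num_speculative_tokens, and
# use_v2_block_manager reflect my understanding of vLLM around that release,
# and both checkpoint ids below are placeholders.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",        # target model (placeholder id)
    speculative_model="ibm-fms/llama3-8b-accelerator",   # MLPSpeculator weights (placeholder id)
    num_speculative_tokens=4,
    use_v2_block_manager=True,
)

params = SamplingParams(temperature=0.0, max_tokens=64)
out = llm.generate(["Speculative decoding works by"], params)
print(out[0].outputs[0].text)
```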
-
OctoAI uses vLLM as the benchmark baseline to demonstrate how fast they are: https://octo.ai/blog/acceleration-is-all-you-need-techniques-powering-octostacks-10x-performance-boost
| Single User Throughput | Mu…
-
## Related issues
#1364 #1361 #1333
## Description
We propose an inference implementation refactoring that mainly involves `Pipeline Split` and `Struct Simplification`, and this results in some …