speculative-decoding Search Results

800 results for speculative-decoding

vllm-project/vllm #5288

[Bug] [Speculative Decoding/flash_attn]: Flash attn backend …

### Your current environment CI environment ### 🐛 Describe the bug See https://github.com/vllm-project/vllm/pull/5286 and https://github.com/vllm-project/vllm/issues/5152 My guess is the way we …

cadedaniel updated 3 weeks ago
1

predibase/lorax #57

Project Roadmap

WIP project roadmap for LoRAX. We'll continue to update this over time. # v0.10 - [ ] Speculative decoding adapters - [ ] AQLM # v0.11 - [ ] Prefix caching - [ ] BERT support - [ ] Embe…

tgaddair updated 1 month ago
32

irthomasthomas/undecidability #655

At the Intersection of LLMs and Kernels - Research Roundup

- [ ] [At the Intersection of LLMs and Kernels - Research Roundup](https://charlesfrye.github.io/programming/2023/11/10/llms-systems.html) # At the Intersection of LLMs and Kernels - Research Roundup…

irthomasthomas updated 4 months ago
1

vllm-project/vllm #4836

[Bug]: Running vllm docker image with neuron fails

### Your current environment root@9c92d584ab5f:/app# python3 ./collect_env.py Collecting environment information... WARNING 05-15 15:13:52 ray_utils.py:46] Failed to import Ray with ModuleNotFound…

yaronr updated 2 weeks ago
1

flexflow/FlexFlow #1364

Simplify BatchConfig

As we plan to move some states from the `BatchConfig` to the `RequestManager`, some fields in `BatchConfig` are rendered redundant. The following are the data members of the current `BatchConfig`. ``…

zikun-li updated 2 months ago
11

junhwi/next-gen-ai #22

24/04/28

Many-Shot In-Context Learning https://arxiv.org/abs/2404.11018 Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone https://arxiv.org/abs/2404.14219 https://github.com/apple…

junhwi updated 2 months ago
2

huggingface/distil-whisper #3

Compatibility with CTranslate2 / faster-whisper

Great work! I was wondering whether the distilled version might still be compatible with CTranslate2 / faster-whisper? I understand the changes to the decoder might require some changes there, not …

entn-at updated 7 months ago
8

NVIDIA/TensorRT-LLM #632

TensorRT-LLM Requests

Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on. Last update: `Jan 14th, 2024` 🚀 = in development #…

ncomly-nvidia updated 5 days ago
7

ggerganov/llama.cpp #7995

Feature Request: Support for Meta Chameleon 7B and 34B

### Prerequisites - [X] I am running the latest code. Mention the version if possible as well. - [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…

arch-btw updated 5 days ago
7

NVIDIA/TensorRT-LLM #169

Feature: Speculative sampling / Assisted Generation

An obvious feature to me, but also not one that is simple to implement - is speculative sampling on the road map? The idea would be using a second tiny-model combined with e.g. for greedy validatio…

michaelfeil updated 6 months ago
14
