-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.10 (x86_64)
GCC version: (…
```
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
```
-
### System Info
GPU: 2 × A30
TensorRT-LLM version: v0.9.0
Model: Vicuna 13B
### Who can help?
@byshiue
### Information
- [X] The official example scripts
- [ ] My own modified scripts
#…
-
I am starting this issue to do more thorough benchmarking than the [notebooks](/notebooks) in the repo.
What should we measure:
1. Time for generation
2. Max GPU VRAM
3. Accuracy
Hardw…
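The three metrics above can be collected with a small harness. Below is a hedged sketch, not part of the repo's notebooks: `run_generation` is a hypothetical callable standing in for one generation call, and `torch` is treated as optional so the harness also runs without a GPU.

```python
# Sketch of a benchmark harness for generation time and peak GPU VRAM.
# `run_generation` is a hypothetical callable wrapping one generate step.
import time

try:
    import torch  # used only for peak-VRAM tracking when CUDA is available
except ImportError:  # torch not installed: skip the VRAM metric
    torch = None


def benchmark(run_generation):
    """Return (elapsed_seconds, peak_vram_bytes_or_None, result) for one call."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.reset_peak_memory_stats()
        torch.cuda.synchronize()  # exclude queued work from the timing
    start = time.perf_counter()
    result = run_generation()
    if use_cuda:
        torch.cuda.synchronize()  # wait for the kernel to actually finish
    elapsed = time.perf_counter() - start
    peak_vram = torch.cuda.max_memory_allocated() if use_cuda else None
    return elapsed, peak_vram, result
```

Accuracy would be measured separately, by scoring the returned `result` against reference outputs.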
-
**Problem**
I need to create a lot of small JSONs with an LLM. To do so, I started with [Jsonformer](https://github.com/1rgs/jsonformer). However, since it is no longer maintained and my colleagu…
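For context, Jsonformer's core trick is that the program emits the JSON scaffolding itself and asks the model only for the individual leaf values, so the output is structurally valid by construction. A minimal pure-Python sketch of that idea (not Jsonformer's actual code; `gen_value` is a hypothetical callback standing in for a constrained LLM call):

```python
# Sketch of schema-guided generation: walk a JSON schema, emit the
# structure deterministically, and delegate only leaf values to
# `gen_value(prompt, field_type)` -- here, any plain function.

def fill_schema(schema, gen_value, prompt=""):
    t = schema["type"]
    if t == "object":
        # keys come from the schema, never from the model
        return {
            key: fill_schema(sub, gen_value, prompt + f" {key}:")
            for key, sub in schema["properties"].items()
        }
    if t == "array":
        n = int(gen_value(prompt + " length:", "number"))
        return [fill_schema(schema["items"], gen_value, prompt) for _ in range(n)]
    # leaf: ask the "model" for a single typed value
    return gen_value(prompt, t)
```

With a stub callback such as `lambda prompt, t: "Alice" if t == "string" else 30`, a schema with `name`/`age` properties yields a dict with exactly those keys, which is the guarantee a real constrained decoder provides.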
-
# `generate` 🤜 🤛 `torch.compile`
This issue is a tracker of the compatibility between `.generate` and `torch.compile` ([intro docs by pytorch](https://pytorch.org/tutorials/intermediate/torch_comp…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorc…
```
-
### 🚀 The feature, motivation and pitch
We recently read a paper in which the vLLM team proposed a method called **SmartSpec**.
We believe that this research, which dynamically adjusts the speculation …
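To make the idea concrete, here is a hypothetical sketch of dynamically adjusting the speculation length (this is *not* the actual SmartSpec algorithm, only an illustration of the general mechanism): grow the number of speculated tokens when the target model accepts most of them, shrink it otherwise.

```python
# Hypothetical acceptance-rate controller for speculative decoding.
# Not vLLM's SmartSpec; just a sketch of "dynamic speculation length".

class SpeculationController:
    def __init__(self, k=4, k_min=1, k_max=8, grow_at=0.8, shrink_at=0.4):
        self.k = k                    # current speculation length
        self.k_min, self.k_max = k_min, k_max
        self.grow_at, self.shrink_at = grow_at, shrink_at

    def update(self, accepted, proposed):
        """Adjust k from the last step's acceptance rate and return it."""
        rate = accepted / proposed if proposed else 0.0
        if rate >= self.grow_at:
            self.k = min(self.k + 1, self.k_max)   # cheap to speculate more
        elif rate <= self.shrink_at:
            self.k = max(self.k - 1, self.k_min)   # wasted draft work, back off
        return self.k
```

A real system would also weigh batch size and queue depth, since speculating more tokens trades draft-model compute against target-model verification throughput.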
-
Does it support batch size > 1?