-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTor…
-
### Your current environment
```text
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…
-
### Anything you want to discuss about vllm.
Currently we run the full CI test matrix on every single commit in pull requests. vLLM's CI cost has been doubling each week as we add more tests a…
-
### Feature request
Allow passing a 2D attention mask in `model.forward`.
### Motivation
With this feature, it would be much easier to avoid cross-context contamination during pretraining and super…
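To make the request concrete, here is a minimal sketch of the kind of 2D mask the feature would accept. `packed_attention_mask` is a hypothetical helper (not an existing transformers API): given per-token document ids for a packed sequence, it builds a block-diagonal causal mask so tokens from one document cannot attend to tokens from another.

```python
import numpy as np

def packed_attention_mask(doc_ids):
    """Hypothetical helper: build a 2D attention mask for a packed sequence.

    doc_ids[i] is the document token i belongs to. Token i may attend to
    token j only if both tokens are in the same document and j <= i
    (causal). Returns a boolean matrix of shape (n, n), suitable for
    passing where a 2D attention mask is accepted.
    """
    ids = np.asarray(doc_ids)
    n = len(ids)
    same_doc = ids[:, None] == ids[None, :]          # block-diagonal part
    causal = np.tril(np.ones((n, n), dtype=bool))    # lower-triangular part
    return same_doc & causal

# Two documents packed into one 5-token sequence: [A, A, B, B, B].
mask = packed_attention_mask([0, 0, 1, 1, 1])
# Token 2 (first token of doc B) cannot see tokens 0-1 of doc A,
# which is exactly the cross-context contamination being avoided.
```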
-
- [ ] [Guide to choosing quants and engines : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1anb2fz/comment/kprbduc/)
-
### Feature request
I would like to request [llama.cpp](https://github.com/ggerganov/llama.cpp) as a new model backend in the transformers library.
### Motivation
llama.cpp offers:
1) Exce…
-
The following program encodes that same ASCII string using a naive approach and using the actual `UTF8.encode()`. The naive approach is about 3× faster. Could UTF8 be optimized to provide better pe…
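The original program is truncated, so here is a rough Python analogue of the comparison being described: a naive byte-per-code-point encoder versus the runtime's built-in UTF-8 encoder, timed on a pure-ASCII string. (The 3× figure above is from the original runtime; in CPython the built-in encoder is the fast path, so the point here is the shape of the benchmark, not the ratio.)

```python
import timeit

ASCII_TEXT = "hello, world! " * 1000  # pure-ASCII input

def naive_encode(s):
    """Naive approach: assume ASCII and map each code point to one byte.

    Only valid for code points < 128; for ASCII input it produces the
    same bytes as a real UTF-8 encoder.
    """
    return bytes(ord(c) for c in s)

# Sanity check: for ASCII input both paths must agree byte-for-byte.
assert naive_encode(ASCII_TEXT) == ASCII_TEXT.encode("utf-8")

naive_t = timeit.timeit(lambda: naive_encode(ASCII_TEXT), number=200)
builtin_t = timeit.timeit(lambda: ASCII_TEXT.encode("utf-8"), number=200)
print(f"naive: {naive_t:.4f}s  builtin: {builtin_t:.4f}s")
```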
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
1. Is speculative decoding faster than faster-whisper?
2. Is there going to be support anytime soon for speculative decoding in faster-whisper?
Both of these questions are asked with a purely…
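For context on question 1, here is a toy sketch of one greedy speculative-decoding round (hypothetical `draft_next`/`target_next` callables, not faster-whisper API). A cheap draft model proposes `k` tokens; the target model checks them and the longest agreeing prefix is accepted, plus one corrected token. The speedup comes from the target model verifying all `k` positions in a single batched forward pass rather than `k` sequential ones; this loop only models the acceptance logic.

```python
def speculative_step(draft_next, target_next, prefix, k=4):
    """One greedy speculative-decoding round (toy acceptance logic).

    draft_next / target_next map a token list to the next token.
    Returns the tokens accepted this round: the agreeing prefix of the
    draft proposal, then the target's correction at the first mismatch.
    """
    # Draft model proposes k tokens autoregressively (cheap).
    proposed, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)

    # Target model verifies each proposed position (in a real system,
    # one batched forward pass over all k positions).
    accepted, ctx = [], list(prefix)
    for t in proposed:
        expected = target_next(ctx)
        if expected != t:
            accepted.append(expected)  # correct the mismatch and stop
            break
        accepted.append(t)
        ctx.append(t)
    return accepted

# Toy "models" over a fixed string: the draft agrees on 3 of 4 steps.
target = lambda ctx: "abcdefg"[len(ctx)]
draft = lambda ctx: "abcdeXg"[len(ctx)]
out = speculative_step(draft, target, list("ab"), k=4)
```

Whether this beats faster-whisper in practice depends on how often the draft model agrees with the target, which is exactly what the question is asking.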
-
### Describe the bug
[ModelCloud/internlm-2.5-7b-chat-gptq-4bit](https://huggingface.co/ModelCloud/internlm-2.5-7b-chat-gptq-4bit) and my code:
```python
from vllm import LLM, SamplingParams
# Sample prompts.
p…