-
### Your current environment
**Note: this is my host environment; the engine is running on the official latest Docker image.**
```text
Collecting environment information...
PyTorch version: N/A
Is debug …
-
# Progress
- [x] Implement TPU executor that works on a single TPU chip (without tensor parallelism) #5292
- [ ] Support tensor parallelism for multiple chips in the same host #5871
- [ ] Suppo…
-
### 🚀 The feature, motivation and pitch
Please consider adding support for GPTQ and AWQ quantized Mixtral models.
I guess that after #4012 it's technically possible.
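As a conceptual aside, here is a toy sketch of what weight-only low-bit quantization does (the general idea behind GPTQ/AWQ-style schemes, not their actual algorithms): weights are stored as small integers plus a scale and dequantized on the fly. The function names and the 4-bit grouping are purely illustrative.

```python
# Toy weight-only 4-bit quantization sketch (illustrative only; NOT the
# real GPTQ or AWQ algorithm). Weights become signed 4-bit integers plus
# one float scale, and are dequantized back to floats at use time.

def quantize_4bit(weights):
    # Signed 4-bit range is -8..7; pick a scale that maps the largest
    # magnitude weight near the top of that range.
    scale = max(abs(w) for w in weights) / 7
    qs = [max(-8, min(7, round(w / scale))) for w in weights]
    return qs, scale

def dequantize(qs, scale):
    # Reconstruct approximate float weights from the integers.
    return [q * scale for q in qs]

ws = [0.7, -0.35, 0.07, 0.0]
qs, scale = quantize_4bit(ws)
approx = dequantize(qs, scale)
```

The reconstruction error per weight is bounded by half the scale, which is why per-group scales (as real GPTQ/AWQ kernels use) matter for accuracy.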
### Alternatives
_No r…
-
Are there plans, or an existing way, to support a left-padded KV attention mask? I believe right padding can be supported with the `mha_fwd_kvcache` API via the `seqlens_k_` pointer, but will there be a similar optio…
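To illustrate the difference being asked about, here is a plain-Python sketch (no flash-attn involved, all names hypothetical) of the valid-KV masks under the two padding conventions. With right padding the valid positions form a prefix describable by a single per-sequence length; with left padding they form a suffix, which a length-only parameter like `seqlens_k_` cannot express without an extra offset.

```python
# Toy masks for a batch of KV buffers of size max_len, given true
# sequence lengths. True = valid KV position, False = padding.

def right_pad_mask(seqlens, max_len):
    # Right padding: valid positions are the prefix [0, n).
    return [[pos < n for pos in range(max_len)] for n in seqlens]

def left_pad_mask(seqlens, max_len):
    # Left padding: valid positions are the suffix [max_len - n, max_len).
    return [[pos >= max_len - n for pos in range(max_len)] for n in seqlens]
```

For example, a sequence of length 2 in a buffer of 4 yields `[True, True, False, False]` when right-padded but `[False, False, True, True]` when left-padded.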
-
Hi!
I don't quite understand how this project works; I guess my main question is: what is a draft model?
For example, I would like to speed-up the inference of OwlVit (https://huggingface.…
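For context on the question, here is a toy sketch of the idea behind a draft model in speculative decoding. Both "models" here are stand-in functions, not real APIs: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in order, keeping the longest agreeing prefix.

```python
# Toy speculative decoding loop (illustrative only; all model functions
# are hypothetical stand-ins, not a real library API).

def draft_propose(prefix, k=4):
    # Cheap "draft model": guesses the next k tokens.
    out = [(prefix[-1] + i + 1) % 100 for i in range(k)]
    out[-1] = 42  # deliberately wrong last guess, to show rejection
    return out

def target_next_token(prefix):
    # Expensive "target model": the ground-truth next token.
    return (prefix[-1] + 1) % 100

def speculative_step(prefix, k=4):
    proposal = draft_propose(prefix, k)
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        # Target verifies each drafted token; stop at the first mismatch.
        if target_next_token(ctx) == tok:
            accepted.append(tok)
            ctx.append(tok)
        else:
            break
    if len(accepted) < k:
        # On rejection, still emit one token from the target model.
        accepted.append(target_next_token(ctx))
    return prefix + accepted

print(speculative_step([5]))  # → [5, 6, 7, 8, 9]
```

The speedup comes from the target model verifying k drafted tokens in one forward pass instead of generating them one by one; the output distribution matches the target model alone.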
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### 🚀 The feature, motivation and pitch
`MLPSpeculator`-based speculative decoding was recently added in https://github.com/vllm-project/vllm/pull/4947, but the initial integration only covers sing…
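For readers unfamiliar with the tensor-parallel side of this request, here is a toy pure-Python illustration of the core idea (shapes and function names are made up; real implementations such as vLLM's use `torch.distributed`): a weight matrix is sharded across workers along the output dimension, each worker computes its slice, and the slices are concatenated.

```python
# Toy tensor parallelism sketch (illustrative only, no real distributed
# runtime): shard a mat-vec product across "workers" by output rows.

def matvec(mat, vec):
    # Dense y = W x with W given as a list of rows.
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def sharded_matvec(row_shards, vec):
    # Each shard is the subset of W's rows owned by one worker; the full
    # output is the concatenation of the per-worker partial outputs.
    out = []
    for shard in row_shards:
        out.extend(matvec(shard, vec))
    return out
```

Splitting `W` into `[W[:2], W[2:]]` and concatenating the shard outputs reproduces the unsharded result exactly, which is why the speculator weights also need a sharding scheme before multi-GPU serving can work.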
-
### Your current environment
```text
Collecting environment information...
/home/daniel/.pyenv/versions/vllm/lib/python3.11/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRAN…
-
Using nightly wheels. I can serve just fine with `--speculative-mode disable`, but all the other options give me this:
```text
Exception in thread Thread-11 (_background_loop):
Traceback (most recent …
-
### Your current environment
Using the latest available Docker image: vllm/vllm-openai:v0.5.0.post1
### 🐛 Describe the bug
I am getting "Internal Server Error" as the response when calling the /v1/embedd…