-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubun…
-
### Your current environment
```text
The output of `python collect_env.py`
```
```
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTo…
-
I followed the documentation to build the LLaMA 3 8B Instruct model with multiple LoRA adapters, as described in this NVIDIA blog post (https://developer.nvidia.com/zh-cn/blog/deploy-multilingual-llms-w…
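For context, the core idea behind serving one base model with multiple LoRA adapters is that each adapter only adds a low-rank delta to the base weights: y = Wx + (alpha/r)·B(Ax). The sketch below is a toy, pure-Python illustration of that formula using the standard LoRA paper symbols (W, A, B, alpha, r); it is not code from the blog post or from TensorRT-LLM.

```python
# Toy LoRA forward pass: y = W @ x + (alpha / r) * (B @ (A @ x)).
# W is the frozen base weight; A (r x in) and B (out x r) are the
# per-adapter low-rank matrices. Plain lists stand in for tensors.

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def lora_linear(W, A, B, x, alpha, r):
    base = matvec(W, x)                     # frozen base projection
    delta = matvec(B, matvec(A, x))         # low-rank update B @ A @ x
    s = alpha / r                           # LoRA scaling factor
    return [b + s * d for b, d in zip(base, delta)]

# 2x2 identity base weight with a rank-1 adapter
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # r x in  (1 x 2)
B = [[0.5], [0.5]]          # out x r (2 x 1)
x = [2.0, 4.0]
print(lora_linear(W, A, B, x, alpha=1.0, r=1))  # → [5.0, 7.0]
```

Because only A and B differ per adapter, a server can keep one copy of W and swap (or batch) many small adapters per request.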
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…
-
I have a machine with 4×80GB GPUs and set `tensor_parallel_size = 4`, but a RuntimeError occurs. Any idea how to resolve this? Should I change any other parameters? Thank you!
>
python3 -m we…
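As background for what `tensor_parallel_size` changes: the model's weight matrices are sharded across the GPUs, each rank computes its slice, and the results are gathered. A common cause of a RuntimeError here is a model dimension (e.g. the attention-head count) that is not divisible by the tensor-parallel size, though the truncated traceback above does not confirm that. The toy single-process sketch below illustrates the sharding idea only; real vLLM does this across GPUs with NCCL.

```python
# Toy row-wise sharding across tp_size "ranks": each rank holds a slice
# of W, computes a partial output, and the partials are concatenated
# (the analogue of an all-gather). Plain lists stand in for tensors.

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def shard_rows(W, tp_size):
    n = len(W)
    # Mirrors the usual divisibility requirement in tensor parallelism.
    assert n % tp_size == 0, "output dim must divide tensor_parallel_size"
    step = n // tp_size
    return [W[i * step:(i + 1) * step] for i in range(tp_size)]

W = [[float(i == j) for j in range(4)] for i in range(4)]  # 4x4 identity
x = [1.0, 2.0, 3.0, 4.0]
shards = shard_rows(W, tp_size=4)          # one row slice per rank
y = [out for s in shards for out in matvec(s, x)]
print(y)  # → [1.0, 2.0, 3.0, 4.0], same as the unsharded matvec(W, x)
```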
-
I'm keen on adding [speculative decoding](https://arxiv.org/abs/2211.17192) to outlines.
Is this something that is being worked on? Otherwise I would be happy to submit a PR but I'd need some advic…
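To make the proposal concrete, here is a minimal greedy sketch of the speculative decoding loop from the linked paper: a cheap draft model proposes k tokens, the target model checks them in one pass, and the longest agreeing prefix is accepted plus one corrected token. The toy "models" below are deterministic stand-ins on integer tokens, not outlines APIs, and this greedy variant omits the paper's rejection sampling.

```python
# Speculative decoding, greedy variant: draft proposes, target verifies.

def draft_model(ctx):       # fast but imperfect: next = last + 1, capped at 5
    return min(ctx[-1] + 1, 5)

def target_model(ctx):      # "ground truth": next = last + 1
    return ctx[-1] + 1

def speculative_step(ctx, k):
    # 1) draft proposes k tokens autoregressively (cheap).
    proposal, c = [], list(ctx)
    for _ in range(k):
        t = draft_model(c)
        proposal.append(t)
        c.append(t)
    # 2) target scores all k positions in one pass (here: a loop),
    #    accepting the agreeing prefix and fixing the first mismatch.
    accepted, c = [], list(ctx)
    for t in proposal:
        want = target_model(c)
        if t == want:
            accepted.append(t)
            c.append(t)
        else:
            accepted.append(want)   # target's correction, then stop
            break
    return ctx + accepted

print(speculative_step([3], k=4))   # → [3, 4, 5, 6]
```

One target pass yields up to k+1 tokens here instead of one, which is the whole speedup; the constrained-generation twist for outlines would be masking both models' choices with the FSM's allowed-token set.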
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Running LLaVA-NeXT fails with an error:
python -m vllm.entrypoints.openai.api_server --model /ai/LLaVA-NeX…
-
Great work!
I tried your [example](https://github.com/SafeAILab/EAGLE#:~:text=llama%2D2%2Dchat%5D-,With%20Code,-You%20can%20use) for llama-7b-chat and changed the tree structure in choices.py into …
-
First of all, thank you for the great work!
Is there any plan to support a paged KV cache in non-contiguous memory, for instance in flash_attn_with_kvcache?
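For readers unfamiliar with the request: in a paged KV cache, a sequence's keys/values live in fixed-size blocks that need not be contiguous in the physical pool, and a per-sequence block table maps logical block indices to physical ones. The toy sketch below illustrates that indirection (the PagedAttention idea); it says nothing about flash-attn internals, and the allocator is a deliberately naive free list.

```python
# Toy paged KV cache: block_table maps logical block -> physical block,
# so a sequence's blocks can be scattered anywhere in the pool.

BLOCK = 4  # tokens per block

class PagedKV:
    def __init__(self, num_blocks):
        self.pool = [[None] * BLOCK for _ in range(num_blocks)]
        self.free = list(range(num_blocks))   # naive free-list allocator
        self.block_table = []                 # logical -> physical mapping
        self.length = 0                       # tokens stored so far

    def append(self, kv):
        if self.length % BLOCK == 0:          # current block full: allocate
            self.block_table.append(self.free.pop())
        phys = self.block_table[self.length // BLOCK]
        self.pool[phys][self.length % BLOCK] = kv
        self.length += 1

    def get(self, pos):                       # read by logical position
        phys = self.block_table[pos // BLOCK]
        return self.pool[phys][pos % BLOCK]

cache = PagedKV(num_blocks=8)
for tok in range(6):
    cache.append(("k%d" % tok, "v%d" % tok))
print(cache.block_table)   # → [7, 6]: physical blocks, not contiguous
print(cache.get(5))        # → ('k5', 'v5')
```

The attention kernel is what has to cooperate: it must gather K/V through the block table instead of assuming one contiguous buffer per sequence, which is exactly what the question asks of flash_attn_with_kvcache.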