-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
### Your current environment
```text
--2024-08-07 03:22:15-- https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)…
-
### Your current environment
This bug is unrelated to the environment.
### 🐛 Describe the bug
Thanks for open-sourcing such an excellent project!
I found that it may be missing an "else" in the asyn…
-
### 🚀 The feature, motivation and pitch
Currently, vllm with Speculative Decoding requires that the draft model and target model have the same vocab size. However, the target model may have a large…
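The vocab-size constraint matters because the verification step in speculative decoding compares the draft and target models' probabilities at the same token index, which only makes sense when both models index the same vocabulary. A minimal sketch of that rejection-sampling check (hypothetical names, not vLLM's actual implementation):

```python
def acceptance_prob(draft_probs, target_probs, token_id):
    """Rejection-sampling check for one speculated token.

    The draft token is accepted with probability min(1, p_target / p_draft).
    Both distributions must index the same vocabulary, which is why the
    draft and target vocab sizes currently have to match.
    """
    if len(draft_probs) != len(target_probs):
        raise ValueError(
            f"vocab size mismatch: draft={len(draft_probs)}, "
            f"target={len(target_probs)}"
        )
    return min(1.0, target_probs[token_id] / draft_probs[token_id])

# Target agrees with (or exceeds) the draft's confidence -> always accepted.
print(acceptance_prob([0.5, 0.5], [0.25, 0.75], 1))  # → 1.0
```

Supporting different vocab sizes would require mapping or truncating the larger vocabulary before this comparison.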
-
### Anything you want to discuss about vllm.
I've fine-tuned Qwen2.5-14B-Instruct using QLoRA (bitsandbytes 4-bit) and also did a full fine-tune. However, when I tried to use it with a quantized model (Qw…
-
### Your current environment
When I set `VLLM_TENSOR_PARALLEL_SIZE = 2`, it works well. But when I change it to 4, vllm cannot support Phi3-medium-*.
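For reference, the tensor-parallel degree is normally passed to vLLM directly rather than through a custom environment variable, and it must evenly divide the model's attention-head counts, which is a common reason a larger degree fails where a smaller one works. A sketch using the OpenAI-compatible server entrypoint (model name assumed for illustration):

```shell
# Launch with 4-way tensor parallelism; requires 4 visible GPUs and a
# model whose attention/KV head counts are divisible by 4.
python -m vllm.entrypoints.openai.api_server \
    --model microsoft/Phi-3-medium-4k-instruct \
    --tensor-parallel-size 4
```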
```
torch=2.3.0
vllm=0.5.0.post1
transform…
-
### Your current environment
```
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Set default values
default_port=8008
default_model=$LLM_MODEL
defa…
-
### Your current environment
vllm==0.5.4
GPU: L20, Memory 46GB
```text
Package Version
--------------------------------- ------------
aiohappyeyeballs …
-
I'm running an Intel Celeron N3150 (Braswell, released in 2015, post-Broadwell) and I get this with the intel-media-driver on Arch.
Any suggestions on how to make it work?
Thanks in advance.
-
### Your current environment
The output of `python collect_env.py`
```text
WARNING 10-15 15:24:09 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead,…
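The warning above can usually be cleared by swapping the packages. Since both distributions provide the same `pynvml` Python module, the deprecated one should be removed first so the two don't shadow each other:

```shell
# Remove the deprecated package, then install the NVIDIA-maintained one.
pip uninstall -y pynvml
pip install nvidia-ml-py
```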