-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4…
```
-
### Feature request
Fu et al. propose a novel decoding technique that accelerates greedy decoding on Llama 2 and Code-Llama by 1.5-2x across various parameter sizes, without a draft model. This meth…
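For intuition, here is a minimal toy sketch (hypothetical `greedy_next` and `verify_ngram` helpers, not the paper's implementation) of the verification idea such draft-free methods rely on: candidate n-grams are accepted only as long as they match what greedy decoding would have produced anyway, so the output is unchanged while several tokens can be accepted per step.

```python
# Toy sketch of draft-free verification: a stand-in "model" plus a verify
# step. `greedy_next` and `verify_ngram` are hypothetical names.

def greedy_next(prefix: tuple) -> int:
    """Stand-in for one greedy LLM decoding step (a real implementation
    scores all candidate positions in a single batched forward pass)."""
    return (sum(prefix) + len(prefix)) % 50  # toy deterministic "model"

def verify_ngram(prefix: list, candidate: list) -> list:
    """Accept the longest leading run of candidate tokens that matches
    what plain greedy decoding would have produced anyway."""
    accepted = []
    for tok in candidate:
        if greedy_next(tuple(prefix + accepted)) != tok:
            break
        accepted.append(tok)
    return accepted

# Example: a candidate whose first two tokens agree with greedy decoding.
prefix = [3, 7]
t1 = greedy_next(tuple(prefix))
t2 = greedy_next(tuple(prefix + [t1]))
print(verify_ngram(prefix, [t1, t2, 999]))  # -> [t1, t2]; 999 is rejected
```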
-
Hello all, I keep scratching my head over why I can sometimes deploy everything on the list, yet other models I find run into issues.
Anyway, these are my logs; I am just trying to use this repo https://huggingface.co/mistralai/Mis…
-
### Your current environment
```text
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (U…
```
-
### 🚀 The feature, motivation and pitch
I got this error when trying speculative decoding with two RTX 4090s:
* https://github.com/vllm-project/vllm/issues/4358
And it looks like that was fixed/added…
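For reference, a minimal sketch of how speculative decoding is typically launched across two GPUs with vLLM's offline API; the model ids and token count below are placeholders, and the exact keyword arguments may differ between vLLM versions:

```python
# Hedged sketch: speculative decoding with the target model tensor-parallel
# across two GPUs via vLLM's offline API. Model ids and numbers are
# placeholders; kwarg names follow vLLM releases from around this issue.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-13b-chat-hf",                  # target model (placeholder)
    speculative_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # draft model (placeholder)
    num_speculative_tokens=5,
    tensor_parallel_size=2,        # shard the target model across both 4090s
    use_v2_block_manager=True,     # required by spec decode in some versions
)

params = SamplingParams(temperature=0.0, max_tokens=64)  # greedy sampling
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```

Greedy sampling (`temperature=0.0`) is the lossless setting speculative decoding is usually validated with.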
-
## ❓ General Questions
This is the error I am getting - **TVMError: Check failed: token_tree_parent_ptr[j] == j - verify_start (0 vs. 1) : CPU sampler only supports chain-style draft tokens.**
T…
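My reading of that check (an illustration, not MLC's actual code) is that the CPU sampler requires the draft tokens to form a linear chain, where each position's parent pointer inside the verify window is strictly sequential, rather than a branching token tree:

```python
# Illustration (my reading of the error, not MLC's actual code): chain-style
# drafts have strictly sequential parent pointers inside the verify window.

def is_chain_style(token_tree_parent_ptr, verify_start):
    """Mirror of the failing check: token_tree_parent_ptr[j] == j - verify_start
    must hold for every draft position j in the verify window."""
    return all(ptr == j - verify_start
               for j, ptr in enumerate(token_tree_parent_ptr, start=verify_start))

print(is_chain_style([0, 1, 2, 3], verify_start=0))  # True: linear chain
print(is_chain_style([0, 0, 1, 1], verify_start=0))  # False: branching tree
```

If this reading is right, restricting the draft to a single chain (no tree branching) when sampling on CPU should avoid the error.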
-
### Context
This task concerns enabling tests for **mpt-7b-chat**. You can find more details in the openvino_notebooks [LLM chatbot README.md](https://github.com/openvinotoolkit/openvino_notebooks/tree…
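A minimal smoke test along these lines might look like the sketch below; the model id, the use of optimum-intel's `OVModelForCausalLM`, and the prompt are assumptions rather than the notebook's actual test code:

```python
# Hedged sketch of a smoke test for mpt-7b-chat under OpenVINO; the model id,
# helper choices, and prompt are assumptions, not the notebook's test code.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "mosaicml/mpt-7b-chat"  # assumed Hugging Face id for the model under test
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the checkpoint to OpenVINO IR on the fly;
# trust_remote_code may be needed on older transformers versions.
model = OVModelForCausalLM.from_pretrained(model_id, export=True, trust_remote_code=True)

inputs = tokenizer("What is OpenVINO?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
assert output_ids.shape[-1] > inputs["input_ids"].shape[-1]  # something was generated
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```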
-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N…
```
-
When I run the demo provided in the README.md and reach `output_ids = model.generate(**inputs, max_new_tokens=128)`, I get an error: RuntimeError: Expected all tensors to be on the same device, but found at least tw…
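A common cause (assuming a standard transformers setup) is that the model was placed on the GPU while the tokenized inputs stayed on the CPU; moving the inputs to the model's device before calling `generate` usually resolves it:

```python
# Common fix, assuming a standard transformers setup: keep the tokenized
# inputs on the same device as the model before calling generate().
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; use the demo's model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # device_map needs accelerate
)

# .to(model.device) is the key line; without it the inputs stay on the CPU
# while the weights sit on the GPU, producing exactly this RuntimeError.
inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```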
-
A single GPU works fine, but the system hangs when I use multiple GPUs. Can someone help solve this? Thanks.
```text
python build.py --model_dir meta-llama/Llama-2-7b-chat-hf \
                --dtype float16 \
                …
```
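One frequent cause of exactly this symptom is launching a multi-rank engine without MPI. Below is a hedged sketch of a two-GPU tensor-parallel build and run; the flag names (`--world_size`, `run.py`'s options) vary across TensorRT-LLM versions, so treat them as assumptions:

```bash
# Hedged sketch; flag names vary across TensorRT-LLM versions. Build the
# engine with a tensor-parallel world size of 2, then launch one MPI rank
# per GPU (running a multi-rank engine without mpirun commonly hangs).
python build.py --model_dir meta-llama/Llama-2-7b-chat-hf \
                --dtype float16 \
                --world_size 2 \
                --output_dir ./llama_tp2_engine

mpirun -n 2 python run.py --engine_dir ./llama_tp2_engine \
                          --tokenizer_dir meta-llama/Llama-2-7b-chat-hf \
                          --max_output_len 64
```

If it still hangs under `mpirun`, NCCL peer-to-peer is another common culprit on consumer GPUs; setting `NCCL_P2P_DISABLE=1` is a frequently suggested workaround.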