-
### System Info
It's using the versions downloaded by `pip install` during the llama stack build.
I have an NVIDIA GPU.
### Information
- [X] The official example scripts
- [ ] My own modified…
-
Thank you for the great work and the pre-print! I have a question about running the code; I would appreciate it if you could answer it.
For installation, I followed the standard steps, as in:
```
doc…
```
-
## ❓ General Questions
Based on https://llm.mlc.ai/docs/deploy/rest.html#id5, we can use more than one additional model when using speculative decoding mode.
But when getting a response via re…
-
I have some questions about the structure of custom mask for lookahead and verify branches [as described in the blog](https://lmsys.org/blog/2023-11-21-lookahead-decoding/#lookahead-and-verify-in-the…
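For intuition, here is a minimal, hypothetical sketch of a branch-isolated attention mask of the general kind used for verify branches (this is a simplified illustration, not the exact mask from the blog): every branch token attends to the full prefix and causally to earlier tokens in its own branch, but never to tokens in other branches.

```python
import numpy as np

def branch_attention_mask(prefix_len, branch_lens):
    """Boolean mask: mask[i, j] is True iff position i may attend to position j.

    Toy model of branch-style masking: the prefix is ordinary causal
    attention; each branch sees the prefix plus itself (causally) only.
    """
    total = prefix_len + sum(branch_lens)
    mask = np.zeros((total, total), dtype=bool)

    # Prefix tokens: standard causal attention among themselves.
    for i in range(prefix_len):
        mask[i, : i + 1] = True

    # Branch tokens: full prefix + causal within their own branch.
    start = prefix_len
    for blen in branch_lens:
        for j in range(blen):
            pos = start + j
            mask[pos, :prefix_len] = True      # see the whole prefix
            mask[pos, start : pos + 1] = True  # causal inside this branch only
        start += blen
    return mask
```

With `branch_attention_mask(2, [2, 2])`, positions 2–3 and 4–5 form two candidate branches: position 4 can attend to positions 0 and 1 (the prefix) but not to positions 2 or 3, which is what keeps the branches independent during verification.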
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
I installed LlamaIndex with the command `pip install llama-index` and installed t…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-27 15:24:15 _custom_ops.py:15] Failed to import from vllm._C with Mo…
```
-
Hello,
I want to express my gratitude for your outstanding work. The powerful lm-evaluation-harness and your continuous maintenance have made LLM evaluation much more convenient.
However, I hav…
-
### 🚀 The feature, motivation and pitch
[Parallel/Jacobi decoding](https://arxiv.org/abs/2305.10427) improves inference efficiency by breaking the sequential nature of conventional auto-regressive …
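As a toy illustration of the fixed-point iteration behind Jacobi decoding (using a stand-in deterministic "model" rather than a real LM): all guessed positions are updated in parallel from the previous guess, and iteration stops once the guess no longer changes.

```python
def toy_next_token(seq):
    # Stand-in deterministic "LM": next token is (last token + 1) mod 10.
    return (seq[-1] + 1) % 10

def jacobi_decode(prefix, n):
    """Jacobi-style parallel decoding of n tokens after `prefix`.

    Each iteration recomputes every guessed position in parallel from the
    previous iterate; convergence to the greedy sequence is guaranteed in
    at most n steps, since position i becomes correct by iteration i + 1.
    """
    guess = [0] * n  # arbitrary initial guess
    iters = 0
    while True:
        iters += 1
        # All n positions updated simultaneously from the old guess.
        new = [toy_next_token(prefix + guess[:i]) for i in range(n)]
        if new == guess:  # fixed point reached: matches greedy decoding
            return guess, iters
        guess = new
```

For this strictly sequential toy model the fixed point takes the worst-case n + 1 iterations, which mirrors why real Jacobi decoding only wins when the model resolves several positions per parallel step.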
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Cen…
```
-
```text
Running loglikelihood requests: 0%| | 0/18330 [00:00
```