speculative-decoding Search Results

1000+ results
for speculative-decoding


NVIDIA/TensorRT-LLM #2155

Recurrent Drafter not working

### System Info - GPU: nvidia A30 - TensorRT-LLM: commit [32ed92e](https://github.com/chiendb97/TensorRT-LLM/commit/32ed92e4491baf2d54682a21d247e1948cca996e) - Nvidia driver: 535.86.10 - Ubuntu 22.04…

binhtranmcs updated 1 week ago
17
vllm-project/vllm #9609

[Performance]: test speculative decode accuracy

### Proposal to improve performance I use lm-evaluation-harness to test vllm accuracy. 1. When spec decode is not enabled, I got the results below: num_concurrent=1 ![image](https://github.com/user-atta…

v-lmn updated 1 week ago
1
mlc-ai/mlc-llm #2789

[Question] Speculative Decoding Metrics Variable

## ❓ General Questions What is the meaning behind `draft_count`, `accept_count`, and `spec_draft_length`? Thank you in advance!

bethalianovike updated 2 months ago
1
deepseek-ai/DeepSeek-Prover-V1.5 #10

Running quick_start.py raises an error; possibly a lean4 problem?

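Hello, first of all, thank you very much for this excellent open-source project! After installing the dependencies and mathlib according to the installation instructions, I ran quick_start.py but did not get the expected result. The NN model produces correct output, but the lean4 verification has a problem. python quick_start.py Special tokens have been added in the vocabulary, make sure the associated…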

fixtech updated 1 week ago
1
vllm-project/vllm #6011

[Bug]: Segmentation fault (core dumped) while loading deepse…

### Your current environment ```text Collecting environment information... PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A …

zxdvd updated 1 week ago
6
theroyallab/tabbyAPI #226

[BUG] Inline loading doesn't respect config.yml

### OS Linux ### GPU Library CUDA 12.x ### Python version 3.12 ### Describe the bug When a model is loaded inline, it doesn't respect the parameters set in config.yml, such as when loading a mo…

Async0x42 updated 20 hours ago
3
intel-analytics/ipex-llm #11649

vllm_cpu_docker_quickstart run error on Aliyun ecs.c8i.24xla…

I follow the Doc: https://github.com/intel-analytics/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_cpu_docker_quickstart.md My Ecs is using Aliyun ecs.c8i.24xlarge ECS( https://help.aliyun.com/z…

linuxliker updated 3 months ago
3
vllm-project/vllm #9770

[Bug]: v6.3 Gibberish produced with long ctx (Machete + W4A1…

Gibberish is not produced on the previous version with the same request. ### Your current environment The output of `python collect_env.py` ```plaintext Collecting environment informatio…

osilverstein updated 3 days ago
8
NVIDIA/TensorRT-LLM #1938

problem with tensorrt_llm performance

### System Info hi, i generated the tensorrt llm engine for a llama based model and see that the performance is much worse than vllm. i did the following: - compile model with tensorrt llm c…

Arnold1 updated 1 month ago
4
vllm-project/vllm #7713

[Bug]: The MixtralForCausalLM architecture and the mistralai…

### Your current environment The output of `python collect_env.py` ```text Collecting environment information... PyTorch version: 2.4.0+cu121 Is debug build: False CUDA used to build PyTorch…

Anwesh2 updated 2 months ago
4

