**dilab-zju/self-speculative-decoding**
Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Apache License 2.0 · 141 stars · 9 forks
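
For context on the issues below: self-speculative decoding drafts tokens cheaply by skipping a subset of the model's own layers, then verifies the draft with a single full-model forward pass, which keeps greedy outputs lossless. The following is a minimal sketch of that loop, not the repo's actual API; the `model(ids, skip_layers=...)` interface and HuggingFace-style `.logits` output are assumptions for illustration.

```python
import torch

@torch.no_grad()
def self_speculative_generate(model, input_ids, skip_layers,
                              max_new_tokens=128, draft_len=4):
    # Greedy self-speculative decoding sketch (assumes batch size 1).
    # `model(ids, skip_layers=...)` is hypothetical: a non-empty set runs a
    # cheap draft pass with those layers skipped; an empty set runs the
    # full model.
    ids = input_ids
    prompt_len = input_ids.shape[-1]
    while ids.shape[-1] - prompt_len < max_new_tokens:
        # 1) Draft: propose `draft_len` tokens autoregressively, layers skipped.
        draft = ids
        for _ in range(draft_len):
            logits = model(draft, skip_layers=skip_layers).logits
            draft = torch.cat([draft, logits[:, -1:].argmax(-1)], dim=-1)

        # 2) Verify: one full-model pass scores every drafted position at once.
        n = ids.shape[-1]
        full_logits = model(draft, skip_layers=set()).logits
        verified = full_logits[:, n - 1:-1].argmax(-1)  # full model's choices
        proposed = draft[:, n:]                         # drafted tokens

        # 3) Accept the longest agreeing prefix; on the first mismatch, take
        #    the full model's token instead, so output matches full decoding.
        n_match = int((verified == proposed).long().cumprod(-1).sum())
        ids = torch.cat([ids, proposed[:, :n_match]], dim=-1)
        if n_match < draft_len:
            ids = torch.cat([ids, verified[:, n_match:n_match + 1]], dim=-1)
    return ids[:, :prompt_len + max_new_tokens]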
**Issues** (sorted newest first)

| # | Title | Author | Status | Comments |
|---|---|---|---|---|
| #21 | Tensor parallelism | lethean1 | opened 1 month ago | 1 |
| #20 | The experimental results are inconsistent | HuYunhai-Alex | opened 2 months ago | 2 |
| #19 | Rejection sampling & adaptive draft-exiting mechanism | wutong4012 | closed 4 months ago | 1 |
| #18 | Has anyone run this experiment with LLaMA-3? | pandirabhishek | closed 2 months ago | 1 |
| #17 | Bayesian optimization search method for CodeLlama and HumanEval | MichaelMtl | closed 5 months ago | 4 |
| #16 | Question about self-speculative + greedy decoding | EganGu | closed 6 months ago | 2 |
| #15 | Unable to reproduce the 1.4× speedup with LLaMA-2-chat | hemingkx | closed 7 months ago | 2 |
| #14 | Problems occurred when executing search.ipynb | hunzhizi | closed 8 months ago | 1 |
| #13 | Questions about LLaMA-2-70B | xinlong-yang | closed 8 months ago | 2 |
| #12 | Unable to get a 1.5× speedup with the 13B model? | w32zhong | closed 8 months ago | 1 |
| #11 | Code for LLaMA-7B and Mistral | DRXD1000 | closed 10 months ago | 1 |
| #10 | Questions about modeling_llama.py | qiyuangong | closed 10 months ago | 3 |
| #9 | Question about the training data for Bayesian optimization and model size | irasin | closed 10 months ago | 2 |
| #8 | Skipped layers and LLaMA-2-70B-chat | jaemin-han | closed 10 months ago | 13 |
| #7 | Proposal: evaluating faster, deterministic alternatives to Bayesian optimization for layer skipping in large models | azurespace | closed 1 year ago | 1 |
| #6 | Could you release the test dataset used in your experiments? | pengfeiwu1999 | closed 1 year ago | 1 |
| #5 | Can I get the skipped-layer index set of LLaMA-70B tested in your paper? | je1lee | closed 1 year ago | 0 |
| #4 | KV cache footprint | JYYHH | closed 1 year ago | 1 |
| #3 | Can you share your prompt for LLaMA-2-70B? | jaemin-han | closed 1 year ago | 2 |
| #2 | Data on optimal layers to skip? | KerfuffleV2 | closed 1 year ago | 5 |
| #1 | The decoding code | Ma-Yongqiang | closed 1 year ago | 5 |