# dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**

Apache License 2.0 · 114 stars · 8 forks
## Issues
| # | Title | Author | Status | Closed | Comments |
|---|-------|--------|--------|--------|----------|
| #17 | Bayesian Optimization Search Method for CodeLlama and human-eval | MichaelMtl | closed | 3 weeks ago | 4 |
| #16 | Question about self-speculative + greedy decoding | EganGu | closed | 2 months ago | 2 |
| #15 | Unable to reimplement 1.4x speedup with llama-2-chat | hemingkx | closed | 2 months ago | 2 |
| #14 | some problems occured when execuating file search.ipynb | hunzhizi | closed | 3 months ago | 1 |
| #13 | questions about Llama2-70b | xinlong-yang | closed | 3 months ago | 2 |
| #12 | Unable to get 1.5 speedup using 13B model? | w32zhong | closed | 4 months ago | 1 |
| #11 | Code for Llama-7b and Mistral | DRXD1000 | closed | 5 months ago | 1 |
| #10 | Questions about modeling_llama.py | qiyuangong | closed | 5 months ago | 3 |
| #9 | question about the training data for bayesian and model size | irasin | closed | 6 months ago | 2 |
| #8 | skipped layer and llama2 70b chat | jaemin-han | closed | 5 months ago | 13 |
| #7 | Proposal: Evaluating Faster, Deterministic Alternatives to Bayesian Optimization for Layer Skipping in Large Models | azurespace | closed | 7 months ago | 1 |
| #6 | could you release the test dataset in your experiment? | pengfeiwu1999 | closed | 8 months ago | 1 |
| #5 | Can I get skipped layer index set of LLaMA-70B tested on your paper?? | je1lee | closed | 8 months ago | 0 |
| #4 | KV cache footprint | JYYHH | closed | 8 months ago | 1 |
| #3 | Can you share your prompt of LLaMA-2-70B? | jaemin-han | closed | 8 months ago | 2 |
| #2 | Data on optimal layers to skip? | KerfuffleV2 | closed | 9 months ago | 5 |
| #1 | the decoding code | Ma-Yongqiang | closed | 8 months ago | 5 |