FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
https://sites.google.com/view/medusa-llm
Apache License 2.0 · 2.28k stars · 154 forks
Issues (newest first)
#73 · Medusa 1 and 2 speed up · LotuSrc · closed 9 months ago · 2 comments
#72 · update Community Adoption for RTP-LLM · zhyncs · closed 9 months ago · 2 comments
#71 · V1.0 prerelease · ctlllll · closed 9 months ago · 0 comments
#70 · Training Medusa heads · mmilunovic-mdcs · opened 9 months ago · 6 comments
#69 · OSError · qspang · opened 9 months ago · 3 comments
#68 · About changing LLM from LLAMA to LLAMA-2 · dydrkfl06 · closed 9 months ago · 2 comments
#67 · how did you construct the sparse tree architecture · pengfeiwu1999 · closed 9 months ago · 2 comments
#66 · Clarifications on Models + Batch Size · RonanKMcGovern · closed 10 months ago · 5 comments
#65 · Can I make an AWQ quantization? · RonanKMcGovern · closed 10 months ago · 1 comment
#64 · Sparse candidate generation confusion · zankner · closed 9 months ago · 6 comments
#63 · Some questions about sampling strategy · qianxiao1111 · closed 12 months ago · 3 comments
#62 · Results for different configs · zankner · closed 11 months ago · 8 comments
#61 · How to load finetune checkpoint files directly? · qianxiao1111 · closed 12 months ago · 0 comments
#60 · AttributeError: 'LlamaForCausalLM' object has no attribute 'medusa_head' · blwaji · closed 9 months ago · 2 comments
#59 · AttributeError: 'LlamaForCausalLM' object has no attribute 'medusa_head' · blwaji · closed 1 year ago · 0 comments
#57 · FasterTransformer support · niyunsheng · opened 1 year ago · 1 comment
#56 · Will using this method result in inconsistent output results? · niyunsheng · closed 9 months ago · 8 comments
#55 · TypeError: __init__() got an unexpected keyword argument 'medusa_num_heads' · HackGiter · closed 11 months ago · 5 comments
#54 · Mistral 7B model support · JianbangZ · closed 9 months ago · 4 comments
#53 · Llm judge update · leeyeehoo · closed 1 year ago · 0 comments
#52 · [Feature Request] Qwen model support · JianbangZ · opened 1 year ago · 1 comment
#51 · errors occurred when running simple_gradio_interface.py · MeWannaSleep · closed 1 year ago · 2 comments
#50 · Install the package with the console script? · devrimcavusoglu · closed 9 months ago · 1 comment
#49 · How to test latency between medusa & baseline · YixinSong-e · closed 1 year ago · 3 comments
#48 · name not exist "from medusa.model.medusa_choices import medusa_choices" · JianbangZ · closed 9 months ago · 4 comments
#46 · update roadmap · leeyeehoo · closed 1 year ago · 0 comments
#45 · CUBLAS_STATUS_EXECUTION_FAILED when training Medusa Head with base model set to Llama2 7B · void-main · closed 9 months ago · 7 comments
#42 · Sparse tree · ctlllll · closed 1 year ago · 0 comments
#41 · vLLM support · MichaelJayW · opened 1 year ago · 12 comments
#40 · Pull main to sparse_tree · leeyeehoo · closed 1 year ago · 0 comments
#39 · [New feature] More sampling schemes · Jokoe66 · closed 9 months ago · 2 comments
#38 · add development bounty · ctlllll · closed 1 year ago · 0 comments
#37 · Benchmark results · JianbangZ · closed 1 year ago · 3 comments
#36 · [New feature] Fine-tune Medusa heads during SFT · ctlllll · closed 9 months ago · 4 comments
#35 · [New feature] llama.cpp support · ctlllll · opened 1 year ago · 7 comments
#34 · [Research] Explore tree sparsity (speed +10%-20%) · ctlllll · closed 1 year ago · 0 comments
#33 · [New feature] mlc-llm support · ctlllll · opened 1 year ago · 8 comments
#32 · [New feature] exllama support · ctlllll · opened 1 year ago · 0 comments
#31 · [Inference] IndexError: list index out of range · helldog-star · closed 1 year ago · 2 comments
#30 · Fork base model's last two decoder layers · austinsilveria · closed 1 year ago · 21 comments
#29 · Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding · YixinSong-e · closed 1 year ago · 0 comments
#27 · add docstrings · rajveer43 · closed 1 year ago · 1 comment
#26 · Add an option to override base model path · Btlmd · closed 1 year ago · 1 comment
#25 · batch support · thistleknot · closed 1 year ago · 1 comment
#24 · gguf · thistleknot · closed 1 year ago · 1 comment
#23 · Add a simple gradio interface, make life easier · Mrw33554432 · closed 1 year ago · 2 comments
#22 · update roadmap · leeyeehoo · closed 1 year ago · 0 comments
#21 · Update ROADMAP.md · leeyeehoo · closed 1 year ago · 0 comments
#20 · Merge pull request #19 from FasterDecoding/main · leeyeehoo · closed 1 year ago · 0 comments
#19 · N/A · leeyeehoo · closed 1 year ago · 0 comments