FasterDecoding / Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
https://sites.google.com/view/medusa-llm
Apache License 2.0 · 2.28k stars · 154 forks
Issues (newest first)
#73 · Medusa 1 and 2 speed up · LotuSrc · closed 9 months ago · 2 comments
#72 · update Community Adoption for RTP-LLM · zhyncs · closed 9 months ago · 2 comments
#71 · V1.0 prerelease · ctlllll · closed 9 months ago · 0 comments
#70 · Training Medusa heads · mmilunovic-mdcs · opened 9 months ago · 6 comments
#69 · OSError · qspang · opened 9 months ago · 3 comments
#68 · About changing LLM from LLAMA to LLAMA-2 · dydrkfl06 · closed 9 months ago · 2 comments
#67 · how did you construct the sparse tree architecture · pengfeiwu1999 · closed 9 months ago · 2 comments
#66 · Clarifications on Models + Batch Size · RonanKMcGovern · closed 10 months ago · 5 comments
#65 · Can I make an AWQ quantization? · RonanKMcGovern · closed 10 months ago · 1 comment
#64 · Sparse candidate generation confusion · zankner · closed 9 months ago · 6 comments
#63 · Some questions about sampling strategy · qianxiao1111 · closed 12 months ago · 3 comments
#62 · Results for different configs · zankner · closed 11 months ago · 8 comments
#61 · How to load finetune checkpoint files directly? · qianxiao1111 · closed 12 months ago · 0 comments
#60 · AttributeError: 'LlamaForCausalLM' object has no attribute 'medusa_head' · blwaji · closed 9 months ago · 2 comments
#59 · AttributeError: 'LlamaForCausalLM' object has no attribute 'medusa_head' · blwaji · closed 1 year ago · 0 comments
#57 · FasterTransformer support · niyunsheng · opened 1 year ago · 1 comment
#56 · Will using this method result in inconsistent output results? · niyunsheng · closed 9 months ago · 8 comments
#55 · TypeError: __init__() got an unexpected keyword argument 'medusa_num_heads' · HackGiter · closed 11 months ago · 5 comments
#54 · Mistral 7B model support · JianbangZ · closed 9 months ago · 4 comments
#53 · Llm judge update · leeyeehoo · closed 1 year ago · 0 comments
#52 · [Feature Request] Qwen model support · JianbangZ · opened 1 year ago · 1 comment
#51 · errors occurred when running simple_gradio_interface.py · MeWannaSleep · closed 1 year ago · 2 comments
#50 · Install the package with the console script? · devrimcavusoglu · closed 9 months ago · 1 comment
#49 · How to test latency between medusa & baseline · YixinSong-e · closed 1 year ago · 3 comments
#48 · name not exist "from medusa.model.medusa_choices import medusa_choices" · JianbangZ · closed 9 months ago · 4 comments
#46 · update roadmap · leeyeehoo · closed 1 year ago · 0 comments
#45 · CUBLAS_STATUS_EXECUTION_FAILED when training Medusa Head with base model set to Llama2 7B · void-main · closed 9 months ago · 7 comments
#42 · Sparse tree · ctlllll · closed 1 year ago · 0 comments
#41 · vLLM support · MichaelJayW · opened 1 year ago · 12 comments
#40 · Pull main to sparse_tree · leeyeehoo · closed 1 year ago · 0 comments
#39 · [New feature] More sampling schemes · Jokoe66 · closed 9 months ago · 2 comments
#38 · add development bounty · ctlllll · closed 1 year ago · 0 comments
#37 · Benchmark results · JianbangZ · closed 1 year ago · 3 comments
#36 · [New feature] Fine-tune Medusa heads during SFT · ctlllll · closed 9 months ago · 4 comments
#35 · [New feature] llama.cpp support · ctlllll · opened 1 year ago · 7 comments
#34 · [Research] Explore tree sparsity (speed +10%-20%) · ctlllll · closed 1 year ago · 0 comments
#33 · [New feature] mlc-llm support · ctlllll · opened 1 year ago · 8 comments
#32 · [New feature] exllama support · ctlllll · opened 1 year ago · 0 comments
#31 · [Inference] IndexError: list index out of range · helldog-star · closed 1 year ago · 2 comments
#30 · Fork base model's last two decoder layers · austinsilveria · closed 1 year ago · 21 comments
#29 · Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding · YixinSong-e · closed 1 year ago · 0 comments
#27 · add docstrings · rajveer43 · closed 1 year ago · 1 comment
#26 · Add an option to override base model path · Btlmd · closed 1 year ago · 1 comment
#25 · batch support · thistleknot · closed 1 year ago · 1 comment
#24 · gguf · thistleknot · closed 1 year ago · 1 comment
#23 · Add a simple gradio interface, make life easier · Mrw33554432 · closed 1 year ago · 2 comments
#22 · update roadmap · leeyeehoo · closed 1 year ago · 0 comments
#21 · Update ROADMAP.md · leeyeehoo · closed 1 year ago · 0 comments
#20 · Merge pull request #19 from FasterDecoding/main · leeyeehoo · closed 1 year ago · 0 comments
#19 · N/A · leeyeehoo · closed 1 year ago · 0 comments