alipay / PainlessInferenceAcceleration
License: Creative Commons Attribution 4.0 International
268 stars · 17 forks
Issues
stop_ids does not seem to be taking effect? #28 · jianyuheng opened 2 days ago · 0 comments
modeling_qwen attention does not use multi-branch position_ids & attention_mask #27 · snippetzero opened 1 month ago · 1 comment
lookahead with do_sample=True does not take temperature, top_k, top_p #26 · learning-chip opened 2 months ago · 2 comments
How is verification done in PAIN? #25 · jivanph opened 2 months ago · 0 comments
Do lookahead and repetition_penalty conflict? #24 · zhanweiw opened 3 months ago · 1 comment
AntRAG #23 · nrmer opened 3 months ago · 1 comment
Changing naive attention to SDPA gives wrong result for batched llama example #22 · learning-chip opened 4 months ago · 3 comments
size of memory footprint #21 · nrmer closed 4 months ago · 1 comment
Is Qwen 1.5 supported? #20 · hwang824 opened 4 months ago · 0 comments
TODO in PainlessInferenceAcceleration/pia/lookahead/common/lookahead_cache.py #19 · nrmer closed 4 months ago · 1 comment
Clarification on edls/dls/ft in perf_check #18 · nrmer closed 4 months ago · 1 comment
When batch_size > 1, lookahead does not work #17 · yuenyu1 closed 4 months ago · 0 comments
Error: no attribute 'rope_theta' for llama2 model #16 · learning-chip closed 4 months ago · 1 comment
Consultation on Trie Tree Maintenance? #15 · ZipECHO closed 4 months ago · 7 comments
Counting how many forward passes/steps were done when using PAIN #14 · jivanph opened 5 months ago · 4 comments
Consider supporting CodeLlama? #13 · RainYQ opened 5 months ago · 2 comments
Are models after p-tuning not supported? #12 · 13269279918 closed 5 months ago · 1 comment
Why do my tests show no performance improvement? #11 · MeJerry215 closed 5 months ago · 3 comments
pia integrated via Dockerfile cannot be used #10 · May-Yaha closed 5 months ago · 7 comments
In the benchmark studies, how are the draft tokens generated? #9 · jivanph opened 5 months ago · 9 comments
Update README.md #8 · eltociear closed 5 months ago · 0 comments
BUG for chatglm3-6b and Qwen-14B-Int4 #7 · AGI-Jarvis closed 5 months ago · 7 comments
Why does there seem to be no effect? #6 · shinerdeng closed 5 months ago · 6 comments
Speed is indeed improved, but the generation quality has issues #5 · dafen12 closed 5 months ago · 2 comments
What are the optimizations of this lookahead implementation compared to the original hao-ailab implementation? #4 · janelu9 closed 5 months ago · 1 comment
How does the performance compare to vLLM inference (vLLM vs Lookahead)? #3 · buptygz opened 5 months ago · 5 comments
Can batch inference be done? For example, 256 inputs in one pass? Squeeze the GPU to the max! ^_^ #2 · janelu9 closed 5 months ago · 3 comments
Are there plans to support a quantized version of the qwen architecture? #1 · xiningnlp closed 5 months ago · 2 comments