alipay / PainlessInferenceAcceleration
License: Creative Commons Attribution 4.0 International
268 stars · 17 forks
Issues
stop_ids does not seem to be taking effect? #28 · jianyuheng opened 2 days ago · 0 comments
modeling_qwen attention does not use multi-branch position_ids & attention_mask #27 · snippetzero opened 1 month ago · 1 comment
lookahead with do_sample=True does not take temperature, top_k, top_p #26 · learning-chip opened 2 months ago · 2 comments
How is verification done in PAIN? #25 · jivanph opened 2 months ago · 0 comments
Do lookahead and repetition_penalty conflict? #24 · zhanweiw opened 3 months ago · 1 comment
AntRAG #23 · nrmer opened 3 months ago · 1 comment
Changing naive attention to SDPA gives wrong result for batched llama example #22 · learning-chip opened 4 months ago · 3 comments
size of memory footprint #21 · nrmer closed 4 months ago · 1 comment
Is Qwen 1.5 supported? #20 · hwang824 opened 4 months ago · 0 comments
TODO in PainlessInferenceAcceleration/pia/lookahead/common/lookahead_cache.py #19 · nrmer closed 4 months ago · 1 comment
Clarification on edls/dls/ft in perf_check #18 · nrmer closed 4 months ago · 1 comment
When batch_size > 1, lookahead does not work #17 · yuenyu1 closed 4 months ago · 0 comments
Error: no attribute 'rope_theta' for llama2 model #16 · learning-chip closed 4 months ago · 1 comment
Consultation on Trie Tree Maintenance? #15 · ZipECHO closed 4 months ago · 7 comments
Counting how many forward passes/steps were done when using PAIN #14 · jivanph opened 5 months ago · 4 comments
Consider supporting CodeLlama? #13 · RainYQ opened 5 months ago · 2 comments
Are models after p-tuning not supported? #12 · 13269279918 closed 5 months ago · 1 comment
Why do my tests show no performance improvement? #11 · MeJerry215 closed 5 months ago · 3 comments
pia integrated via Dockerfile cannot be used #10 · May-Yaha closed 5 months ago · 7 comments
In the benchmark studies, how are the draft tokens generated? #9 · jivanph opened 5 months ago · 9 comments
Update README.md #8 · eltociear closed 5 months ago · 0 comments
BUG for chatglm3-6b and Qwen-14B-Int4 #7 · AGI-Jarvis closed 5 months ago · 7 comments
Why does there seem to be no effect? #6 · shinerdeng closed 5 months ago · 6 comments
Speed is indeed improved, but the generation quality has issues #5 · dafen12 closed 5 months ago · 2 comments
What are the optimizations of this lookahead implementation compared to the original hao-ailab implementation? #4 · janelu9 closed 5 months ago · 1 comment
How does the performance compare to vLLM inference (vLLM vs Lookahead)? #3 · buptygz opened 5 months ago · 5 comments
Can batch inference be done? For example, 256 inputs in one pass? Squeeze the GPU to the max! ^_^ #2 · janelu9 closed 5 months ago · 3 comments
Are there plans to support a quantized version of the qwen architecture? #1 · xiningnlp closed 5 months ago · 2 comments