SJTU-IPADS / PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License · 7.96k stars · 412 forks
Issues (newest first)
#180 The code for the figures in the paper (YuMJie, closed 7 months ago, 1 comment)
#179 Fix: offload FFN norm weights (hodlen, closed 7 months ago, 0 comments)
#178 Hotfix: failed to build sparse FFN with LLM_GATE_SEQ (hodlen, closed 7 months ago, 0 comments)
#177 Fix: remove Axpy dense op (hodlen, closed 7 months ago, 0 comments)
#176 More quantize support (YixinSong-e, opened 7 months ago, 0 comments)
#175 Unable to generate constant output (rikoras, closed 7 months ago, 2 comments)
#174 Optimize `mul_mat_sparse` for INT4 quantized weights (hodlen, closed 7 months ago, 0 comments)
#173 convert.py: error: the following arguments are required: mlp_model (wojiaoshihua, closed 7 months ago, 4 comments)
#172 Where is GGML_USE_HYBRID_THREADING defined or added? (wfloveiu, opened 7 months ago, 2 comments)
#171 Support Bamboo LM sparse and dense inference (hodlen, closed 7 months ago, 0 comments)
#170 invalid device symbol (czq693497091, closed 8 months ago, 0 comments)
#169 How to assign the specified CUDA_VISIBLE_DEVICE? (czq693497091, closed 8 months ago, 0 comments)
#168 Use `mul_mat_transpose` at axpy op for large batch (Begunner, closed 7 months ago, 0 comments)
#167 Two questions that I want to solve (yeptttt, opened 8 months ago, 2 comments)
#166 Segmentation fault (core dumped) in ggml test (YuMJie, closed 8 months ago, 0 comments)
#165 Clarification on the output-neuron pruning method in "Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time" (adamgallas, closed 8 months ago, 2 comments)
#164 Will we have instruct fine-tuned model support in the future? (ZeonfaiHo, opened 8 months ago, 1 comment)
#163 Does PowerInfer support multi-GPU? (LHQUer, closed 7 months ago, 1 comment)
#162 [Question]: High PPL on wikitext2 of ReLU-LLAMA-7B for language modeling tasks (llCurious, opened 8 months ago, 2 comments)
#161 Fix: unrecognised host compiler flags passed to nvcc (hodlen, closed 8 months ago, 0 comments)
#160 Only 12 GB of 24 GB VRAM is used and CUDA utilization is under 10%, but CPU usage is 100% and memory usage is 35 GB (NerounCstate, closed 8 months ago, 1 comment)
#159 Only 12 GB of 24 GB VRAM is used and CUDA utilization is under 10%, but CPU usage is 100% and memory usage is 35 GB (NerounCstate, opened 8 months ago, 1 comment)
#158 [ROCm] Is AMD ROCm support available in the near future? (Orion-zhen, opened 8 months ago, 3 comments)
#157 The CUDA compiler identification is unknown, and PowerInfer was compiled without cuBLAS (LHQUer, opened 8 months ago, 1 comment)
#156 How to make it work? Running the code gives a warning: "PowerInfer was compiled without cuBLAS. It is not possible to set a VRAM budget." (LHQUer, closed 8 months ago, 0 comments)
#155 Are there plans for OpenAI-style API compatibility? (shmily91, opened 8 months ago, 1 comment)
#154 Question about the neuron-aware operator (YuMJie, closed 8 months ago, 3 comments)
#153 Full-GPU computational graph and CUDA refactoring (hodlen, closed 8 months ago, 0 comments)
#152 Question about multiple GPUs (YuMJie, closed 8 months ago, 4 comments)
#150 Question about calculating the token generation rate (bulaikexiansheng, opened 8 months ago, 3 comments)
#149 Add prosparse to README (Raincleared-Song, closed 8 months ago, 0 comments)
#148 Please make a tinyllama v1.0 version for use (kolinfluence, opened 9 months ago, 1 comment)
#147 How to count the size of the model and intermediate tensors on the GPU and in main memory, respectively (YuMJie, closed 8 months ago, 3 comments)
#146 Fix compiling issue under git worktrees (hodlen, closed 9 months ago, 0 comments)
#145 Fix CMake version requirement in README (hodlen, closed 9 months ago, 0 comments)
#144 CMake 3.17 or higher is required, but the repository asks for version 3.13.4 (Rubiel1, closed 9 months ago, 1 comment)
#143 About the chat version of LLaMA-70B-PowerInfer-GGUF (NerounCstate, opened 9 months ago, 1 comment)
#142 The critical code for deciding which layer to put on CPU or GPU (YuMJie, closed 9 months ago, 3 comments)
#141 Possible to make one that fits into 7 GB of VRAM? (sprappcom, opened 9 months ago, 2 comments)
#140 Will using only the CPU be faster than llama.cpp? (liutt1312, opened 9 months ago, 1 comment)
#139 Fix: incomplete macro definitions prevent PowerInfer from being compiled for AMD ROCm (mack-w, closed 9 months ago, 0 comments)
#138 Fix a memory-leak bug on the Windows platform (Tan-YiFan, closed 9 months ago, 0 comments)
#137 Full GPU computational graph (hodlen, closed 8 months ago, 0 comments)
#136 Docs: add detailed instructions on downloading HF models (hodlen, closed 9 months ago, 0 comments)
#135 How to acquire predictor weights (fakerybakery, closed 9 months ago, 2 comments)
#134 Fix activation file detection (hodlen, closed 9 months ago, 0 comments)
#133 Update README: add Windows-specific commands (hodlen, closed 9 months ago, 0 comments)
#132 Remove unused toolchain files (hodlen, closed 9 months ago, 0 comments)
#131 FFN offloading failed: Activation_32 not found (very slow inference) (onlyone-hyphen, closed 9 months ago, 6 comments)
#130 How to understand the code of llama.cpp? (BHbean, opened 9 months ago, 2 comments)