SJTU-IPADS / PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License · 7.96k stars · 412 forks
Issues (newest first)
#180 The code for the figures in the paper (YuMJie, closed 7 months ago, 1 comment)
#179 Fix: offload FFN norm weights (hodlen, closed 7 months ago, 0 comments)
#178 Hotfix: failed to build sparse FFN with LLM_GATE_SEQ (hodlen, closed 7 months ago, 0 comments)
#177 Fix: remove Axpy dense op (hodlen, closed 7 months ago, 0 comments)
#176 More quantize support (YixinSong-e, opened 7 months ago, 0 comments)
#175 Unable to generate constant output (rikoras, closed 7 months ago, 2 comments)
#174 Optimize `mul_mat_sparse` for INT4 quantized weights (hodlen, closed 7 months ago, 0 comments)
#173 convert.py: error: the following arguments are required: mlp_model (wojiaoshihua, closed 7 months ago, 4 comments)
#172 Where is GGML_USE_HYBRID_THREADING defined or added? (wfloveiu, opened 7 months ago, 2 comments)
#171 Support Bamboo LM sparse and dense inference (hodlen, closed 7 months ago, 0 comments)
#170 invalid device symbol (czq693497091, closed 8 months ago, 0 comments)
#169 How to assign the specified CUDA_VISIBLE_DEVICE? (czq693497091, closed 8 months ago, 0 comments)
#168 Use `mul_mat_transpose` at axpy op for large batch (Begunner, closed 7 months ago, 0 comments)
#167 Two questions that I want to solve (yeptttt, opened 8 months ago, 2 comments)
#166 Segmentation fault (core dumped) in ggml test (YuMJie, closed 8 months ago, 0 comments)
#165 Clarification on the output-neuron pruning method in "Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time" (adamgallas, closed 8 months ago, 2 comments)
#164 Will we have instruct fine-tuned model support in the future? (ZeonfaiHo, opened 8 months ago, 1 comment)
#163 Does PowerInfer support multi-GPU? (LHQUer, closed 7 months ago, 1 comment)
#162 [Question]: High PPL on wikitext2 of ReLU-LLAMA-7B for language modeling tasks (llCurious, opened 8 months ago, 2 comments)
#161 Fix: unrecognised host compiler flags passed to nvcc (hodlen, closed 8 months ago, 0 comments)
#160 Only 12 GB of 24 GB VRAM is used and CUDA utilization is under 10%, but CPU usage is 100% and memory usage is 35 GB (NerounCstate, closed 8 months ago, 1 comment)
#159 Only 12 GB of 24 GB VRAM is used and CUDA utilization is under 10%, but CPU usage is 100% and memory usage is 35 GB (NerounCstate, opened 8 months ago, 1 comment)
#158 [ROCm] Is AMD ROCm support available in the near future? (Orion-zhen, opened 8 months ago, 3 comments)
#157 The CUDA compiler identification is unknown, and PowerInfer was compiled without cuBLAS (LHQUer, opened 8 months ago, 1 comment)
#156 How to make it work? Running the code gives a warning: "PowerInfer was compiled without cuBLAS. It is not possible to set a VRAM budget." (LHQUer, closed 8 months ago, 0 comments)
#155 Are there plans for OpenAI-style API compatibility? (shmily91, opened 8 months ago, 1 comment)
#154 Question about the neuron-aware operator (YuMJie, closed 8 months ago, 3 comments)
#153 Full-GPU computational graph and CUDA refactoring (hodlen, closed 8 months ago, 0 comments)
#152 Question about multiple GPUs (YuMJie, closed 8 months ago, 4 comments)
#150 Question about calculating the token generation rate (bulaikexiansheng, opened 8 months ago, 3 comments)
#149 Add prosparse to README (Raincleared-Song, closed 8 months ago, 0 comments)
#148 Please make a tinyllama v1.0 version for use (kolinfluence, opened 9 months ago, 1 comment)
#147 How to count the size of the model and intermediate tensors on the GPU and in main memory, respectively (YuMJie, closed 8 months ago, 3 comments)
#146 Fix compiling issue under git worktrees (hodlen, closed 9 months ago, 0 comments)
#145 Fix CMake version requirement in README (hodlen, closed 9 months ago, 0 comments)
#144 CMake 3.17 or higher is required, but the repository asks for version 3.13.4 (Rubiel1, closed 9 months ago, 1 comment)
#143 About the chat version of LLaMA-70B-PowerInfer-GGUF (NerounCstate, opened 9 months ago, 1 comment)
#142 The critical code for deciding which layer to put on CPU or GPU (YuMJie, closed 9 months ago, 3 comments)
#141 Possible to make one that fits into 7 GB of VRAM? (sprappcom, opened 9 months ago, 2 comments)
#140 Will using only the CPU be faster than llama.cpp? (liutt1312, opened 9 months ago, 1 comment)
#139 Fix: incomplete macro definitions prevent PowerInfer from being compiled for AMD ROCm (mack-w, closed 9 months ago, 0 comments)
#138 Fix a memory-leak bug on the Windows platform (Tan-YiFan, closed 9 months ago, 0 comments)
#137 Full GPU computational graph (hodlen, closed 8 months ago, 0 comments)
#136 Docs: add detailed instructions on downloading HF models (hodlen, closed 9 months ago, 0 comments)
#135 How to acquire predictor weights (fakerybakery, closed 9 months ago, 2 comments)
#134 Fix activation file detection (hodlen, closed 9 months ago, 0 comments)
#133 Update README: add Windows-specific commands (hodlen, closed 9 months ago, 0 comments)
#132 Remove unused toolchain files (hodlen, closed 9 months ago, 0 comments)
#131 FFN offloading failed: Activation_32 not found (very slow inference) (onlyone-hyphen, closed 9 months ago, 6 comments)
#130 How to understand the code of llama.cpp? (BHbean, opened 9 months ago, 2 comments)