SJTU-IPADS / PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License · 7.96k stars · 412 forks

Issues (sorted by newest)
not enough space in the buffer with long prompts
#129
RachelShalom
opened
9 months ago
2
Problems encountered when reproducing the benchmark
#128
wahaha22
opened
9 months ago
9
Fix CUDA performance regression due to auto offloading
#127
hodlen
closed
9 months ago
0
How can I show the output results in a web service, or get the inference results for another application?
#126
xujiangyu
opened
9 months ago
3
Looking forward to support for Mixtral 8x7B
#125
shmily91
closed
10 months ago
1
Would you please kindly offer the data, code, or settings for training the predictor?
#124
Raincleared-Song
closed
10 months ago
1
./build/bin/main is not recognized as the name of a cmdlet, function, script file, or operable program
#123
onlyone-hyphen
closed
9 months ago
2
provided PTX was compiled with an unsupported toolchain
#122
thangld201
opened
10 months ago
1
Benchmarks
#121
hasanar1f
opened
10 months ago
2
where can I download the predictor of Relu-Falcon-40B (float16)?
#120
chenglimin
opened
10 months ago
7
No CUDA toolset found
#119
c469591
opened
10 months ago
5
Irrelevant replies to prompts - Llama or PowerInfer issue?
#118
bluusun
opened
10 months ago
1
What are the test conditions for Figure 18 in your paper?
#117
chenglimin
opened
10 months ago
12
How to obtain 'predictor weights'?
#116
harikrishnaapc
opened
10 months ago
4
Will it run on Windows?
#115
Sandy4321
closed
10 months ago
2
Support CPU/GPU inference on Windows
#114
hodlen
closed
10 months ago
0
Fix argument parsing in examples/batched
#113
hodlen
closed
10 months ago
0
Support broader python and pip versions
#112
hodlen
closed
10 months ago
0
Can the MLP predictor predict all the neurons?
#111
curiousNick1
opened
10 months ago
2
GPU is not used after model is loaded
#110
mio-19
opened
10 months ago
4
How can I get the same answer every time?
#109
sunnyregion
opened
10 months ago
1
CUDA error 13 at /home/PowerInfer/ggml-cuda.cu:9619: invalid device symbol
#108
zilunzhang
opened
10 months ago
2
Windows Visual Studio build fails
#107
dyt06
closed
10 months ago
1
Support setting VRAM budget for `examples/server`
#106
hodlen
closed
10 months ago
0
Can anyone help explain the Chinese meaning of the inference results?
#105
Gengchunsheng
opened
10 months ago
1
Will support for CodeLlama be considered?
#104
littlebai3618
opened
10 months ago
1
Falcon-40B model inference output is unreadable
#103
jqliu42
opened
10 months ago
1
Can 01-ai's Yi model series be supported? Its architecture appears to be the same as Llama's
#102
felixstander
closed
10 months ago
1
./build/bin/main -m /PATH/TO/MODEL -n $output_token_count -t $thread_num -p $prompt: '.' is not recognized as an internal or external command or an operable program
#101
18635191739
closed
10 months ago
2
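The error quoted in this title is what Windows cmd.exe reports when the Unix-style "./" prefix is used; the command itself is meant for a Unix-like shell. A minimal sketch of that invocation, run from the repository root, with the model path and the $output_token_count, $thread_num, and $prompt variables left as placeholders exactly as in the title:

# run the compiled main binary from the repository root (Unix-like shell)
./build/bin/main -m /PATH/TO/MODEL -n $output_token_count -t $thread_num -p $prompt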
Compatibility issue with densely activated Llama models
#100
1562668477
opened
10 months ago
6
Fix generation error under INT4 quantization and batched prompting
#99
hodlen
closed
10 months ago
0
Further optimisation of hybrid inference
#98
hodlen
opened
10 months ago
0
Optimize CUDA sparse operator with Tensor Core
#97
hodlen
opened
10 months ago
0
Kernel fusion to reduce communication overhead
#96
hodlen
opened
10 months ago
0
Reclaim memory from offloaded model weights
#95
hodlen
opened
10 months ago
1
How to convert a Llama-family model to powerinfer.gguf?
#94
Mokuroh0924
closed
10 months ago
1
Meta: Wider model support for PowerInfer
#93
hodlen
opened
10 months ago
10
Meta: Implementing hybrid inference across key desktop platforms
#92
hodlen
opened
10 months ago
0
I also ran into a similar problem, stdatomic.h not found, but on the Linux platform
#91
yinghuo302
closed
10 months ago
1
Update issue templates of PowerInfer
#90
hodlen
closed
10 months ago
0
Add our Kanban to README.md
#89
hodlen
closed
10 months ago
0
macOS/Metal inference support
#88
hodlen
opened
10 months ago
0
WSL + CUDA issues
#87
hodlen
opened
10 months ago
0
Windows CPU/GPU support
#86
hodlen
closed
10 months ago
2
Fix offloading / VRAM budget bugs
#85
hodlen
opened
10 months ago
2
How are the original weights and predictor weights generated?
#84
sunnyregion
opened
10 months ago
2
Can we make it run on other models?
#83
YLSnowy
opened
10 months ago
6
Converting GGUF Models and Support for Smaller Models
#82
nndnnv
opened
10 months ago
1
Didn't use GPU
#81
yuxx0218
closed
10 months ago
4
cmake -S . -B build -DLLAMA_CUBLAS=ON
#80
hungptit123
opened
10 months ago
1
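The title of this issue is only the CMake configure step. A minimal sketch of a full out-of-source build with the cuBLAS backend enabled, assuming a standard CMake setup; the Release configuration is a common choice here, not something the issue specifies:

# configure with the cuBLAS backend enabled, then build
cmake -S . -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release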