SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
MIT License

Fix offloading / VRAM budget bugs #85

Open hodlen opened 9 months ago

hodlen commented 9 months ago

After releasing online FFN offloading, we have found new issues in:

Some users have also posted errors related to FFN offloading on social media, which may need further investigation.

hodlen commented 8 months ago

We should also account for VRAM overhead at different batch sizes. As the batch size grows, we are likely to hit CUDA OOM errors during the prompt phase.
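To illustrate the batch-size concern, here is a rough sketch of how a fixed VRAM budget might be split. It assumes the KV cache grows with context length and the scratch/activation buffers grow with the number of tokens processed per batch, so whatever is left for GPU-resident FFN weights shrinks as the batch gets larger. All names (`VramPlan`, `ffn_budget`, `max_batch`) and all numbers are hypothetical and do not come from PowerInfer's code.

```python
from dataclasses import dataclass

MiB = 1024 ** 2
GiB = 1024 ** 3


@dataclass
class VramPlan:
    # Hypothetical inputs, not PowerInfer's real configuration fields.
    budget_bytes: int             # total VRAM the user allows the process to use
    resident_bytes: int           # weights that always stay on GPU (attention, embeddings, ...)
    kv_bytes_per_token: int       # KV cache cost per context token
    scratch_bytes_per_token: int  # activation/scratch cost per token in one batch


def ffn_budget(plan: VramPlan, n_ctx: int, n_batch: int) -> int:
    """Bytes left for GPU-resident FFN neurons after batch-dependent overheads."""
    overhead = plan.kv_bytes_per_token * n_ctx + plan.scratch_bytes_per_token * n_batch
    return max(0, plan.budget_bytes - plan.resident_bytes - overhead)


def max_batch(plan: VramPlan, n_ctx: int, ffn_alloc: int) -> int:
    """Largest batch that still fits once ffn_alloc bytes of FFN weights are on GPU."""
    if plan.scratch_bytes_per_token <= 0:
        raise ValueError("scratch_bytes_per_token must be positive")
    free = plan.budget_bytes - plan.resident_bytes - plan.kv_bytes_per_token * n_ctx - ffn_alloc
    return max(0, free // plan.scratch_bytes_per_token)


# Illustrative numbers only.
plan = VramPlan(
    budget_bytes=8 * GiB,
    resident_bytes=2 * GiB,
    kv_bytes_per_token=512 * 1024,
    scratch_bytes_per_token=2 * MiB,
)

print(ffn_budget(plan, n_ctx=2048, n_batch=32) // MiB)   # 5056
print(ffn_budget(plan, n_ctx=2048, n_batch=512) // MiB)  # 4096
print(max_batch(plan, n_ctx=2048, ffn_alloc=5056 * MiB))  # 32
```

With these made-up numbers, an FFN split sized for a batch of 32 leaves room for only 32 tokens of scratch space. A 512-token prompt batch would need about 960 MiB more than the plan allows, which is exactly the kind of prompt-phase OOM described above. One option is to size the FFN split for the largest batch expected in the prompt phase rather than for decoding.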

qw1319 commented 3 months ago

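Has this issue been resolved? When I run it directly, I also see that gpu_offload does not preload the weights. Step 1: it reports an error that the activation folder does not exist. [screenshot]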

After I manually added a (fake) activation folder, running the Python script still fails. [screenshot]