hodlen opened 9 months ago
We should also consider VRAM overhead under different batch sizes: as the batch size grows, the prompt phase is likely to hit CUDA OOM.
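As a rough sketch of why larger batches OOM in the prompt phase, the back-of-envelope estimate below scales activation memory linearly with batch size. All model dimensions (hidden size, layer count, fp16) are hypothetical placeholders for illustration, not values taken from this project.

```python
# Back-of-envelope estimate of prompt-phase activation VRAM.
# The dimensions below are hypothetical, not from this repo.

def prompt_activation_bytes(batch_size, seq_len, hidden_dim,
                            n_layers, bytes_per_elem=2):
    """Rough lower bound: one fp16 hidden-state tensor kept per layer."""
    return batch_size * seq_len * hidden_dim * n_layers * bytes_per_elem

def max_batch_size(vram_budget_bytes, seq_len, hidden_dim,
                   n_layers, bytes_per_elem=2):
    """Largest batch whose estimated activations fit in the budget."""
    per_seq = prompt_activation_bytes(1, seq_len, hidden_dim,
                                      n_layers, bytes_per_elem)
    return vram_budget_bytes // per_seq

# Example: 4 GiB of spare VRAM, 2048-token prompts, and a
# hypothetical 4096-dim, 32-layer model in fp16.
budget = 4 * 1024**3
print(max_batch_size(budget, 2048, 4096, 32))  # prints 8
```

This ignores KV-cache growth and temporary buffers, so the real OOM point is lower, but it shows the linear batch-size scaling that makes the prompt phase the first place to fail.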
Has this issue been resolved? Running it directly here, I also see that gpu_offload does not preload the weights. Step 1: it errors out because the activation folder is missing; after manually creating a (fake) activation folder, running python still fails with an error.
After releasing online FFN offloading, we have found new issues in:
Some users have also posted errors related to FFN offloading on social media that may need further investigation.