Open hodlen opened 10 months ago
We should also consider VRAM overhead under different batch sizes. As the batch size grows, we are likely to hit CUDA OOM during the prompt phase.
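A minimal sketch of how one could profile this empirically, sweeping batch sizes and recording peak VRAM of the full-prompt forward pass. The `nn.TransformerEncoder` stand-in, the dimensions, and the batch sizes are all assumptions for illustration, not the actual model or workload:

```python
# Hypothetical sketch: measure peak VRAM of the prompt (prefill) pass across
# batch sizes, using a stand-in nn.TransformerEncoder for the real model.
import torch
import torch.nn as nn

device = torch.device("cuda")
d_model, n_layers, prompt_len = 1024, 8, 512

# Stand-in for the actual model under test.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=16, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=n_layers).to(device).eval()

for batch_size in (1, 2, 4, 8, 16, 32):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats(device)
    prompts = torch.randn(batch_size, prompt_len, d_model, device=device)
    try:
        with torch.no_grad():
            model(prompts)  # full-prompt forward pass, analogous to prefill
        peak_gib = torch.cuda.max_memory_allocated(device) / 2**30
        print(f"batch={batch_size:3d}  peak VRAM ~ {peak_gib:.2f} GiB")
    except torch.cuda.OutOfMemoryError:
        print(f"batch={batch_size:3d}  CUDA OOM during prompt phase")
        break
```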
Has this issue been resolved? Running it directly here, I also see that gpu_offload does not load the weights in advance. Step 1: it fails with an error saying the activation folder is missing;
After manually creating a (fake) activation folder, running the Python script still fails with an error.
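A small pre-flight check along these lines might surface the problem earlier than a fake folder does. The directory name `activation` and the idea of per-layer files inside it are assumptions taken from this thread, not the project's documented layout:

```python
# Hypothetical pre-flight check: verify the activation data exists before
# launching, instead of creating an empty (fake) activation/ folder.
from pathlib import Path
import sys

model_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
act_dir = model_dir / "activation"  # assumed location of the activation data

if not act_dir.is_dir():
    sys.exit(f"missing {act_dir}: download or generate the activation data; "
             "an empty placeholder folder will not work")

files = sorted(act_dir.iterdir())
if not files:
    sys.exit(f"{act_dir} exists but is empty; the loader still has nothing to read")

print(f"found {len(files)} activation file(s) in {act_dir}")
```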
After releasing online FFN offloading, we have found new issues in:
Some users have also posted errors related to FFN offloading on social media; these may need further investigation.