ai-computing / aicomp

Handling OOM during the optimizer.step() phase #18

Open ememos opened 2 months ago

ememos commented 2 months ago

A GPU out-of-memory (OOM) error can occur while optimizer.step() is executing.

ememos commented 1 month ago

The model_offload option has been added. With this option enabled, parameters, gradients, and optimizer states are all offloaded to the CPU for the duration of optimizer.step(), and the optimizer step itself runs on the CPU.
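For readers who want to see the general idea, here is a minimal sketch in plain PyTorch of running the optimizer step on the CPU. It is not the aicomp implementation of model_offload: the helpers `move_optimizer_state` and `step_on_cpu` are hypothetical names made up for illustration, and the sketch assumes a single-device model whose parameters are the same objects registered in the optimizer.

```python
import torch
from torch import nn


def move_optimizer_state(optimizer, device):
    """Hypothetical helper: move every tensor in the optimizer state to `device`."""
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value):
                state[key] = value.to(device)


def move_params_and_grads(model, device):
    """Hypothetical helper: move parameters and gradients in place.

    Assigning to `.data` keeps the Parameter objects themselves unchanged,
    so the optimizer's param_groups and state keys stay valid.
    """
    for p in model.parameters():
        p.data = p.data.to(device)
        if p.grad is not None:
            p.grad.data = p.grad.data.to(device)


def step_on_cpu(model, optimizer, device):
    """Hypothetical helper: offload to CPU, run optimizer.step(), move back."""
    move_params_and_grads(model, "cpu")
    move_optimizer_state(optimizer, "cpu")

    # Any state the optimizer allocates here (e.g. Adam moment buffers on
    # the first step) is created in host memory rather than GPU memory.
    optimizer.step()

    # Gradients are no longer needed, so drop them instead of copying back.
    optimizer.zero_grad(set_to_none=True)

    move_params_and_grads(model, device)
    move_optimizer_state(optimizer, device)


# Usage: one training iteration (falls back to CPU when no GPU is present).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(8, 4).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = model(torch.randn(16, 8, device=device)).pow(2).mean()
loss.backward()
step_on_cpu(model, optimizer, device)
```

The trade-off in this sketch is extra host-device transfer time on every step in exchange for lower peak GPU memory during the update. Moving the optimizer state back to the GPU afterwards mirrors the description above; keeping it on the CPU permanently would be another possible design.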