ai-computing / aicomp

Confusion Regarding optimizer_offload and model_offload Options #19

Open ememos opened 1 month ago

ememos commented 1 month ago

The roles of the optimizer_offload and model_offload options are confusing, and it would help to rename them so that each name states what the option does. The optimizer_offload option moves the optimizer state to the CPU when GPU memory runs low during the forward/backward passes, and swaps it back to the GPU during the other phases. The model_offload option swaps parameters, gradients, and optimizer states to the CPU during the optimizer step() phase, and swaps parameters and gradients back to the GPU for the forward/backward passes of training.
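To make the two behaviors easier to compare, here is a minimal sketch of where each tensor group would sit in each phase. It is only a model of the description above, not aicomp's implementation: `Phase` and `placement` are hypothetical names, the "when GPU memory runs low" condition is simplified to "always", and the optimizer state is assumed to stay on the CPU during forward/backward under model_offload, since only parameters and gradients are described as returning to the GPU.

```python
from enum import Enum


class Phase(Enum):
    FWD_BWD = "forward/backward"
    OPT_STEP = "optimizer step"


def placement(phase, optimizer_offload=False, model_offload=False):
    """Return the device each tensor group sits on during `phase` (hypothetical helper)."""
    # Baseline without any offloading: everything lives on the GPU.
    where = {"params": "gpu", "grads": "gpu", "opt_state": "gpu"}

    # optimizer_offload: optimizer state leaves the GPU only during forward/backward.
    # Assumption: the "GPU memory runs low" trigger is treated as always true.
    if optimizer_offload and phase is Phase.FWD_BWD:
        where["opt_state"] = "cpu"

    # model_offload: everything is on the CPU for the optimizer step;
    # params and grads return to the GPU for forward/backward.
    if model_offload:
        if phase is Phase.OPT_STEP:
            where = {name: "cpu" for name in where}
        else:
            # Assumption: optimizer state is not brought back to the GPU.
            where["opt_state"] = "cpu"

    return where


if __name__ == "__main__":
    for phase in Phase:
        print(phase.value, placement(phase, optimizer_offload=True))
        print(phase.value, placement(phase, model_offload=True))
```

Read this way, the two options are active in opposite phases: one frees GPU memory during forward/backward, the other during the optimizer step.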

ememos commented 1 month ago

The optimizer_offload option has been renamed to swap_opt_in_fwdbwd, and the model_offload option has been renamed to swap_model_in_optstep.
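For anyone with existing settings that still use the old names, a small helper along these lines could translate them. This is a sketch under assumptions: it treats the options as keys in a plain dict, which may not match how aicomp actually receives them, and `RENAMED_OPTIONS` and `migrate_options` are hypothetical names that are not part of the project.

```python
# Old option name -> new option name, as stated in this issue.
RENAMED_OPTIONS = {
    "optimizer_offload": "swap_opt_in_fwdbwd",
    "model_offload": "swap_model_in_optstep",
}


def migrate_options(options):
    """Return a copy of `options` with the old option names replaced (hypothetical helper)."""
    migrated = {}
    for key, value in options.items():
        # Keys that were not renamed pass through unchanged.
        migrated[RENAMED_OPTIONS.get(key, key)] = value
    return migrated


if __name__ == "__main__":
    old = {"optimizer_offload": True, "model_offload": False}
    print(migrate_options(old))
```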