OpenLMLab / LOMO

LOMO: LOw-Memory Optimization
MIT License

What is the difference from official PyTorch DDP hooks? #9

Open wangkuiyi opened 1 year ago

wangkuiyi commented 1 year ago

Overlapping the backward pass with the optimizer step is a classical idea, and PyTorch already supports this overlap in DDP and FSDP. For example, here are the communication hooks in DDP: https://github.com/pytorch/pytorch/tree/main/torch/distributed/algorithms/ddp_comm_hooks
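For reference, a DDP communication hook looks roughly like the sketch below: the hook fires once per gradient bucket during `backward()`, so communication (and, with fused-optimizer variants, the parameter update) overlaps with the remaining gradient computation. This is a minimal illustration only; the process-group setup is assumed and omitted, and `allreduce_hook` mirrors the stock allreduce hook rather than anything LOMO-specific.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def allreduce_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    # Average this bucket's flattened gradients across ranks asynchronously.
    # DDP keeps computing gradients for earlier layers while the collective runs.
    fut = dist.all_reduce(bucket.buffer(), op=dist.ReduceOp.SUM, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0] / dist.get_world_size())


# Assumed setup (process group already initialized, model on the local device):
# ddp_model = DDP(model, device_ids=[local_rank])
# ddp_model.register_comm_hook(state=None, hook=allreduce_hook)
```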

How does this project (https://arxiv.org/pdf/2306.09782.pdf) differ from the PyTorch DDP hooks linked above? Thanks.

QipengGuo commented 1 year ago

Thanks for the information; we will look into it. In-place updating is a classical engineering trick, and our goal is to provide a solution for low-resource training. Our paper also discusses why SGD can be a good choice for LLM fine-tuning, how to stabilize the training process, and other analyses.
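For context, the in-place update fused into the backward pass can be sketched with per-parameter gradient hooks, as below. This is a simplified illustration, not the project's actual implementation; `attach_inplace_sgd` and its arguments are hypothetical names, and a real low-memory setup would also avoid materializing `.grad` buffers at all.

```python
import torch
import torch.nn as nn


def attach_inplace_sgd(model: nn.Module, lr: float = 1e-3):
    """Update each parameter in place as soon as its gradient is computed."""
    def make_hook(param):
        def hook(grad):
            # Apply a plain SGD step with the freshly computed gradient.
            with torch.no_grad():
                param.add_(grad, alpha=-lr)
            # Return zeros so stale gradients never accumulate in .grad
            # (a real low-memory implementation avoids even this allocation).
            return torch.zeros_like(grad)
        return hook

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(make_hook(p))


# Toy usage: parameters are updated during backward(); no optimizer.step() needed.
model = nn.Linear(16, 4)
attach_inplace_sgd(model, lr=0.1)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
```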