Open · wangkuiyi opened this issue 1 year ago
Thanks for the information; we will investigate it. In-place updating is a classical engineering trick, and our aim is to provide a solution for low-resource training. Our paper also discusses why SGD can be a good choice for LLM fine-tuning, how to stabilize the training process, and other analyses.
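To make the in-place trick concrete, here is a minimal sketch of fusing an SGD update into `backward()` using PyTorch's `Tensor.register_post_accumulate_grad_hook` (available since PyTorch 2.1). The model, learning rate, and loss below are placeholders for illustration, not the paper's actual implementation or setup:

```python
import torch
import torch.nn as nn

# Placeholder model and hyperparameter, not taken from the paper.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
lr = 1e-3

def sgd_in_backward(param: torch.Tensor) -> None:
    # Fires as soon as this parameter's gradient is fully accumulated,
    # i.e. while backward() is still running for earlier layers.
    # Assumes the parameter's own backward use is already finished when
    # the hook fires (true for a feedforward model like this one).
    with torch.no_grad():
        param.add_(param.grad, alpha=-lr)  # in-place SGD step
    param.grad = None  # free the gradient immediately to cut peak memory

for p in model.parameters():
    p.register_post_accumulate_grad_hook(sgd_in_backward)

x = torch.randn(8, 1024)
loss = model(x).pow(2).mean()
loss.backward()  # parameters are updated during this call; no optimizer.step()
```

Because each gradient is consumed and freed as soon as it is produced, the full set of gradient tensors never has to reside in memory at once, which is what makes this style of update attractive for low-resource training.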
It is a classical idea to overlap the backward pass with the optimization step, and PyTorch already supports this overlapping in DDP and FSDP. For example, here are the communication hooks in DDP: https://github.com/pytorch/pytorch/tree/main/torch/distributed/algorithms/ddp_comm_hooks
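For reference, a comm hook along the lines of PyTorch's documented allreduce example looks like the sketch below; it assumes `torch.distributed` has already been initialized and that `ddp_model` is a module wrapped in `DistributedDataParallel`:

```python
import torch
import torch.distributed as dist

def allreduce_hook(state, bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    # Called whenever a gradient bucket becomes ready during backward();
    # the async allreduce overlaps communication with the remaining
    # backward computation on the other buckets.
    tensor = bucket.buffer().div_(dist.get_world_size())
    fut = dist.all_reduce(tensor, async_op=True).get_future()
    return fut.then(lambda f: f.value()[0])

# ddp_model is assumed to be an already-constructed
# torch.nn.parallel.DistributedDataParallel instance.
ddp_model.register_comm_hook(state=None, hook=allreduce_hook)
```

This mechanism overlaps gradient *communication* with the backward pass; the same hook infrastructure is what makes it possible to also move optimizer work into backward, as in the sketch above.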
How does this project (https://arxiv.org/pdf/2306.09782.pdf) differ from that overlapping approach? Thanks.