ananyahjha93 closed 3 weeks ago
Need to add a test to check DDP works well with checkpointing.
@epwalsh done!
At the 300M size, it follows the FSDP model's loss curve: it is better initially, but both lines show the same trend after some time.