Refactored checkpoint loading to make it simpler. The code is still a mess, but it will be more manageable for future work.
Functional changes:
Distributed checkpoint loading now determines automatically whether to use the fast (same format) or safe (different format) loading scheme. This means checkpoints will load correctly after a config change mid-training, and in some cases pretrained checkpoint loading will be faster.
Added some safety checks in checkpoint configs.
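
As a rough illustration of the automatic selection described above, here is a minimal sketch. All names (`LoadScheme`, `choose_load_scheme`, and the layout fields) are hypothetical and are not taken from this PR; it only assumes that "same format" can be checked by comparing the saved layout with the current one.

```python
from enum import Enum


class LoadScheme(Enum):
    FAST = "fast"  # used when the saved format matches the current one
    SAFE = "safe"  # used when the formats differ


def choose_load_scheme(saved_format: dict, current_format: dict) -> LoadScheme:
    """Pick the fast scheme only when both format descriptions are equal."""
    return LoadScheme.FAST if saved_format == current_format else LoadScheme.SAFE


# Hypothetical layout descriptions; the real config fields may differ.
saved = {"tensor_parallel": 2, "pipeline_parallel": 1}
unchanged = {"tensor_parallel": 2, "pipeline_parallel": 1}
changed = {"tensor_parallel": 4, "pipeline_parallel": 1}

print(choose_load_scheme(saved, unchanged).value)
print(choose_load_scheme(saved, changed).value)
```

Under this assumption, any mismatch falls back to the safe scheme, so a config change mid-training costs loading speed rather than correctness.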
Type of change
Select all that apply:
[ ] Bug fix (non-breaking change that addresses a specific issue)
[x] New feature (non-breaking change that adds functionality)
[ ] Breaking change (a change that could affect existing functionality)