bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Encoding checkpoint reshaping guide #349

Open tjruwase opened 1 year ago

tjruwase commented 1 year ago

This PR is a step towards generalizing the universal checkpointing approach, which enables arbitrary reshaping of 3D-parallel checkpoints. It eliminates the hardcoding of the BLOOM model architecture in the current implementation as follows:

  1. Client encodes any shape information required for extracting and merging tensor slices (e.g., slices to be averaged rather than concatenated). This information is included in the checkpoint file.
  2. Replace constant strings with symbolic constants defined by the deepspeed library.
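A minimal sketch of what step 1 could look like: the client records, per parameter, how tensor-parallel slices should be merged when the checkpoint is reshaped (concatenated along a dimension, or averaged when the slices are replicated), and this metadata travels inside the checkpoint file. The key names, function, and schema below are hypothetical illustrations, not the actual DeepSpeed format.

```python
# Hypothetical merge-metadata encoding; names and schema are illustrative only.
CAT_DIM = "cat_dim"                # dimension along which slices are concatenated
PARAM_PATTERNS = "param_patterns"  # hypothetical top-level checkpoint key

def encode_reshape_info(param_specs):
    """Build merge metadata to embed in a checkpoint file.

    param_specs maps a parameter name to either an int (the dimension to
    concatenate tensor-parallel slices along) or None (the slices are
    replicated, so they should be averaged rather than concatenated).
    """
    info = {}
    for name, dim in param_specs.items():
        if dim is None:
            info[name] = {"merge": "average"}
        else:
            info[name] = {"merge": "concat", CAT_DIM: dim}
    return {PARAM_PATTERNS: info}

# Example: an attention weight split along dim 0, a replicated layer norm.
meta = encode_reshape_info({
    "attention.query_key_value.weight": 0,
    "input_layernorm.weight": None,
})
```

On load, a reshaping tool could then consult this metadata instead of pattern-matching on BLOOM-specific parameter names.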

Requires the companion DS PR.

tjruwase commented 1 year ago

@stas00, I don't intend for this to be merged. Rather, I am sharing this PR to get your feedback on the generalization effort. As discussed earlier, the core logic will eventually move to DS.