Current Solution: Number 3. Rename the weights from `attention` to `attention.attn_block` and from `mlp` to `mlp.attn_block`, store the checkpoint again, and use the new checkpoint. PR: https://github.com/floatingsnake/gpt-neox/pull/10
We just need to run the checkpoint conversion script and load from the converted checkpoint.
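For reference, here is a minimal sketch of what such a conversion could look like, assuming a flat PyTorch state dict and the key patterns named above; the actual script lives in the linked PR, and real NeoX/Pythia checkpoints are sharded across files, so treat the paths and layout here as placeholders.

```python
import torch

# Hypothetical rename map based on the description above; the exact target
# names used by the MAGMA adapter wrappers are an assumption here.
RENAME_MAP = {
    "attention.": "attention.attn_block.",
    "mlp.": "mlp.attn_block.",
}

def rename_keys(state_dict):
    """Return a new state dict with the attention/MLP keys renamed."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in RENAME_MAP.items():
            new_key = new_key.replace(old, new)
        renamed[new_key] = value
    return renamed

if __name__ == "__main__":
    # Placeholder paths; the actual script would iterate over the
    # checkpoint shard files instead of a single .pt file.
    state_dict = torch.load("pythia_checkpoint.pt", map_location="cpu")
    torch.save(rename_keys(state_dict), "pythia_checkpoint_renamed.pt")
```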
Additionally, we set `strict=False` so that the image prefix and adapter weights are ignored. I have manually checked whether any other weights have mismatched names, and everything looks correct.
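A rough sketch of that loading step, assuming a standard `torch.nn.Module.load_state_dict` call (the model object, function name, and path are placeholders, not the actual loading code in the PR):

```python
import torch

def load_pythia_weights(model: torch.nn.Module, ckpt_path: str) -> None:
    """Load the converted Pythia checkpoint, ignoring MAGMA-only modules."""
    state_dict = torch.load(ckpt_path, map_location="cpu")
    result = model.load_state_dict(state_dict, strict=False)

    # With strict=False, the image prefix and adapter weights (which are not
    # in the Pythia checkpoint) stay at their freshly initialized values.
    # The keys reported here should contain only those modules.
    print("Missing keys (expected: image prefix + adapters):", result.missing_keys)
    print("Unexpected keys (expected: none):", result.unexpected_keys)
```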
Lastly, this requires another change in the DeeperSpeed code; use the following branch: https://github.com/EleutherAI/DeeperSpeed/tree/robin_summit
We need to load Pythia checkpoints for MAGMA training.
Main Issue: Mismatch between the weights in the checkpoint and the weights in the MAGMA model.
Sources of mismatch:
Proposed solutions:
Without changing the names in the Pythia checkpoint:
Changing the names of the Pythia Checkpoint:
Mismatch Source 2:
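Whichever source applies, a quick way to see the mismatch concretely is to diff the key sets of the checkpoint and the model. This is only an illustrative sketch; `model`, the function name, and the path are placeholders.

```python
import torch

def report_key_mismatch(model: torch.nn.Module, ckpt_path: str) -> None:
    """Print which weight names differ between a checkpoint and a model."""
    ckpt_keys = set(torch.load(ckpt_path, map_location="cpu").keys())
    model_keys = set(model.state_dict().keys())

    print("In checkpoint but not in model (e.g. un-renamed attention/mlp keys):")
    for key in sorted(ckpt_keys - model_keys):
        print("  ", key)

    print("In model but not in checkpoint (e.g. image prefix, adapters):")
    for key in sorted(model_keys - ckpt_keys):
        print("  ", key)
```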