Closed Quentin-Anthony closed 1 year ago
nsarka/mup-support has my changes for this so far. I haven't tested it.
There's one more thing to add to this list. Mup can generate a plot that's helpful for checking the correctness of the implementation. https://github.com/microsoft/mup#checking-correctness-of-parametrization
- [ ] Add args for saving coord check plot
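For anyone unfamiliar with what the coord check plot verifies: under muP, the typical size of each layer's activations should stay roughly constant as width grows, whereas under standard parametrization it drifts with width. A minimal NumPy sketch of that property at initialization (this is only an illustration of the idea, not mup's actual `get_coord_data`/`plot_coord_data` code, and the two-layer net here is hypothetical):

```python
import numpy as np

def coord_size(width, mup_init, n_in=64, seed=0):
    """Mean |coordinate| of the 2nd-layer pre-activation of a toy
    2-layer linear net at init, as a function of hidden width.

    With muP-style init (variance 1/fan_in) this stays O(1) as width
    grows; with unit-variance init it grows with width. The coord
    check plot shows the same statistic per layer, across widths and
    training steps, and correct curves come out flat.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_in)
    if mup_init:
        # variance 1/fan_in, as muP prescribes for hidden weights
        W1 = rng.standard_normal((width, n_in)) / np.sqrt(n_in)
        W2 = rng.standard_normal((width, width)) / np.sqrt(width)
    else:
        # unit variance: coordinate size blows up with width
        W1 = rng.standard_normal((width, n_in))
        W2 = rng.standard_normal((width, width))
    return float(np.mean(np.abs(W2 @ (W1 @ x))))
```

Comparing `coord_size(1024, mup_init=True)` against `coord_size(128, mup_init=True)` gives a ratio near 1, while the `mup_init=False` ratio grows like the square root of the width increase.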
Great work! Thank you for this contribution ^_^
Don’t forget to add yourself as a library contributor in the readme as well 😉
Thanks Stella! I added myself as a contributor in the draft PR here https://github.com/EleutherAI/gpt-neox/pull/704 :)
Closed as completed by #704
We should add support for mutransfer: https://github.com/microsoft/mup
Appears non-trivial, but not as difficult as MoE. We'd have to modify the model itself. https://github.com/microsoft/mup/blob/main/examples/Transformer/model.py appears especially relevant. A good workflow would be:
- Modify the model code in gpt-neox/megatron/model/ to use mup. Probably mostly in transformer.py
- Update gpt-neox/megatron/optimizers.py
- Update gpt-neox/megatron/training.py to allow previous features to be selected during training
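On the optimizer side, the key change mup requires is that hidden weight matrices get a width-dependent Adam learning rate while biases and norm parameters keep the base learning rate. A rough self-contained sketch of that grouping (the ndim heuristic and parameter names below are illustrative only; the real mup optimizers derive this from base shapes via `set_base_shapes`, not from ndim):

```python
def mup_param_groups(named_shapes, lr, base_width, width):
    """Split parameters into muP-style learning-rate groups.

    Simplified illustration of the grouping an optimizer like
    mup.MuAdam performs: matrix-like weights (ndim >= 2) get their
    Adam lr scaled by base_width / width, while vector-like params
    (biases, layernorm gains) keep the unscaled lr. Classifying by
    ndim is a simplification for this sketch, not the real mup logic.
    """
    scaled = {'params': [], 'lr': lr * base_width / width}
    unscaled = {'params': [], 'lr': lr}
    for name, shape in named_shapes.items():
        (scaled if len(shape) >= 2 else unscaled)['params'].append(name)
    return [scaled, unscaled]

# Hypothetical parameter shapes for a width-1024 transformer block
shapes = {
    'attn.qkv.weight': (3 * 1024, 1024),
    'attn.qkv.bias':   (3 * 1024,),
    'mlp.fc1.weight':  (4 * 1024, 1024),
    'ln.weight':       (1024,),
}
groups = mup_param_groups(shapes, lr=6e-4, base_width=256, width=1024)
```

The resulting group dicts mirror the `param_groups` structure PyTorch optimizers accept, which is why this kind of change would land in optimizers.py rather than the model code.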