EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

Add support for mutransfer #679

Closed · Quentin-Anthony closed this issue 1 year ago

Quentin-Anthony commented 1 year ago

We should add support for mutransfer: https://github.com/microsoft/mup

Appears non-trivial, but not as difficult as MoE. We'd have to modify the model itself. https://github.com/microsoft/mup/blob/main/examples/Transformer/model.py looks especially relevant; a good workflow would be to adapt our model along the lines of that example (see the sketch below).
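
As a rough, untested sketch of the kind of model changes mup asks for (following the mup README rather than the actual gpt-neox modules; `MyTransformer` and the widths below are placeholders):

```python
# Minimal sketch of mup-ifying a model, assuming the public mup API
# (MuReadout, set_base_shapes, MuAdam); MyTransformer is a stand-in,
# not a gpt-neox module.
import torch.nn as nn
from mup import MuReadout, MuAdam, set_base_shapes

class MyTransformer(nn.Module):
    def __init__(self, width, vocab_size=50304):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, width)
        self.block = nn.Linear(width, width)  # stand-in for the attention/MLP blocks
        # 1) the output projection must be a MuReadout instead of nn.Linear
        self.readout = MuReadout(width, vocab_size)

    def forward(self, tokens):
        h = self.block(self.embed(tokens))
        # 2) in real attention layers, mup scales logits by 1/d_head rather
        #    than 1/sqrt(d_head) (no attention in this stand-in)
        return self.readout(h)

# 3) register base shapes so mup knows which dimensions scale with width:
#    instantiate the target model plus a base-width and a delta-width copy
base = MyTransformer(width=256)
delta = MyTransformer(width=512)
model = MyTransformer(width=2048)
set_base_shapes(model, base, delta=delta)
# (mup also ships width-aware init helpers in mup.init if custom init is wanted)

# 4) use a mup-aware optimizer so learning rates transfer across width
optimizer = MuAdam(model.parameters(), lr=1e-3)
```

The payoff is that hyperparameters tuned on the narrow base model should then transfer to the wide one.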

nsarka commented 1 year ago

nsarka/mup-support has my changes for this so far. I haven't tested it.

There's one more thing to add to this list. Mup can generate a plot that's helpful for checking the correctness of the implementation. https://github.com/microsoft/mup#checking-correctness-of-parametrization
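
For reference, a hedged sketch of that coord check, following the README section linked above; `TinyMLP` and the synthetic dataloader are placeholders standing in for the real model and data:

```python
# Sketch of mup's coord check: train models of several widths for a few steps
# and plot per-layer activation scales; with a correct μP setup the curves
# stay roughly flat as width grows.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from mup import MuReadout, set_base_shapes
from mup.coord_check import get_coord_data, plot_coord_data

class TinyMLP(nn.Module):
    def __init__(self, width, d_in=32, d_out=10):
        super().__init__()
        self.fc = nn.Linear(d_in, width)
        self.readout = MuReadout(width, d_out)
    def forward(self, x):
        return self.readout(torch.relu(self.fc(x)))

def lazy_model(width):
    # set_base_shapes returns the model, so the closure yields a mup-ready model
    return lambda: set_base_shapes(TinyMLP(width), TinyMLP(64), delta=TinyMLP(128))

models = {w: lazy_model(w) for w in (256, 512, 1024, 2048)}

# small synthetic dataset; a real check would use a slice of the training data
data = TensorDataset(torch.randn(256, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=32)

# record activation statistics over a few steps at several widths and seeds
df = get_coord_data(models, loader, mup=True, optimizer='adam', lr=1e-3,
                    nsteps=4, nseeds=3)

# save the plot referenced in the README link above
plot_coord_data(df, save_to='coord_check_mup.png')
```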

StellaAthena commented 1 year ago

> nsarka/mup-support has my changes for this so far. I haven't tested it.
>
> There's one more thing to add to this list. Mup can generate a plot that's helpful for checking the correctness of the implementation. https://github.com/microsoft/mup#checking-correctness-of-parametrization

- [ ] Add args for saving coord check plot (sketched below)
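
To make that checklist item concrete, a purely hypothetical illustration of the shape such arguments could take (gpt-neox actually configures runs through its own yaml/NeoXArgs system, so these flag names are invented, not existing options):

```python
# Hypothetical illustration only: these flags are not existing gpt-neox options,
# they just show what "args for saving coord check plot" could look like.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--coord-check", action="store_true",
                    help="run a mup coord check instead of normal training")
parser.add_argument("--coord-check-save-path", type=str, default="coord_check.png",
                    help="file path the coord check plot is written to")
args = parser.parse_args()

if args.coord_check:
    # ...build the width-sweep models and dataloader as in the sketch above, then:
    # df = get_coord_data(models, loader, mup=True, optimizer='adam', lr=1e-3)
    # plot_coord_data(df, save_to=args.coord_check_save_path)
    pass
```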

Great work! Thank you for this contribution ^_^

Don’t forget to add yourself as a library contributor in the readme as well 😉

nsarka commented 1 year ago

Thanks Stella! I added myself as a contributor in the draft PR here https://github.com/EleutherAI/gpt-neox/pull/704 :)

StellaAthena commented 1 year ago

Closed as completed by #704