Add model parallel group to reduce scatter

EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

https://www.eleuther.ai/

Apache License 2.0

6.96k stars 1.02k forks

Closed bclyang closed 2 months ago

bclyang commented 2 months ago

Fixes a bug in sequence parallel training that hurts convergence when the model parallel group is only a subset of the available GPUs. The reduce scatter now receives the model parallel group explicitly.
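To illustrate why the group matters, here is a minimal pure-Python simulation of a sum reduce-scatter. It is a sketch, not the diff from this PR: the rank layout, the `reduce_scatter` helper, and the tensor values are all hypothetical. In `torch.distributed`, collectives accept a `group` argument and fall back to the default (world) group when it is omitted, which is the situation the simulation mimics.

```python
from typing import Dict, List


def reduce_scatter(tensors: Dict[int, List[float]], group: List[int]) -> Dict[int, List[float]]:
    """Simulate a sum reduce-scatter over the ranks listed in `group`.

    Hypothetical helper for illustration only; not a gpt-neox or PyTorch API.
    """
    length = len(tensors[group[0]])
    assert length % len(group) == 0, "input must split evenly across the group"
    # Reduce: elementwise sum over every rank in the group.
    summed = [sum(tensors[r][i] for r in group) for i in range(length)]
    # Scatter: the k-th rank of the group keeps the k-th chunk of the sum.
    chunk = length // len(group)
    return {r: summed[k * chunk:(k + 1) * chunk] for k, r in enumerate(group)}


# Assumed layout: 4 ranks, model parallel size 2, so two data parallel replicas.
WORLD = [0, 1, 2, 3]
MODEL_PARALLEL_GROUPS = [[0, 1], [2, 3]]

# Replica A (ranks 0, 1) holds 1s; replica B (ranks 2, 3) holds 10s.
tensors = {0: [1.0] * 4, 1: [1.0] * 4, 2: [10.0] * 4, 3: [10.0] * 4}

# Intended behavior: reduce only within each model parallel group.
with_group: Dict[int, List[float]] = {}
for g in MODEL_PARALLEL_GROUPS:
    with_group.update(reduce_scatter(tensors, g))

# Buggy behavior: reducing over the whole world mixes the two replicas.
without_group = reduce_scatter(tensors, WORLD)

print("model parallel group:", with_group)
print("world group:         ", without_group)
```

With the model parallel group, each replica's values stay separate and every rank keeps half of the sequence dimension. Over the world group, values from both replicas are summed together and each rank ends up with a smaller, differently shaped chunk.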