Closed bclyang closed 2 months ago
Fixes a bug in sequence parallel training that hurts convergence when the model parallel group is a subset of the available devices.