An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Add more informative checks for ZeRO incompatibility. #1275
Closed
AI-WAIFU closed 2 months ago
- Adds clear error messages when trying to use ZeRO stage 2/3 with pipeline or model parallelism.
- Updates the is_pipe_parallel comment.
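For readers unfamiliar with the change, the kind of check described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the function name `check_zero_compatibility` and its parameters are hypothetical, and it assumes (as the description states) that ZeRO stages 2 and 3 should be rejected whenever pipeline or model parallelism is enabled.

```python
def check_zero_compatibility(
    zero_stage: int,
    pipe_parallel_size: int,
    model_parallel_size: int,
) -> None:
    """Raise a descriptive error for unsupported ZeRO/parallelism combinations.

    Hypothetical helper: names and signature are assumptions for illustration,
    not the actual GPT-NeoX API.
    """
    # ZeRO stages 0 and 1 are treated as always compatible in this sketch.
    if zero_stage < 2:
        return

    # Collect every conflicting setting so the user sees all problems at once.
    conflicts = []
    if pipe_parallel_size > 1:
        conflicts.append(
            f"pipeline parallelism (pipe_parallel_size={pipe_parallel_size})"
        )
    if model_parallel_size > 1:
        conflicts.append(
            f"model parallelism (model_parallel_size={model_parallel_size})"
        )

    if conflicts:
        raise ValueError(
            f"ZeRO stage {zero_stage} is incompatible with "
            + " and ".join(conflicts)
            + ". Use ZeRO stage 0 or 1, or set the parallel size(s) to 1."
        )
```

Calling `check_zero_compatibility(2, 2, 1)` raises a `ValueError` that names both the ZeRO stage and the offending setting, rather than failing later with an unrelated error.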