An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Remove the remaining two hanging wandb config fields #1287
Closed by Quentin-Anthony 2 months ago