An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries.
SFT improvements (labeling fixes, different packing implementations) #1240
Closed
dmahan93 closed 3 months ago