Closed: francktcheng closed this 1 year ago
On a single A100 GPU, end-to-end training on one day of the Taobao dataset with a batch size of 64000 improves training throughput by around 1.3x.