wenouyang closed 6 years ago
The following paper discusses some aspects of the model (mainly mixed-precision floating-point training): https://dl.acm.org/citation.cfm?id=3146358
It also discusses the distributed training algorithm, learning rate scheduling, and the neural network architecture.
We have another paper that has been submitted to a journal; it will become available soon.
Hi, thanks for sharing the code. Are there any research papers discussing this model?
Thanks.