Status: Closed (renganxu closed this pull request 2 years ago)
Summary: Add the multi-tensor (parallel) version of AdamW to the omnivore workflow to speed up the optimizer's step and zero_grad calls.
Reviewed By: mannatsingh
Differential Revision: D35660803
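For readers unfamiliar with the multi-tensor variant, here is a minimal sketch of the idea, not the code from this diff. It assumes a PyTorch release whose torch.optim.AdamW accepts the foreach argument; the change itself may have used a different entry point. The toy model, the run_adamw helper, and the hyperparameters are hypothetical. The sketch trains two identical copies of a model, one with the per-tensor loop and one with the multi-tensor path, and measures how far the resulting parameters diverge.

```python
import copy

import torch


def run_adamw(model, foreach, steps=3):
    """Train `model` for a few steps and return copies of its parameters.

    foreach=True asks PyTorch for the multi-tensor implementation, which
    applies each update operation to the whole list of parameters at once
    instead of looping over them one tensor at a time.
    """
    opt = torch.optim.AdamW(
        model.parameters(), lr=1e-3, weight_decay=0.01, foreach=foreach
    )
    torch.manual_seed(0)  # same input batch for both runs
    x = torch.randn(8, 4)
    for _ in range(steps):
        # set_to_none=True drops the gradient tensors instead of zero-filling them
        opt.zero_grad(set_to_none=True)
        loss = model(x).pow(2).mean()
        loss.backward()
        opt.step()
    return [p.detach().clone() for p in model.parameters()]


torch.manual_seed(0)
# Hypothetical toy model, only here to give the optimizer several tensors.
base = torch.nn.Sequential(
    torch.nn.Linear(4, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2)
)

single = run_adamw(copy.deepcopy(base), foreach=False)
multi = run_adamw(copy.deepcopy(base), foreach=True)

# Largest absolute difference between the two implementations' parameters.
max_diff = max((a - b).abs().max().item() for a, b in zip(single, multi))
print(f"max parameter difference: {max_diff:.2e}")
```

Both paths implement the same update rule, so the parameters should agree up to floating-point rounding. Any speedup comes from issuing fewer, larger operations, which tends to matter most when a model has many small parameter tensors on a GPU.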
This pull request was exported from Phabricator. Differential Revision: D35660803