This MR fixes the Mixed Precision LAMB optimizer: it read `param_groups` to obtain the device info before the parent optimizer's `__init__` had been called, and `param_groups` is not set up until that call. I therefore swapped the order, so the parent (super) `__init__` now runs first and the device info is obtained afterwards. I tested the change and it appears to work. This is critical for MLPerf HPC, so please review and merge as soon as possible.
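For reviewers, here is a minimal, self-contained sketch of the init-order problem described above. It does not use the real optimizer code. `BaseOptimizer` is a hypothetical stand-in for `torch.optim.Optimizer`, which likewise only creates `param_groups` inside its `__init__`. `Param`, `BrokenLamb`, `FixedLamb` and the `device` attribute are illustrative names, not the actual classes touched by this MR.

```python
class BaseOptimizer:
    """Hypothetical stand-in for torch.optim.Optimizer: param_groups
    only exists after this __init__ has run."""

    def __init__(self, params, defaults):
        self.defaults = defaults
        # Build a single param group from the parameters passed in.
        self.param_groups = [{"params": list(params), **defaults}]


class Param:
    """Hypothetical parameter that only carries a device label."""

    def __init__(self, device):
        self.device = device


class BrokenLamb(BaseOptimizer):
    def __init__(self, params, lr=1e-3):
        # Bug: param_groups has not been created yet, so this raises AttributeError.
        device = self.param_groups[0]["params"][0].device
        super().__init__(params, {"lr": lr})
        self.device = device


class FixedLamb(BaseOptimizer):
    def __init__(self, params, lr=1e-3):
        # Fix: let the parent __init__ build param_groups first ...
        super().__init__(params, {"lr": lr})
        # ... and only then read the device from the first parameter.
        self.device = self.param_groups[0]["params"][0].device


params = [Param("cuda:0"), Param("cuda:0")]
try:
    BrokenLamb(params)
except AttributeError as exc:
    print("broken order:", exc)

print("fixed order:", FixedLamb(params).device)
```

Running this prints an `AttributeError` for the broken ordering and `fixed order: cuda:0` for the fixed one, which mirrors the swap made in this MR.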