nalzok closed this issue 1 year ago
The mfu printed in the training script output (every step) is there to check that training has started properly. The mfu in wandb (every 1000 steps) is not very useful, because mfu should remain steady after the initial stage. Computing mfu has negligible cost, so I believe this is fine.
For example, the following call to `wandb.log` does not include the `"mfu": running_mfu*100` entry. The same goes for `train_adam.py`.
https://github.com/Liuhong99/Sophia/blob/16378330b6ef50203cab61f10b10eb42301f271b/train_sophiag.py#L337-L348
Consequently, `mfu` is calculated and printed for each step, but the value is only logged to Wandb every 1000 steps. Is that intentional?
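To make the question concrete, here is a minimal sketch of what adding the entry to the per-step log might look like. It is not the repository's code: the helper name `build_step_metrics`, the keys `"iter"`, `"train/loss"` and `"lr"`, and the convention that a negative `running_mfu` means "not measured yet" are all assumptions for illustration. Only the `"mfu": running_mfu*100` entry comes from the discussion above.

```python
from typing import Dict


def build_step_metrics(iter_num: int, loss: float, lr: float,
                       running_mfu: float) -> Dict[str, float]:
    """Assemble the per-step metrics dict, including mfu as a percentage.

    Hypothetical helper: key names other than "mfu" are placeholders.
    """
    metrics: Dict[str, float] = {
        "iter": iter_num,
        "train/loss": loss,
        "lr": lr,
    }
    # Assumption: a negative running_mfu is a sentinel for "no estimate yet",
    # so the entry is skipped rather than logging a meaningless value.
    if running_mfu >= 0:
        metrics["mfu"] = running_mfu * 100  # fraction -> percent
    return metrics


if __name__ == "__main__":
    # In a training loop the dict would be handed to the logger, for example
    # wandb.log(build_step_metrics(...)) after wandb.init(); here it is
    # printed so that the sketch runs without wandb installed.
    print(build_step_metrics(iter_num=10, loss=2.5, lr=1e-4, running_mfu=0.5))
```

With something like this, the per-step call would carry the same mfu value that is printed to the console, at the cost of one extra scalar per log call.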