Liuhong99 / Sophia

The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
MIT License

Incomplete WandB logging #6

Closed by nalzok 1 year ago

nalzok commented 1 year ago

For example, the following call to wandb.log does not include the "mfu": running_mfu*100 entry. The same goes for train_adam.py.

https://github.com/Liuhong99/Sophia/blob/16378330b6ef50203cab61f10b10eb42301f271b/train_sophiag.py#L337-L348

Consequently, mfu is calculated and printed at every step, but its value is only logged to WandB every 1000 steps. Is that intentional?
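To make the request concrete, here is a minimal sketch of what including mfu in the per-step log could look like. It is not taken from the repository: `build_step_metrics` and the metric keys other than "mfu" are hypothetical, and it assumes `running_mfu` starts at a negative sentinel (such as -1.0) until the first estimate is available.

```python
def build_step_metrics(iter_num, loss, lr, running_mfu):
    """Assemble the per-step metrics dict (hypothetical helper, not in the repo)."""
    metrics = {"iter": iter_num, "train/loss": loss, "lr": lr}
    # Assumption: a negative running_mfu means "no estimate yet", so skip it.
    if running_mfu >= 0.0:
        metrics["mfu"] = running_mfu * 100  # logged as a percentage
    return metrics


if __name__ == "__main__":
    # In the training loop this dict would be handed to wandb.log(...);
    # here it is only printed so the sketch runs without a WandB account.
    print(build_step_metrics(iter_num=10, loss=2.5, lr=3e-4, running_mfu=0.5))
```

With this shape, the per-step call would carry mfu at the same frequency as the console output, and the entry is simply omitted during the warm-up steps.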

Liuhong99 commented 1 year ago

The mfu in the training script output (every step) is there to monitor whether training is starting properly. The mfu in WandB (every 1000 steps) is not very useful; the mfu should remain steady after the initial stage. Computing mfu has negligible cost, so I believe this is fine.