Summary:
I'm using recmetrics from torchrec to log supervised learning metrics. Metrics logged:
MSE
MAE
Calibration
New metrics can be added easily by updating the model_utils.py file.
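As a rough illustration of what adding a metric in model_utils.py could look like, here is a minimal registry-style sketch. The registry name, decorator, and metric signature are assumptions for illustration, not the actual contents of model_utils.py or the torchrec API:

```python
# Hypothetical metric registry; names and signatures are assumptions,
# not the real model_utils.py contents.
METRIC_FNS = {}

def register_metric(name):
    """Register a metric function under `name` so the trainer can look it up."""
    def decorator(fn):
        METRIC_FNS[name] = fn
        return fn
    return decorator

@register_metric("rmse")
def rmse(preds, labels):
    # Root mean squared error over a batch of scalar predictions.
    n = len(preds)
    return (sum((p - y) ** 2 for p, y in zip(preds, labels)) / n) ** 0.5

# Usage: the trainer would call METRIC_FNS["rmse"](preds, labels) each log step.
value = METRIC_FNS["rmse"]([0.0, 1.0], [1.0, 1.0])
```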
The metrics are logged every 100 steps by default (controlled by config.trainer.log_every_n_steps). If this parameter is set too low, training slows down due to frequent matrix inversions.
The default window size equals batch_size * world_size, so the window covers just one step.
An unfortunate side effect of per-ts discounting is that each epoch (ts partition) gets a separate curve in TensorBoard.
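To make the windowing behavior concrete, here is a small self-contained sketch of computing the three logged metrics (MSE, MAE, calibration) over a window of batch_size * world_size samples. The helper name and the calibration definition (sum of predictions over sum of labels, a common recsys convention) are assumptions for illustration, not the torchrec recmetrics implementation:

```python
def windowed_metrics(preds, labels, window):
    """Compute MSE, MAE, and calibration over the most recent `window` samples.

    `calibration` here is sum(preds) / sum(labels), a common recsys
    definition; this is an assumption, not torchrec's exact formula.
    """
    p = preds[-window:]
    y = labels[-window:]
    n = len(p)
    mse = sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / n
    mae = sum(abs(pi - yi) for pi, yi in zip(p, y)) / n
    calibration = sum(p) / sum(y)
    return mse, mae, calibration

# With the default window of batch_size * world_size, one logging window
# is exactly one global step's worth of samples, e.g. 4 * 2 = 8 here.
batch_size, world_size = 4, 2
window = batch_size * world_size
preds = [0.2, 0.8, 0.5, 0.5, 0.1, 0.9, 0.4, 0.6]
labels = [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
mse, mae, cal = windowed_metrics(preds, labels, window)
```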
Differential Revision: D42971642