Closed · warner-benjamin closed this 2 months ago
This PR adds support for logging the L1 and L2 gradient norms into StableAdamW, following the PyTorch `clip_grad_norm_` calculation method. It appears to slow down training by at most 1%.
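For reference, a minimal sketch of how the norms might be computed in the `clip_grad_norm_` style (per-parameter norms, then the norm of that vector). The helper name and the logging call are illustrative only, not the PR's actual StableAdamW code:

```python
import torch


def grad_norms(parameters):
    """Hypothetical helper: compute L1 and L2 gradient norms the way
    torch.nn.utils.clip_grad_norm_ does — take each parameter's gradient
    norm, then the norm of the stacked per-parameter norms."""
    grads = [p.grad.detach() for p in parameters if p.grad is not None]
    if not grads:
        return torch.tensor(0.0), torch.tensor(0.0)
    # Norm of per-parameter norms; for p=1 and p=2 this equals the norm
    # over all gradient elements concatenated together.
    l1 = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g, 1.0) for g in grads]), 1.0
    )
    l2 = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g, 2.0) for g in grads]), 2.0
    )
    return l1, l2


# Example usage after loss.backward(), with an assumed logger:
# l1_norm, l2_norm = grad_norms(model.parameters())
# logger.log({"grad_norm_l1": l1_norm.item(), "grad_norm_l2": l2_norm.item()})
```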
Checking in the code from the server attributed it to @staghado 🙂
gradient norm logging code looks good to me!
Merging since it's training without issue.