google-research / sam

Apache License 2.0

Implementation details of grad_norm computation #13

Open shuo-ouyang opened 3 years ago

shuo-ouyang commented 3 years ago

Hi all, I am trying to implement SAM in MXNet and have a question about the grad_norm computation. When computing the SAM perturbation e_w, should the gradient norm be calculated separately for each parameter, or once over all parameters together? Any advice would be appreciated.
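
To make the two alternatives in the question concrete, here is a minimal NumPy sketch, not code from this repository or from MXNet. The function names, the `grads` dictionary layout, and the small stabilizing constant are assumptions made for illustration. As far as I can tell, the SAM paper writes the perturbation as rho * g / ||g||_2 with g being the gradient with respect to the full weight vector, which corresponds to the "global" variant below.

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm of all gradients, treated as one concatenated vector."""
    return np.sqrt(sum(np.sum(np.square(g)) for g in grads.values()))

def sam_perturbation_global(grads, rho=0.05):
    """Scale every gradient by the same global norm (one norm for all parameters)."""
    norm = global_grad_norm(grads)
    # 1e-12 is an assumed constant that guards against division by zero.
    return {name: rho * g / (norm + 1e-12) for name, g in grads.items()}

def sam_perturbation_per_param(grads, rho=0.05):
    """Alternative: scale each gradient by its own norm (one norm per parameter)."""
    return {
        name: rho * g / (np.linalg.norm(g.ravel()) + 1e-12)
        for name, g in grads.items()
    }

# Hypothetical gradients for a tiny two-parameter model.
grads = {
    "weight": np.array([[3.0, 0.0], [0.0, 0.0]]),
    "bias": np.array([4.0]),
}

eps_global = sam_perturbation_global(grads, rho=1.0)
eps_per_param = sam_perturbation_per_param(grads, rho=1.0)

# Overall length of each perturbation across all parameters.
total_global = global_grad_norm(eps_global)
total_per_param = global_grad_norm(eps_per_param)
```

With the global variant the whole perturbation has length rho, so the relative gradient magnitudes between parameters are preserved. With the per-parameter variant each parameter's perturbation has length rho on its own, so the overall length grows with the number of parameters.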