Open XZWY opened 1 month ago
Hi, I'm not sure I understand the issue. Do you mean that the norm should be squared? If so, note that after normalizing the gradients by the gradient norm, the squared and the non-squared norm give equivalent normalized gradients.
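To make the equivalence concrete, here is a minimal sketch in plain Python. It assumes a toy linear error `e(x) = A x - y`; the names `A`, `y`, `x` and the helper functions are hypothetical and are not taken from the repository's sampler. Since the gradient of `||e||_2^2` is `2 * ||e||_2` times the gradient of `||e||_2`, the two differ only by a positive scalar, which disappears once each gradient is divided by its own norm.

```python
import math

def matvec(A, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def l2(v):
    # Euclidean (L2) norm of a vector.
    return math.sqrt(sum(c * c for c in v))

def normalize(v):
    # Divide a vector by its own L2 norm.
    n = l2(v)
    return [c / n for c in v]

# Hypothetical toy problem (an assumption for illustration): e(x) = A x - y
A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
y = [1.0, 0.0, -2.0]
x = [0.3, -0.7]

e = [p - q for p, q in zip(matvec(A, x), y)]
At_e = matvec(transpose(A), e)

grad_sq = [2.0 * c for c in At_e]        # gradient of ||e||_2^2 w.r.t. x
grad_plain = [c / l2(e) for c in At_e]   # gradient of ||e||_2 w.r.t. x

unit_sq = normalize(grad_sq)
unit_plain = normalize(grad_plain)

# The raw gradients have different magnitudes, but the normalized ones agree.
same_direction = all(
    math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12)
    for a, b in zip(unit_sq, unit_plain)
)
print(same_direction)
```

Note that this only covers the choice of loss (squared or not). Dividing a gradient `G` by `||G||_2^2` instead of `||G||_2` is a different matter: it leaves a vector of length `1 / ||G||_2` rather than a unit vector.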
I see, thanks! Then I assume the paper should also normalize by the gradient norm $$||G||_2$$ instead of the squared norm $$||G||_2^2$$, right? As in the image below.
Oh yes, you are right. That looks like a typo.
Inside the sampler, the norm guide and the error norm are both normalized, but they should both be the power of the error and of the gradient, so this does not seem to match the paper. Is this on purpose?