Closed: luzai closed this issue 6 years ago.
It is equivalent to the paper. `alpha = alpha / lambda` looks correct to me.
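My reasoning, roughly: the total objective is L = L_softmax + λ·L_c, so autograd scales the centers' gradient by λ. A separate SGD step on the centers with learning rate `lr_cent` then gives

$$
c_j \leftarrow c_j - \texttt{lr\_cent}\cdot\lambda\cdot\frac{\partial L_c}{\partial c_j}
\qquad\text{vs. the paper's}\qquad
c_j \leftarrow c_j - \alpha\,\Delta c_j
$$

so if the gradient matches the paper's Δc_j, setting `lr_cent = alpha / lambda` cancels the λ and recovers the paper's step size α.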
Thanks a lot. It seems `alpha = alpha / lambda` makes the learning rate greater than 1 and makes the model unstable. I need to do more experiments.
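For reference, with the settings reported in the paper (α = 0.5 and λ = 0.003, if I remember them correctly), `alpha / lambda` ≈ 167, far beyond a typical learning rate.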
@luzai Did you fix the problem? It seems that the gradient w.r.t. c_j is not equivalent to the delta rule shown below. Should I write the backward function myself?
Since in a mini-batch some centers may occur more frequently than others, I guess the author of center loss aims to average each center's gradient by the number of samples belonging to that center in the mini-batch.
I have not written a backward function to normalize the gradient yet, because by tuning the learning rate and alpha, the code provided by KaiyangZhou already achieves reasonable performance.
We could give it a try and compare the performance: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
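A rough sketch of what such a custom `Function` might look like; the class name `CenterLossFunc` and the exact shapes are my own assumptions, and the centers' "gradient" here follows Eq. 4 of the paper rather than the analytical gradient of the forward loss:

```python
import torch
from torch.autograd import Function


class CenterLossFunc(Function):
    """Sketch: center loss whose backward follows the paper's update rule,
    i.e. each center's summed gradient is divided by (1 + number of
    mini-batch samples belonging to that center)."""

    @staticmethod
    def forward(ctx, features, labels, centers):
        # features: (batch, feat_dim), labels: (batch,), centers: (num_classes, feat_dim)
        ctx.save_for_backward(features, labels, centers)
        centers_batch = centers.index_select(0, labels.long())
        # L_c = 1/2 * sum_i ||x_i - c_{y_i}||^2  (paper's form, no batch averaging)
        return (features - centers_batch).pow(2).sum() / 2.0

    @staticmethod
    def backward(ctx, grad_output):
        features, labels, centers = ctx.saved_tensors
        labels = labels.long()
        diff = features - centers.index_select(0, labels)  # x_i - c_{y_i}

        # Exact gradient of L_c w.r.t. the features.
        grad_features = grad_output * diff

        # Count how many samples in the batch belong to each center.
        counts = torch.zeros(centers.size(0), device=features.device)
        counts.index_add_(0, labels, torch.ones_like(labels, dtype=counts.dtype))

        # Paper-style delta rule for the centers: sum (c_j - x_i) over the
        # samples of class j, then divide by (1 + count).
        grad_centers = torch.zeros_like(centers)
        grad_centers.index_add_(0, labels, -diff)
        grad_centers = grad_centers / (1.0 + counts.unsqueeze(1))

        # labels are integers, so they receive no gradient.
        return grad_features, None, grad_output * grad_centers


# Hypothetical usage: register the centers as a learnable parameter
# with their own optimizer, then call
#   centers = torch.nn.Parameter(torch.randn(num_classes, feat_dim))
#   loss_cent = CenterLossFunc.apply(features, labels, centers)
```

Note the centers' gradient returned here is the paper's delta rule, not the true gradient of the returned loss, which is exactly the discrepancy discussed above.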
@luzai Yeah, the sum of gradients should be normalized by the number of examples belonging to that center in the mini-batch.
Hi~ May I ask whether the implementation in this repo is equivalent to the original implementation in the paper? If we simply let `alpha = alpha / lambda` (alpha is `lr_cent`, lambda is `weight_cent`), will it be equivalent to the original implementation? It seems the author does not adopt the gradient w.r.t. c_j and instead uses the delta rule shown below.
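That is, Eq. 4 of the paper (Wen et al., ECCV 2016), transcribed here for reference:

$$
\Delta c_j = \frac{\sum_{i=1}^{m} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = j)},
\qquad
c_j^{t+1} = c_j^{t} - \alpha\,\Delta c_j^{t}
$$

where m is the mini-batch size and δ(·) is 1 when the condition holds and 0 otherwise.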