KaiyangZhou / pytorch-center-loss

PyTorch implementation of Center Loss

About updating centers #2

Closed luzai closed 6 years ago

luzai commented 6 years ago

Hi~ May I ask whether the implementation in this repo is equivalent to the original implementation in the paper?

# by doing so, weight_cent would not impact the learning of centers
for param in criterion_cent.parameters():
    param.grad.data *= (1. / args.weight_cent)

If I simply let alpha = alpha / lambda (alpha is lr_cent, lambda is weight_cent), will it be equivalent to the previous implementation?
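
For reference, here is a small self-contained check of the equivalence I have in mind; the toy loss, values, and variable names below are made up for illustration, and only plain SGD (no momentum or weight decay) is considered:

import torch

torch.manual_seed(0)

weight_cent = 0.1   # lambda in the paper
lr_cent = 0.5       # alpha in the paper

feats = torch.randn(4, 2)
labels = torch.tensor([0, 1, 0, 2])
init = torch.randn(3, 2)

def center_loss(centers):
    # 1/2 * mean_i ||x_i - c_{y_i}||^2
    return 0.5 * (feats - centers[labels]).pow(2).sum(dim=1).mean()

# setup A (as in this repo): backprop weight_cent * L_C, undo the weight on the
# center gradients, then take an SGD step with lr = lr_cent
c_a = init.clone().requires_grad_(True)
(weight_cent * center_loss(c_a)).backward()
c_a.grad.data *= (1. / weight_cent)
torch.optim.SGD([c_a], lr=lr_cent).step()

# setup B: keep the weighted gradient as-is but fold the weight into the
# learning rate, i.e. alpha' = alpha / lambda
c_b = init.clone().requires_grad_(True)
(weight_cent * center_loss(c_b)).backward()
torch.optim.SGD([c_b], lr=lr_cent / weight_cent).step()

print(torch.allclose(c_a, c_b))  # True: both setups produce the same center update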

It seems the authors do not use the raw gradient w.r.t. c_j; instead, they use the delta rule shown below.

[images: the gradient and center-update equations from the paper]
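
For readers who cannot see the images: as far as I recall, the relevant equations from the center loss paper (Wen et al., ECCV 2016) are, reproduced here from memory,

\mathcal{L}_C = \frac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2,
\qquad
\frac{\partial \mathcal{L}_C}{\partial x_i} = x_i - c_{y_i}

\Delta c_j = \frac{\sum_{i=1}^{m} \delta(y_i = j)\,(c_j - x_i)}{1 + \sum_{i=1}^{m} \delta(y_i = j)},
\qquad
c_j^{t+1} = c_j^{t} - \alpha \, \Delta c_j^{t}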

KaiyangZhou commented 6 years ago

It is equivalent to the paper.

alpha=alpha/lambda looks correct to me.

luzai commented 6 years ago

Thanks a lot! It seems that alpha = alpha / lambda makes the learning rate greater than 1 and makes the model unstable. I need to do more experiments.

kikyou123 commented 6 years ago

@luzai Did you fix the problem? It seems that the gradient w.r.t. c_j is not equivalent to the delta rule shown above. Should I write the backward function myself?

luzai commented 6 years ago

Since in a mini-batch some centers occur more frequently than others, I guess the authors of center loss intend to normalize each center's gradient by the number of samples assigned to that center in the mini-batch.

I have not written a custom backward function to normalize the gradient yet, because with some tuning of the learning rate and alpha, the code provided by KaiyangZhou already achieves reasonable performance.

We could give it a try and compare the performance, e.g. by writing a custom autograd Function: https://pytorch.org/tutorials/beginner/examples_autograd/two_layer_net_custom_function.html
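
For example, a rough sketch of what such a custom backward could look like; this CenterLossFunction is hypothetical (it is not the code in this repo) and simply implements the count-normalized delta rule from the paper:

import torch
from torch.autograd import Function

class CenterLossFunction(Function):
    # hypothetical center loss whose backward follows the paper's delta rule

    @staticmethod
    def forward(ctx, feats, labels, centers):
        ctx.save_for_backward(feats, labels, centers)
        return 0.5 * (feats - centers[labels]).pow(2).sum() / feats.size(0)

    @staticmethod
    def backward(ctx, grad_output):
        feats, labels, centers = ctx.saved_tensors
        diff = feats - centers[labels]                     # x_i - c_{y_i}

        # gradient w.r.t. the features: (x_i - c_{y_i}) / m
        grad_feats = grad_output * diff / feats.size(0)

        # delta rule for the centers: sum (c_j - x_i) over the samples of each
        # class, then divide by (1 + number of such samples in the batch);
        # note this is intentionally not the exact gradient of forward()
        grad_centers = torch.zeros_like(centers)
        grad_centers.index_add_(0, labels, -diff)
        counts = torch.bincount(labels, minlength=centers.size(0)).float()
        grad_centers = grad_output * grad_centers / (1.0 + counts).unsqueeze(1)

        return grad_feats, None, grad_centers

It would be used as loss = CenterLossFunction.apply(feats, labels, centers), with centers kept as an nn.Parameter and updated by a separate SGD optimizer whose lr plays the role of alpha. I have not verified whether this actually trains better than the current rescaling trick.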

vanpersie32 commented 5 years ago

@luzai Yeah, the sum of gradients should be normalized by the number of examples belonging to that center in the mini-batch.
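
To make that concrete, here is a tiny toy check (made-up numbers) of how the plain autograd gradient of the summed loss differs from the count-normalized delta rule:

import torch

feats = torch.tensor([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = torch.tensor([0, 0, 1])
centers = torch.zeros(2, 2, requires_grad=True)

# plain autograd gradient of 1/2 * sum ||x_i - c_{y_i}||^2
loss = 0.5 * (feats - centers[labels]).pow(2).sum()
loss.backward()
print(centers.grad)  # row 0 sums the two class-0 terms: [-4., 0.]

# count-normalized delta rule: divide each row by (1 + #samples of that class)
counts = torch.bincount(labels, minlength=centers.size(0)).float()
print(centers.grad / (1.0 + counts).unsqueeze(1))  # [[-4/3, 0], [0, -1]]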