WuJie1010 / Facial-Expression-Recognition.Pytorch

A CNN-based PyTorch implementation of facial expression recognition (FER2013 and CK+), achieving 73.112% (state-of-the-art) on FER2013 and 94.64% on the CK+ dataset
MIT License

clip_gradient #107

Open HCookY95 opened 3 years ago

HCookY95 commented 3 years ago

@WuJie1010 Hi Wujie,

I'm kind of confused by clip_gradient, specifically this line: "clip_gradient(optimizer, 0.1)". I only know that gradient clipping is a form of regularization, but I don't know the details. That's also my second question: which parameters in the network does the value 0.1 apply to? What does clip_gradient really do, and how do I choose the threshold value?

Thanks a lot!

AKASHKRISHNABHUKYA commented 1 year ago


"clip_gradient" is a function in deep learning that is used to limit the magnitude of the gradients during the backpropagation process. The purpose of this is to prevent the gradients from becoming too large or exploding, which can cause problems during optimization and lead to a failure to converge.

In the line you quoted, "clip_gradient(optimizer, 0.1)", the function is applied to the optimizer with a threshold of 0.1. The threshold sets the maximum magnitude a gradient may have when the parameters are updated: after the backward pass, any gradient whose magnitude exceeds the threshold is clipped so that it falls back within that range. The value 0.1 is therefore applied to the gradients of every parameter registered with the optimizer, not to any single layer or weight.
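For concreteness, helpers with this name are commonly implemented as a per-element clamp over the optimizer's parameter groups. The sketch below is an assumption about a typical implementation, not a verbatim copy of this repo's code; check utils.py in the repository for the exact definition:

```python
def clip_gradient(optimizer, grad_clip):
    """Clamp every gradient element to [-grad_clip, grad_clip] in place.

    Runs after loss.backward() and before optimizer.step(), so the
    clipped gradients are what the optimizer actually uses.
    (Sketch of a typical implementation; the repo's version may differ.)
    """
    for group in optimizer.param_groups:
        for param in group['params']:
            if param.grad is not None:
                param.grad.data.clamp_(-grad_clip, grad_clip)
```

Because the clamp is element-wise over every parameter group, all parameters the optimizer tracks are affected uniformly, which is why the 0.1 is not tied to any particular layer.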

As for how to choose the threshold value, it is usually found by trial and error. A common starting point is a small value such as 0.1, adjusted according to how training behaves. If the gradients are exploding (the loss spikes or becomes NaN), lower the threshold so the gradients are clipped more aggressively. If optimization makes little progress because the updates are too small, raise the threshold to let larger gradients through. Logging the gradient norm for a few steps, as sketched below, can make this choice less blind.
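A minimal sketch of that logging idea (the tiny model here is a stand-in for illustration, not the repo's network):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                   # stand-in module for illustration
loss = model(torch.randn(4, 10)).sum()     # dummy forward pass and loss
loss.backward()

# With max_norm=inf, clip_grad_norm_ performs no actual clipping but still
# returns the total gradient norm, which is handy for picking a threshold.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), float('inf'))
print(f"gradient norm this step: {float(total_norm):.4f}")
```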

It is also important to note that, while gradient clipping can be a useful technique, it should be used with caution: if the threshold is set too low, the clipped gradients may be too small for the optimizer to make progress, leading to suboptimal optimization.
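To show where such a call sits in a training step, here is a hedged usage sketch that reuses the clip_gradient sketch above. The model, data, and hyperparameters are placeholders rather than the repo's actual training script, and the commented lines show PyTorch's built-in equivalents:

```python
import torch
import torch.nn as nn

model = nn.Linear(48 * 48, 7)              # placeholder: 48x48 inputs, 7 expression classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(8, 48 * 48)           # dummy batch
targets = torch.randint(0, 7, (8,))

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()                            # compute gradients
clip_gradient(optimizer, 0.1)              # clamp them before the update
optimizer.step()                           # apply the clipped gradients

# PyTorch's built-in helpers cover the same ground:
# torch.nn.utils.clip_grad_value_(model.parameters(), 0.1)  # clip each element by value
# torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)   # rescale so the total norm <= 0.1
```

The clip_grad_norm_ variant rescales the whole gradient vector instead of clamping individual elements, which preserves the gradient's direction; it is often preferred when that direction matters.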