BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

adjustable gradient clipping ("clip_gradients / current_lr") #4671

Open · madshi opened this issue 8 years ago

madshi commented 8 years ago

This paper:

https://arxiv.org/pdf/1511.04587.pdf

... suggests "adjustable gradient clipping", which seems to greatly help in training deep networks quickly and efficiently. Basically, they suggest scaling the clipping threshold by 1 / learning rate, i.e. clipping the gradients to [- clip_gradients / current_lr, + clip_gradients / current_lr].
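For concreteness, the scheme can be sketched in a few lines of Python; the function and its signature are illustrative, not part of Caffe's API:

```python
# Sketch of the paper's adjustable (element-wise) gradient clipping.
# `clip_gradients` and `current_lr` mirror the names used above; the
# function itself is illustrative, not Caffe's API.
def adjustable_clip(grads, clip_gradients, current_lr):
    # Effective threshold grows as the learning rate decays.
    theta = clip_gradients / current_lr
    return [max(-theta, min(theta, g)) for g in grads]
```

With clip_gradients = 0.01 and current_lr = 0.01, gradients are clipped to [-1, 1]; as the learning rate decays to 0.001, the range widens to [-10, 10].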

As far as I can see, Caffe doesn't support this yet, correct? Might be useful to add?

shelhamer commented 7 years ago

Caffe does not do this; it clips the gradient itself: https://github.com/BVLC/caffe/blob/master/src/caffe/solvers/sgd_solver.cpp#L101-L116

The proposed method is closer to update clipping, since it incorporates the learning rate. It could be incorporated into Caffe by adding a new flag and passing the rate to ClipGradients(). I'd like to see further empirical demonstration that this is useful before adding it.
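For illustration, a rough sketch of what that change might look like: Caffe's ClipGradients() rescales the whole gradient when its global L2 norm exceeds clip_gradients, and the proposed variant would divide that threshold by the current rate. This Python rendering and the optional `rate` argument are assumptions for illustration, not existing Caffe code:

```python
import math

# Illustrative Python rendering of Caffe's norm-based ClipGradients()
# (src/caffe/solvers/sgd_solver.cpp), extended with a rate-aware
# threshold. The `rate` argument is a hypothetical addition, not an
# existing Caffe option.
def clip_gradients_sketch(grads, clip_gradients, rate=None):
    # Proposed change: divide the threshold by the current learning rate.
    threshold = clip_gradients if rate is None else clip_gradients / rate
    l2norm = math.sqrt(sum(g * g for g in grads))
    if l2norm > threshold:
        # Rescale all gradients so the global norm equals the threshold.
        scale = threshold / l2norm
        return [g * scale for g in grads]
    return grads
```

Note that Caffe clips by global norm rather than element-wise, so with rate-scaling enabled a smaller learning rate permits a proportionally larger gradient norm, matching the paper's [- clip_gradients / current_lr, + clip_gradients / current_lr] range in spirit rather than exactly.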