potential bug in adadelta

yorkerlin commented 9 years ago

@bayerj It seems there is a bug in adadelta.py when momentum is used. The momentum correction can be applied to adadelta, rmrprop and others stochastic updates. The potential bug is at the 110-th line of adadelta.py

    def _iterate(self):
        for args, kwargs in self.args:
            step_m1 = self.step
            d = self.decay
            o = self.offset
            m = self.momentum
            step1 = step_m1 * m * self.step_rate
            self.wrt -= step1

            gradient = self.fprime(self.wrt, *args, **kwargs)

            self.gms = (d * self.gms) + (1 - d) * gradient ** 2
            step2 = sqrt(self.sms + o) / sqrt(self.gms + o) * gradient * self.step_rate
            self.wrt -= step2

            self.step = step1 + step2
            self.sms = (d * self.sms) + (1 - d) * self.step ** 2

            self.n_iter += 1

            yield {
                'n_iter': self.n_iter,
                'gradient': gradient,
                'args': args,
                'kwargs': kwargs,
            }

I think it should be step1 = step_m1 * m instead of step1 = step_m1 * m * self.step_rate. Correct me if I am wrong.

Note that in rmsprop.py, the 160-th line is

 step1 = step_m1 * self.momentum

, which is correct.

yorkerlin commented 9 years ago

What is more, I think the adadelta with momentum correction in http://climin.readthedocs.org/en/latest/adadelta.html, is not correct.

The formula about Δθ should be similar to v in http://climin.readthedocs.org/en/latest/rmsprop.html

bayerj commented 9 years ago

You indeed found a bug there, thanks. I fixed it.

I could not find an error in the docs. Note that the use of \theta_{t + 1 \over 2} can be confusing, as it is described differently for rmsprop.

BRML / climin

potential bug in adadelta #29