cnellington / Contextualized

An SKLearn-style toolbox for estimating and analyzing models, distributions, and functions with context-specific parameters.
http://contextualized.ml/
GNU General Public License v3.0

Add d-adaptation to ignore learning rate tuning #212

Closed by cnellington 1 year ago

cnellington commented 1 year ago

https://openreview.net/forum?id=GXZ6cT5cvY https://github.com/facebookresearch/dadaptation

kennethZhangML commented 1 year ago

We could do something like this:

import numpy as np

def d_adaptation(x0, f, grad_f, D0, n_iter):
    # x0: initial point, f: objective (kept for reference, not used in the updates),
    # grad_f: gradient of f, D0: initial lower bound on the distance-to-solution
    # constant D that D-Adaptation estimates, n_iter: number of iterations.
    x = x0
    D = D0
    s = np.zeros_like(x)  # running sum of squared gradients
    for t in range(n_iter):
        g = grad_f(x)
        s = s + g ** 2
        eta = np.sqrt(D / (s + 1e-8))  # per-coordinate step size scaled by the current D estimate
        x = x - eta * g
        D = max(D, np.sum(s) / (t + 1))  # grow the estimate of D as gradients accumulate
    return x

We will also need to define an objective function f and its gradient grad_f, an initial point x0, an initial lower bound D0 on the distance-to-solution constant that D-Adaptation estimates, and the number of iterations n_iter. For example:

def f(x):
    # simple quadratic objective with its minimum at the origin
    return np.sum(x ** 2)

def grad_f(x):
    # gradient of the quadratic objective
    return 2 * x

x0 = np.random.randn(10)  # random starting point (the zero vector would already be the minimizer)
D0 = 1                    # initial lower bound on D
n_iter = 1000             # number of iterations

x = d_adaptation(x0, f, grad_f, D0, n_iter)

This would minimize f using the D-Adaptation sketch above, starting from x0, with D0 as the initial lower bound on the distance-to-solution constant, and running for 1000 iterations. The final iterate is returned in x.

cnellington commented 1 year ago

Hi @kennethZhangML, thanks for taking a look. Our package is built on PyTorch, so we'd like to use d-adaptation through a PyTorch optimizer class. It looks like this is already implemented in their codebase (linked above), as well as in their newer variant, Prodigy. If it works nicely, we'll just want to update the dependencies and add some new kwargs to enable it as an option.
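
For reference, a minimal sketch of how the packaged optimizer could drop into a standard PyTorch training loop, assuming the dadaptation package exposes a DAdaptAdam optimizer class as its README describes. The model, data, and loop below are placeholders for illustration, not Contextualized's actual training code:

import torch
import dadaptation  # pip install dadaptation

# Placeholder model and data purely for illustration.
model = torch.nn.Linear(10, 1)
X = torch.randn(100, 10)
y = torch.randn(100, 1)

# The D-Adaptation optimizers are intended as drop-in replacements for
# torch.optim classes; the README recommends leaving lr at 1.0 and letting
# the method adapt the effective step size.
optimizer = dadaptation.DAdaptAdam(model.parameters(), lr=1.0)
loss_fn = torch.nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

Prodigy reportedly follows the same drop-in pattern, so switching between the two would just mean changing which optimizer class we construct.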

cnellington commented 1 year ago

Tests performed worse after implementing this, with runs randomly failing to converge. It seems to overestimate the learning rate. Closing.