NeuroDiffGym / neurodiffeq

A PyTorch-based library for solving differential equations using neural networks, used by multiple research groups around the world, including at Harvard IACS.
http://pypi.org/project/neurodiffeq/
MIT License

High Order Optimizers #159

Closed: udemirezen closed this issue 2 years ago

udemirezen commented 2 years ago

Hi,

Do you support BFGS or L-BFGS optimization? How can I use the L-BFGS optimizer after a certain number of epochs of Adam optimization? Thank you.

shuheng-liu commented 2 years ago

Hi @udemirezen

Yes, and you can do more than that (make sure you are using the latest version of neurodiffeq)! See the following example.

First, instantiate a solver as you always do. The solver defaults to an Adam optimizer:

import numpy as np
import torch
from neurodiffeq import diff
from neurodiffeq.solvers import Solver1D
from neurodiffeq.generators import Generator1D
from neurodiffeq.conditions import IVP

solver = Solver1D(
    ode_system=lambda u, t: [diff(u, t, order=2) + u],  # define ODE: u'' + u = 0
    conditions=[IVP(0, 1, 0)],  # define initial conditions u(0) = 1 and u'(0) = 0
    t_min=0.0, # optional if setting both train_generator and valid_generator
    t_max=2*np.pi,  # optional if setting both train_generator and valid_generator
    train_generator=Generator1D(1000, 0.0, 2*np.pi),
    valid_generator=Generator1D(1000, 0.0, 2*np.pi),
)

Then, set a callback that will be called at global epoch 1000:

from neurodiffeq.callbacks import SetOptimizer, ClosedIntervalGlobal

# instantiate a callback that sets the optimizer (you can pass in optional optimizer kwargs)
cb = SetOptimizer(torch.optim.LBFGS, optimizer_kwargs={'lr': 1e-3, })
# the callback will be called only once at global epoch No. 1000
cb_with_condition = cb.conditioned_on(ClosedIntervalGlobal(min=1000, max=1000))

solver.fit(max_epochs=5000, callbacks=[cb_with_condition])

You can also dynamically change the loss function (a.k.a. criterion):

from neurodiffeq.callbacks import SetCriterion
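
For example, here is a hypothetical sketch that switches to a mean-absolute-error style loss at global epoch 2000. It assumes SetCriterion takes the new loss function as its first argument and that the loss function receives the residuals first; check the API reference for the exact signature in your version.

# hypothetical sketch: swap the default MSE-of-residuals loss for an L1-style loss
# once global epoch 2000 is reached (the SetCriterion signature is assumed here)
l1_loss = lambda residuals, funcs, coords: residuals.abs().mean()

cb_loss = SetCriterion(l1_loss)
cb_loss_with_condition = cb_loss.conditioned_on(ClosedIntervalGlobal(min=2000, max=2000))

solver.fit(max_epochs=5000, callbacks=[cb_loss_with_condition])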

You can also customize when to change the optimizer/criterion, e.g., if you want to change the optimizer only when your training loss converges (delta < 1e-5) for 20 consecutive epochs:

from neurodiffeq.callbacks import RepeatedMetricConverge
cb = SetOptimizer(torch.optim.LBFGS, optimizer_kwargs={'lr': 1e-3, })
cb_with_condition = cb.conditioned_on(RepeatedMetricConverge(epsilon=1e-5, repetition=20, metric='loss', use_train=True))

solver.fit(max_epochs=5000, callbacks=[cb_with_condition])

You can even use &, |, and ~ to chain these conditions, e.g.,

cb = SetOptimizer(torch.optim.LBFGS, optimizer_kwargs={'lr': 1e-3, })
cond1 = RepeatedMetricConverge(epsilon=1e-5, repetition=20, metric='loss')
cond2 = ClosedIntervalGlobal(min=1000, max=None)
cb_with_condition = cb.conditioned_on(cond1 & cond2)

solver.fit(max_epochs=5000, callbacks=[cb_with_condition])
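
For instance (a sketch reusing the condition objects above), you could switch the optimizer once the loss has converged or 1000 epochs have passed, whichever happens first; ~cond1 would likewise negate a condition.

cond_either = cond1 | cond2  # converged, or at/after global epoch 1000
cb_with_either = cb.conditioned_on(cond_either)

solver.fit(max_epochs=5000, callbacks=[cb_with_either])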
udemirezen commented 2 years ago

Wow, thank you for these good explanations! :) It is very informative, thanks.

udemirezen commented 2 years ago

Hi, according to the torch.optim.LBFGS documentation (https://pytorch.org/docs/stable/generated/torch.optim.LBFGS.html), the optimizer requires a closure function:

optimizer.step(closure): Some optimization algorithms such as Conjugate Gradient and LBFGS need to reevaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, compute the loss, and return it.

Example:

for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss

    optimizer.step(closure)

Do I have to provide a closure function with your framework if I want to use the L-BFGS optimizer? If yes, what is the best way of doing it? Thank you again.

shuheng-liu commented 2 years ago

Hi, you don't have to pass a closure to L-BFGS; neurodiffeq handles that automatically. See #93 for more details.

There's a small caveat: with L-BFGS, you should not set n_batches_train larger than 1 in your Solver. If you do (for example, set n_batches_train=4), it will be identical to training with 4x more epochs.

This is unlike other optimizers (e.g., Adam). If you set n_batches_train=4 and use Adam, it will be like training on a 4x larger batch, which saves 3/4 of the GPU memory but takes 4x longer to run.
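
For L-BFGS, it is therefore safest to keep the batch count at 1. Here is a minimal sketch, assuming the Solver1D constructor accepts the n_batches_train keyword as in recent versions:

solver = Solver1D(
    ode_system=lambda u, t: [diff(u, t, order=2) + u],
    conditions=[IVP(0, 1, 0)],
    train_generator=Generator1D(1000, 0.0, 2*np.pi),
    valid_generator=Generator1D(1000, 0.0, 2*np.pi),
    n_batches_train=1,  # a single batch per epoch, so each L-BFGS step sees the full training set
)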