facebookresearch / theseus

A library for differentiable nonlinear optimization

Derivatives w.r.t. the data in the quadratic fitting example #26

Open bamos opened 2 years ago

bamos commented 2 years ago

One interesting use-case of the derivatives Theseus provides in the quadratic fitting example of Tutorial 1 is differentiating the solution w.r.t. the input data. This enables a form of sensitivity analysis showing how sensitive the loss/parameters are to individual data points. We have a small example of this in Section 6.1 of cvxpylayers for logistic regression: [figure: data-point sensitivity example from Section 6.1 of the cvxpylayers paper]

I quickly added these derivatives for the quadratic fitting example (da/dx and da/dy) to see what they would look like, and the result seems reasonable. The following plot shows that taking these negative gradient steps would decrease the quadratic parameter a: [figure: negated derivatives of a w.r.t. the input data, overlaid on the fitted quadratic]

Do you think this would be interesting to include in one of the tutorials/examples? Maybe as a new section at the end of tutorial 2?

\cc @luisenp @mhmukadam @vshobha

Code

import torch
import theseus as th
import matplotlib.pyplot as plt

torch.manual_seed(0)

def generate_data(num_points=100, a=1, b=0.5, noise_factor=0.01):
    # Generate data: num_points points sampled from the quadratic y = a * x^2 + b plus noise
    data_x = torch.rand((1, num_points))
    noise = torch.randn((1, num_points)) * noise_factor
    data_y = a * data_x.square() + b + noise
    return data_x, data_y

data_x, data_y = generate_data()

# data is of type Variable
x = th.Variable(data_x.requires_grad_(), name="x")
y = th.Variable(data_y.requires_grad_(), name="y")

# optimization variables are of type Vector with 1 degree of freedom (dof)
a = th.Vector(1, name="a")
b = th.Vector(1, name="b")

def quad_error_fn(optim_vars, aux_vars):
    a, b = optim_vars 
    x, y = aux_vars
    est = a.data * x.data.square() + b.data
    err = y.data - est
    return err

optim_vars = a, b
aux_vars = x, y
cost_function = th.AutoDiffCostFunction(
    optim_vars, quad_error_fn, 100, aux_vars=aux_vars, name="quadratic_cost_fn"
)
objective = th.Objective()
objective.add(cost_function)
optimizer = th.GaussNewton(
    objective,
    max_iterations=15,
    step_size=0.5,
)
theseus_optim = th.TheseusLayer(optimizer)

theseus_inputs = {
    "a": 2 * torch.ones((1, 1)).requires_grad_(),
    "b": torch.ones((1, 1)).requires_grad_(),
}
aux_vars = {
    "x": data_x,
    "y": data_y,
}
updated_inputs, info = theseus_optim.forward(
    theseus_inputs, aux_vars=aux_vars,
    track_best_solution=True, verbose=True)
print("Best solution:", info.best_solution)

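# Differentiate the optimized a w.r.t. every data point by backpropagating
# through the unrolled Gauss-Newton iterations inside the TheseusLayer.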
da_dx = torch.autograd.grad(
    updated_inputs['a'], aux_vars['x'],
    retain_graph=True)[0].squeeze()
da_dy = torch.autograd.grad(
    updated_inputs['a'], aux_vars['y'],
    retain_graph=True)[0].squeeze()

# Plot the learned function
fig, ax = plt.subplots()
ax.scatter(data_x.detach(), data_y.detach());

a = info.best_solution['a'].squeeze().detach()
b = info.best_solution['b'].squeeze().detach()
x = torch.linspace(0., 1., steps=100)
y = a*x*x + b
ax.plot(x, y, color='k', lw=4, linestyle='--')

ax.set_xlabel('x')
ax.set_ylabel('y')

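# At each data point, draw a segment along the negated derivatives of a,
# i.e. the direction in which moving that point would decrease the fitted a.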
for i in range(data_x.shape[1]):
    data_xi = data_x[0,i].detach()
    data_yi = data_y[0,i].detach()
    ax.plot([data_xi, data_xi-da_dx[i]],
            [data_yi, data_yi-da_dy[i]],color='k')
ax.set_title('Negated derivatives of a w.r.t. the input data');
vshobha commented 2 years ago

I am not sure I understand fully. The goal of Tutorial 1 is to show how to use Theseus to solve an optimization problem; I'm not sure why Theseus is needed for the analysis you suggest adding.

bamos commented 2 years ago

> I am not sure I understand fully. The goal of Tutorial 1 is to show how to use Theseus to solve an optimization problem; I'm not sure why Theseus is needed for the analysis you suggest adding.

I was thinking at the end of the derivative tutorial (the second one) since this is another demonstration of the derivatives that can be obtained with Theseus

vshobha commented 2 years ago

(Your description above says Tutorial 1. It also seems like you're using Tutorial 1 code.)

I still don't see why Theseus is necessary to obtain these derivatives. The torch documentation suggests torch.autograd.grad can be used directly on any torch tensors. Does this somehow depend on the internals of the torch grad implementation and the Theseus forward implementation? If so, can you provide more details?

bamos commented 2 years ago

> (Your description above says Tutorial 1. It also seems like you're using Tutorial 1 code.)

Ah, it's the problem from Tutorial 1 that I added derivatives to, the same way Tutorial 2 builds on it.

> I still don't see why Theseus is necessary to obtain these derivatives. The torch documentation suggests torch.autograd.grad can be used directly on any torch tensors. Does this somehow depend on the internals of the torch grad implementation and the Theseus forward implementation? If so, can you provide more details?

Right, it's because the Theseus forward computation can be seen as a function that takes in the data x/y, optimizes over a/b, and returns the best a/b. The grad call here to obtain da/dx thus goes backwards, unrolling through Theseus' inner optimization steps.
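
To make that functional view concrete, here's a rough sketch (reusing the objects from the code block at the top of the issue and assuming the same TheseusLayer.forward call signature; fit_a and da_dx_jac are just illustrative names): wrap the forward pass as a function from the data x to the fitted a, and ask autograd for the full Jacobian da/dx.

# Sketch only: treat the Theseus forward pass as a function a = f(x); autograd
# then unrolls back through the inner Gauss-Newton iterations.
def fit_a(x_tensor):
    out, _ = theseus_optim.forward(
        {"a": 2 * torch.ones((1, 1)), "b": torch.ones((1, 1))},
        aux_vars={"x": x_tensor, "y": data_y},
    )
    return out["a"]

# Full Jacobian of the fitted a w.r.t. every data point (shape: a.shape + x.shape).
da_dx_jac = torch.autograd.functional.jacobian(fit_a, data_x.detach())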

vshobha commented 2 years ago

But this can be done on any function with torch.autograd.grad, right? I am still not sure I understand how the Theseus computation specifically matters here.

bamos commented 2 years ago

> But this can be done on any function with torch.autograd.grad, right? I am still not sure I understand how the Theseus computation specifically matters here.

Hmm, it would indeed be possible to solve the least squares problem without Theseus and obtain these derivatives in other ways. To me, though, this demonstrates that the Theseus computations to solve the least squares problem indeed enable the derivatives that we say it provides, i.e. Theseus finds the optimal variables and we can then differentiate through those w.r.t. the other parameters defining the optimization problem.
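
As a quick sanity check, something like the following finite-difference comparison should roughly match the autograd value (a sketch only; it reuses fit_a from my previous comment and da_dx from the code at the top of the issue, and the index i is arbitrary):

# Perturb one data point, re-solve with Theseus, and compare the change in the
# fitted a against the autograd derivative da_dx[i].
i, eps = 0, 1e-4
x_pert = data_x.detach().clone()
x_pert[0, i] += eps
a_base = fit_a(data_x.detach().clone()).item()
a_pert = fit_a(x_pert).item()
print("finite difference:", (a_pert - a_base) / eps)
print("autograd:         ", da_dx[i].item())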

Also, not every function can be nicely differentiated with torch.autograd.grad. For example, if we solve the NLLS problem with one of torch's built-in optimizers, which do in-place, non-differentiable operations, then using torch.autograd.grad through that solution (of a) doesn't give the right derivatives (w.r.t. x). Tangentially related, but in this case higher is one other tool that helps add these derivatives (though the goals/functionality of higher are much different from Theseus' focus on NLLS problems/solvers).
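
A toy illustration of that point (a sketch on a plain-tensor version of the problem, not the Theseus API; all names here are made up for the example): an out-of-place, hand-unrolled solver keeps the autograd graph from the data to the solution, so a derivative w.r.t. x is available, while a torch.optim loop updates its parameter in place as a leaf, so the returned solution has no path back to x.

import torch

torch.manual_seed(0)
x = torch.rand(100, requires_grad=True)
y = x.square() + 0.5  # toy data from the true a=1, b=0.5

# 1) Hand-unrolled gradient descent on a: out-of-place updates keep the graph,
#    so the final a_hat is differentiable w.r.t. the data x.
a_hat = torch.zeros(())
for _ in range(200):
    grad_a = -2.0 * ((y - a_hat * x.square() - 0.5) * x.square()).mean()
    a_hat = a_hat - 0.5 * grad_a
da_dx_toy = torch.autograd.grad(a_hat, x)[0]  # one derivative per data point

# 2) torch.optim: the parameter is a leaf updated in place, so the solution is
#    never connected to x and the analogous grad call is not possible.
a2 = torch.nn.Parameter(torch.zeros(()))
opt = torch.optim.SGD([a2], lr=0.5)
for _ in range(200):
    opt.zero_grad()
    loss = (y.detach() - a2 * x.detach().square() - 0.5).square().mean()
    loss.backward()
    opt.step()
# torch.autograd.grad(a2, x) would raise an error here: a2 was never part of a
# graph involving x.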

vshobha commented 2 years ago

> Hmm, it would indeed be possible to solve the least squares problem without Theseus and obtain these derivatives in other ways. To me, though, this demonstrates that the Theseus computations to solve the least squares problem indeed enable the derivatives that we say it provides, i.e. Theseus finds the optimal variables and we can then differentiate through those w.r.t. the other parameters defining the optimization problem.

So I am not sure @luisenp would want the users to use torch.autograd.grad directly on Theseus tensors; from our discussion, I believe the preference is to have the user go through the Theseus API only. (If we still want to demonstrate that Theseus computations are nicely differentiable, we should move it to Tutorial 2, since iirc we do not discuss differentiating through functions till then.)

mhmukadam commented 2 years ago

Thanks all for the discussion. @bamos How about starting this as a new short tutorial and then seeing if there is any overlap with others? Including visualization of gradients is also quite valuable.

> demonstrates that the Theseus computations to solve the least squares problem indeed enable the derivatives that we say it provides

This can be posed (within the potentially new tutorial) as an explainer and sanity check, and not something we recommend users do in practice, as @vshobha mentions: "... preference is to have the user go through the Theseus API only ...".

> not every function can be nicely differentiated

This seems useful to demonstrate as a second point after the above. We can try to explore the connection with higher later, but it may be useful to reference it here.

bamos commented 2 years ago

> Thanks all for the discussion. @bamos How about starting this as a new short tutorial and then seeing if there is any overlap with others? Including visualization of gradients is also quite valuable.

Yeah! I'll write up a quick new one and send it in soon

> demonstrates that the Theseus computations to solve the least squares problem indeed enable the derivatives that we say it provides

> This can be posed (within the potentially new tutorial) as an explainer and sanity check, and not something we recommend users do in practice, as @vshobha mentions: "... preference is to have the user go through the Theseus API only ...".

Hmm, why should we discourage use-cases like this? If the user wants the gradients through the NLLS solve, calling torch.autograd.grad or .backward() on (something derived from) Theseus' output seems functional and reasonable (and the motion planning tutorial is now using .backward() here too).
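
For reference, the pattern being discussed is just something like this sketch (reusing the objects from the code at the top of the issue; the outer loss on a is purely illustrative):

# An outer objective defined on the Theseus output; .backward() fills in the
# .grad fields of the data tensors, similar to what the motion planning
# tutorial does.
outer_loss = updated_inputs["a"].square().sum()
outer_loss.backward()
print(data_x.grad.shape, data_y.grad.shape)  # per-data-point gradients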