facebookresearch / theseus

A library for differentiable nonlinear optimization
MIT License

singular matrix when using torch.linalg.cholesky() #619

Closed yimingli1998 closed 1 year ago

yimingli1998 commented 1 year ago

Hi, I am using theseus to solve a simple optimization problem and it raises an error: "linalg.cholesky: (Batch element 0): The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite)."

The output solution is the same as the initialized solution. By checking the code, I found that this happens because the matrix AtA is singular. However, I don't know how to solve this problem. Here is my code:

import torch
import theseus as th

def cost_fn(optim_vars, aux_vars=None):
    # Squares and sums every entry, so the returned error is a single
    # scalar reshaped to (1, 1) by the two unsqueeze calls.
    energy = optim_vars[0].tensor.square().sum()
    return energy.unsqueeze(-1).unsqueeze(-1)

if __name__=="__main__":

    # define optimization variables
    N = 10
    optim_vars = th.Vector(N, name="params")

    # set up the objective and the Gauss-Newton optimizer
    cfn = th.AutoDiffCostFunction(
        [optim_vars], cost_fn, 1, name="test"
    )
    objective = th.Objective()
    objective.add(cfn)
    optimizer = th.GaussNewton(
        objective,
        max_iterations=15,
        step_size=0.1,
        # linear_solver_kwargs={'check_singular': True},
    )

    theseus_optim = th.TheseusLayer(optimizer)
    theseus_inputs = {
        "params": torch.ones_like(optim_vars.tensor),
    }
    with torch.no_grad():
        updated_inputs, info = theseus_optim.forward(
            theseus_inputs,
            optimizer_kwargs={"track_best_solution": True, "verbose": True},
        )
    print("Best solution:", info.best_solution)

Is there anything wrong with my code?

luisenp commented 1 year ago

Hi @yimingli1998. Note that given cost functions cf_i, theseus automatically optimizes for

torch.sum(cf_0.error() ** 2 + ... + cf_N.error() ** 2, dim=1)

so you don't need to square and sum inside cost_fn(); doing that collapses everything into an underdetermined system (you have 1 equation with 10 unknowns).

I think what you want is to simply return the N=10 dimensional error, defined as return optim_vars[0].tensor.clone().

BTW, this system is linear, so it will converge in one iteration. You could use th.LinearOptimizer instead of th.GaussNewton. Is this what you are trying to solve?
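
For reference, a minimal sketch of that fix (the rest of the original script stays the same; the last line shows the linear-optimizer variant mentioned above):

def cost_fn(optim_vars, aux_vars=None):
    # Return the N-dimensional residual directly; theseus squares and sums
    # the error internally, so no manual .square().sum() is needed.
    return optim_vars[0].tensor.clone()

N = 10
optim_vars = th.Vector(N, name="params")
# The cost dimension is now N instead of 1.
cfn = th.AutoDiffCostFunction([optim_vars], cost_fn, N, name="test")
objective = th.Objective()
objective.add(cfn)
# The system is linear, so a single linear solve suffices.
optimizer = th.LinearOptimizer(objective, th.CholeskyDenseSolver)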

yimingli1998 commented 1 year ago

Thank you for your reply! Yes, I define the error as optim_vars[0].tensor.clone() and it works well. Now I can define more complex cost functions and use the Gauss-Newton algorithm for optimization.

gzspkgypt commented 5 months ago

@yimingli1998 Hello, I am also using the Gauss-Newton algorithm for optimization, and I have run into the same non-positive-definiteness problem. Could I ask how you wrote your cost function and how you ensured positive definiteness?

luisenp commented 5 months ago

Hi @gzspkgypt. The cost function is problem-dependent, so we would need more information to help you write it.

If you are observing non-positive-definiteness, you can try using the Levenberg-Marquardt optimizer and setting a non-zero damping.
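
For example, a minimal sketch, assuming you already have a th.Objective named objective and an input dict named inputs (the damping value here is just an illustrative starting point):

optimizer = th.LevenbergMarquardt(
    objective,
    max_iterations=15,
    step_size=0.1,
)
layer = th.TheseusLayer(optimizer)
with torch.no_grad():
    solution, info = layer.forward(
        inputs,
        # Non-zero damping adds a multiple of the identity to AtA,
        # which keeps the Cholesky factorization positive-definite.
        optimizer_kwargs={"damping": 0.1, "verbose": True},
    )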

gzspkgypt commented 5 months ago

@luisenp Thank you very much for your reply. The code below is a self-contained version of the optimization part I built on top of Theseus, so you can run it directly. Could you help me see what is causing the error?

import torch
import theseus as th

def Optimal(variables, point, state, bicycle_traj):
    device = torch.device('cuda:0')
    # Optimization variable: 50 (x, y) waypoints flattened to 100 values.
    control_variables = th.Vector(dof=100, name="control_variables")
    # Auxiliary variables; their tensors are filled in via planner_inputs below.
    ego_motion = th.Variable(torch.empty(1, 50, 4), name="ego_motion")
    current_state = th.Variable(torch.empty(1, 4), name="current_state")
    traj = th.Variable(torch.empty(1, 50, 4), name="traj")

    gains = {
            "smooth": 4.0,
            "reference_fn": 3.0
    }
    weights = {k: th.ScaleCostWeight(th.Variable(torch.tensor(v), name=f'gain_{k}')) for k, v in gains.items()}
    objective = th.Objective()
    objective = build_cost_function(objective, control_variables, ego_motion, current_state, traj, weights)
    optimizer = th.GaussNewton(objective,
                               th.CholeskyDenseSolver,
                               max_iterations=10,
                               step_size=0.3,
                               rel_err_tolerance=1e-3)
    layer = th.TheseusLayer(optimizer, vectorize=False)

    layer.to(device=device)
    planner_inputs = {
            "control_variables": variables.to(device=device, dtype=torch.float32),
            "ego_motion": point.to(device=device, dtype=torch.float32),
            "current_state": state.to(device=device, dtype=torch.float32),
            "traj": bicycle_traj.to(device=device, dtype=torch.float32),
    }

    with torch.no_grad():
        control, info = layer.forward(planner_inputs, optimizer_kwargs={'track_best_solution': True, "verbose": True})
        a = info.best_solution['control_variables'].tolist()[0]
        # a = control['control_variables'].tolist()[0]
        return a

def build_cost_function(objective, control_variables, ego_motion, current_state, traj, weights, vectorize=True):

    smooth_function = th.AutoDiffCostFunction(
        [control_variables], smooth_fn, 50, weights['smooth'],
        autograd_vectorize=vectorize, name="smooth"
    )
    objective.add(smooth_function)

    reference_x_fn = th.AutoDiffCostFunction([control_variables], ref_fn, 50, weights['reference_fn'],
                                          aux_vars=[ego_motion, current_state, traj], autograd_vectorize=vectorize,
                                          name="reference_x_fn")
    objective.add(reference_x_fn)
    return objective

def ref_fn(optim_vars, aux_vars):
    # Residual: Euclidean distance of each planned (x, y) point to the
    # corresponding point of the reference trajectory (shape: batch x 50).
    planning_point = optim_vars[0].tensor.view(-1, 50, 2)
    traj = aux_vars[2].tensor[:, :, :2]
    ref_error = torch.norm(planning_point - traj, dim=2)
    return ref_error

def smooth_fn(optim_vars, aux_vars):
    # Smoothness residual based on the discrete curvature of the (x, y) path.
    # Note: only batch element 0 is used, so this assumes batch size 1.
    space_path_s = optim_vars[0].tensor.view(-1, 50, 2)[:, :, 0][0]     # x
    space_path_l = optim_vars[0].tensor.view(-1, 50, 2)[:, :, 1][0]     # y

    ds = torch.diff(space_path_s, n=1, dim=-1)  # first differences, length 49
    dl = torch.diff(space_path_l, n=1, dim=-1)

    d2s = torch.diff(space_path_s, n=2)         # second differences, length 48
    d2l = torch.diff(space_path_l, n=2)

    # Trim the first differences to match the second-difference length.
    ds = ds[:-1]
    dl = dl[:-1]

    # Discrete curvature: |x' y'' - y' x''| / (x'^2 + y'^2)^(3/2), scaled by 5.
    numerator = torch.abs(d2l * ds - dl * d2s)
    denominator = (ds ** 2 + dl ** 2) ** 1.5
    smooth_err = (numerator / denominator) * 5
    smooth_err = torch.unsqueeze(smooth_err, 0)

    # Pad with two zero residuals so the error dimension matches 50.
    rows = smooth_err.size(0)
    zeros = torch.zeros((rows, 2), dtype=smooth_err.dtype, device=smooth_err.device)
    smooth_err = torch.cat((zeros, smooth_err), dim=1)

    return smooth_err

if __name__ == '__main__':

    start = [0.0, 0.0, 0.0, 10.0]
    x = [0.0, 1.03, 2.09, 3.18, 4.297, 5.447, 6.627, 7.837, 9.077, 10.347, 11.647, 12.977, 14.337, 15.727, 17.113, 18.5, 19.886, 21.272, 22.659, 24.045, 25.431, 26.818, 28.204, 29.59, 30.977, 32.363, 33.749, 35.136, 36.522, 37.908, 39.295, 40.697, 42.099, 43.501, 44.903, 46.305, 47.707, 49.109, 50.511, 51.913, 53.315, 54.717, 56.119, 57.521, 58.923, 60.325, 61.727, 63.129, 64.531, 65.933, 67.331]

    y = [0.0, 0.0, 0.0, 0.0, 0.084, 0.084, 0.084, 0.084, 0.084, 0.084, 0.084, 0.084, 0.084, 0.084, -0.125, -0.334, -0.543, -0.752, -0.961, -1.17, -1.379, -1.588, -1.797, -2.006, -2.215, -2.424, -2.633, -2.842, -3.051, -3.26, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.469, -3.573]

    acc = [0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

    theta = [0.0, 0.0, 0.0, 0.0, 0.075, -0.075, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.15, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.15, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.075]

    yaw = [0.0, 0.0, 0.0, 0.0, 0.075, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.15, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.0, -0.075]

    v = [10.0, 10.3, 10.6, 10.9, 11.2, 11.5, 11.8, 12.1, 12.4, 12.7, 13.0, 13.3, 13.6, 13.9, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0]

    traj = torch.tensor([[[ 0.0000,  0.0000,  0.0000, 10.0000],
         [ 1.0000,  0.0000,  0.0000, 10.0000],
         [ 2.0300,  0.0000,  0.0000, 10.3000],
         [ 3.0900,  0.0000,  0.0000, 10.6000],
         [ 4.1800,  0.0000,  0.0000, 10.9000],
         [ 5.2960,  0.0941,  0.0842, 11.2000],
         [ 6.4460,  0.0916,  6.2809, 11.5000],
         [ 7.6260,  0.0889,  6.2809, 11.8000],
         [ 8.8360,  0.0862,  6.2809, 12.1000],
         [10.0760,  0.0834,  6.2809, 12.4000],
         [11.3460,  0.0805,  6.2809, 12.7000],
         [12.6460,  0.0776,  6.2809, 13.0000],
         [13.9760,  0.0746,  6.2809, 13.3000],
         [15.3360,  0.0715,  6.2809, 13.6000],
         [16.7260,  0.0684,  6.2809, 13.9000],
         [18.0960, -0.2296,  6.0690, 14.0200],
         [19.4660, -0.5275,  6.0690, 14.0200],
         [20.8359, -0.8254,  6.0690, 14.0200],
         [22.2059, -1.1234,  6.0690, 14.0200],
         [23.5759, -1.4213,  6.0690, 14.0200],
         [24.9459, -1.7193,  6.0690, 14.0200],
         [26.3158, -2.0172,  6.0690, 14.0200],
         [27.6858, -2.3152,  6.0690, 14.0200],
         [29.0558, -2.6131,  6.0690, 14.0200],
         [30.4258, -2.9110,  6.0690, 14.0200],
         [31.7957, -3.2090,  6.0690, 14.0200],
         [33.1657, -3.5069,  6.0690, 14.0200],
         [34.5357, -3.5000,  6.0690, 14.0200],
         [35.9057, -3.5000,  6.0690, 14.0200],
         [37.2757, -3.5000,  6.0690, 14.0200],
         [38.6456, -3.5000,  6.0690, 14.0200],
         [40.0156, -3.5000,  6.0690, 14.0200],
         [41.4176, -3.5000,  6.2809, 14.0200],
         [42.8196, -3.5000,  6.2809, 14.0200],
         [44.2216, -3.5000,  6.2809, 14.0200],
         [45.6236, -3.5000,  6.2809, 14.0200],
         [47.0256, -3.5000,  6.2809, 14.0200],
         [48.4276, -3.5000,  6.2809, 14.0200],
         [49.8296, -3.5000,  6.2809, 14.0200],
         [51.2316, -3.5000,  6.2809, 14.0200],
         [52.6336, -3.5000,  6.2809, 14.0200],
         [54.0356, -3.5000,  6.2809, 14.0200],
         [55.4376, -3.5000,  6.2809, 14.0200],
         [56.8396, -3.5000,  6.2809, 14.0200],
         [58.2416, -3.5000,  6.2809, 14.0200],
         [59.6436, -3.5000,  6.2809, 14.0200],
         [61.0456, -3.5000,  6.2809, 14.0200],
         [62.4475, -3.5000,  6.2809, 14.0200],
         [63.8495, -3.5000,  6.2809, 14.0200],
         [65.2515, -3.5000,  6.2809, 14.0200]]])
    control_variables = torch.stack((torch.tensor(x[:50]), torch.tensor(y[:50])), dim=1).flatten()
    control_variables = control_variables.reshape(1, -1)

    current_state = torch.stack(
        (torch.tensor(acc[:50]), torch.tensor(theta[:50]), torch.tensor(yaw[:50]), torch.tensor(v[:50])),
        dim=1).unsqueeze(0)

    start_tensor = torch.tensor(start).unsqueeze(0)

    optimal_path = Optimal(control_variables, current_state, start_tensor, traj)
luisenp commented 5 months ago

Hi @gzspkgypt. I'm sorry, I don't fully understand what your last question is.

gzspkgypt commented 5 months ago

@luisenp This optimization code uses the Gauss-Newton optimizer, but it reports the non-positive-definite error. The code above runs directly as posted. Could you please help me see why the cost function I wrote leads to a non-positive-definite system? Thank you very much!

luisenp commented 4 months ago

@gzspkgypt Sorry for the delay, these days I've been too busy with other projects. I saw in another thread that you were able to solve this issue?