Self Loop Leads to Unusual Error in GraphOfConvexSets

cohnt commented 2 years ago

What happened?

It's not clear whether or not the underlying graph structure in GraphOfConvexSets should allow self loops. Although the theoretical formulation allows for self-loops, they won't affect the outcome of the problem. However, adding a self loop in the drake formulation will throw an error when one tries to solve the problem (but not when one actually defines the edge, which in turn leads to the problem). Also, the error message doesn't state that the cause of the problem is the self loop, it simply states, Failure at solvers/gurobi_solver.cc:1095 in DoSolve(): condition 'num_gurobi_linear_constraints == num_gurobi_linear_constraints_expected' failed.

I think any of the following three changes might help with this situation:

When an edge is added to the graph, check that it's not a self loop (and throw an error if it is).
Check for self loops when setting up the optimization problem, and throw an error if any are present.
Modify GraphOfConvexSets to be able to handle self loops in the optimization problem.

Version

45971d6c0c

What operating system are you using?

Ubuntu 22.04

What installation option are you using?

compiled from source code using Bazel

Relevant log output

# The following program will successfully solve a GCS problem.
# Un-commenting the line containing `#[left, left]` in the definition of `E` will cause it to fail with the above error message.

import numpy as np

from pydrake.all import HPolyhedron, Point, GraphOfConvexSets, GraphOfConvexSetsOptions, L2NormCost, Binding

def graph_of_convex_sets_example(convex_relaxation = False):
    b = np.array([-1, -1, 4])
    A_left = np.array([[1,1], [1, -1], [-1, 0]])
    A_right = np.array([[-1,-1], [-1, 1], [1, 0]])
    A_top = np.array([[-1,-1], [1, -1], [0, 1]])
    A_bottom = np.array([[1,1], [-1, 1], [0, -1]])

    H_left = HPolyhedron(A_left, b)
    H_right = HPolyhedron(A_right, b)
    H_top = HPolyhedron(A_top, b)
    H_bottom = HPolyhedron(A_bottom, b)

    G = GraphOfConvexSets()
    source = G.AddVertex(Point([0.2, -2]), "source")
    left =  G.AddVertex(H_left, "left")
    right = G.AddVertex(H_right, "right")
    top =     G.AddVertex(H_top, "top")
    bottom =    G.AddVertex(H_bottom, "bottom")
    target = G.AddVertex(Point([0, 2]), "target")

    E = [
        #[left, left],
        [source, left],
        [source, bottom],
        [source, right],
        [bottom, left],
        [bottom, right],
        [left, top],
        [right, top],
        [left, target],
        [top, target],
        [right, target],
    ]

    cost = L2NormCost(A=np.block([np.eye(2), -np.eye(2)]), b=np.zeros((2,1)))
    for u,v in E:
        e = G.AddEdge(u, v)
        e.AddCost(Binding[L2NormCost](cost, np.concatenate([e.xu(),
                                                            e.xv()])))

    options = GraphOfConvexSetsOptions()
    options.convex_relaxation = convex_relaxation
    result = G.SolveShortestPath(source, target, options)

    print(result.is_success())

graph_of_convex_sets_example()

mpetersen94 commented 2 years ago

Thanks for finding this! Of the solutions listed, I'm partial to 2 or 3 given that GraphOfConvexSets could be extended to support other types of graph queries besides shortest path (i.e. Traveling Salesman Problem) and there may be cases where a self loop is useful so I don't want to prevent adding them.

For solution 2 vs 3, since adding a self edge will not make the path any shorter (because all costs are positive), they boil down to either detecting an edge is a self loop and throwing an error, or just ignoring the self loop edge and not adding any of the associated variables to the actual optimization. I lean towards solution 3 because a self loop is not inherently an error in the Shortest Path formulation but would love others inputs. @TobiaMarcucci @Bernhardpg

If we do decide to ignore the self loop in the optimization, we could still log a warning so the user know that the self edge exists.

mpetersen94 commented 2 years ago

Interesting note as I start investigating this, only Gurobi seems unhappy about the self-loop. I wrote a simple test:

  GraphOfConvexSets spp;

  Vertex* source = spp.AddVertex(Point(Vector1d(-1)));
  Vertex* v1 = spp.AddVertex(Point(Vector1d(0)));
  Vertex* target = spp.AddVertex(Point(Vector1d(1)));

  spp.AddEdge(*source, *v1);
  Edge* loop = spp.AddEdge(*v1, *v1);
  spp.AddEdge(*v1, *target);

  auto result = spp.SolveShortestPath(*source, *target);
  ASSERT_TRUE(result.is_success());
  EXPECT_EQ(result.GetSolution(loop->phi()), 0);

which fails to solve when Gurobi is enabled but works when Gurobit is disabled (regardless of whether SNOPT or MOSEK are enabled/disabled). Nonetheless, taking self-loops out of the optimization can only accelerate the solution time as it reduces the number of variables to consider.

TobiaMarcucci commented 2 years ago

I think that either throwing an error or a warning when a self-edge is added is good. On a similar note, I think that it'd be helpful to have an helper function to duplicate sets in case someone would like to allow multiple steps in the same convex set.

bernhardpg commented 2 years ago

Similarly to Mark I am also able to solve the provided optimization problem example with the self-edge with Mosek.

My thoughts:

The edge costs for GCS are allowed to be nonnegative. I.e. if the edge length for a self-edge (u,u) is zero, the solver may very well pick the path with the self-edge as part of the optimal path. So I don't think that we can rely on the solver to not include the self-edge in the solution. Either way, I think it is cleaner to detect a self-edge rather than ignoring it.
What exactly does a self-edge mean here? Even if the current GCS formulation allows for self-edges, there is only one set of decision variables associated with the vertex (as we are still talking about only one vertex). For a self-edge to have any use, I would presume we also want two sets of decision variables, one for each side of the edge. Otherwise I don't see why there is any point in leaving the vertex through the self-edge in the first place.
Considering point 1. and 2., if the solver does choose to travel along a self-edge, this means that the edge cost is 0 and that any constraints that couple the edge endpoints can only require the decision variables to be equal. This does not add any expressive power to the problem formulation.

In general, I think that allowing self-loops and multiple visits to the vertices is something we should make easier to do in Drake's implementation of GCS, and hence second Tobia's opinion on helper functions.

However, for my reasons above, I think that adding self-loops with the current formulation is really not desirable as it does not really add any functionality. As it also is handled differently for different solvers, I think that until Drake has proper support for self-loops/multiple visits, it should throw an error in the case of a self-loop.

mpetersen94 commented 1 year ago

Ok. So I was able to find the underlying reason that Gurobi fails in this scenario. Apparently Gurobi does not like constraints where variables cancel (i.e. x + y - x == 1). If the constraint is passed symbolically, there is no need to worry because the parser removes the irrelevant variable. But if the constraint is passed as components (A matrix, b matrix, variables), the irrelevant variable is included causing Gurobi to barf. A simple reproduction of the problem can be found here. Not really sure what the right way to resolve this is. @hongkai-dai any thoughts on best way to address this bug?

RussTedrake commented 1 year ago

Wow. Nice find! That's a beautiful reproduction, which definitely seems like something we must address. Can you open a new issue with that test so that we can track it?

hongkai-dai commented 1 year ago

Thanks Mark! This should actually be a problem in MathematicalProgram. When we construct the binding, I should check that the bound variables are different. So this line of code

prog.AddLinearEqualityConstraint(Eigen::RowVector3d(-1, 1, 1), {x, y, x});

should throw an error since the bound variables duplicate x for twice.

RussTedrake commented 1 year ago

In an ideal world, mathematicalprogram would support this, not throw. otherwise every caller will effectively have to support it?

hongkai-dai commented 1 year ago

In an ideal world, mathematicalprogram would support this, not throw. otherwise every caller will effectively have to support it?

I am thinking to check the variable duplication when we construct Binding at https://github.com/RobotLocomotion/drake/blob/f09225fe9daf29390d371fa520465dce3112dc90/solvers/binding.h#L27-L31, that I can check any duplication in the variable v. So I only need to modify this one function.

The other alternative is to support duplicated variables in each binding, but then I will have to re-write code for every solver (SNOPT, MOSEK, Gurobi etc), since when I wrote SnoptSolver, MosekSolver etc, my mental model is that the variables in each binding are unique.

mpetersen94 commented 1 year ago

Moving discussion of the Gurobi bug to #18275

RussTedrake commented 1 year ago

@cohnt -- i think this issue is now resolved? I'll close it but please correct me if I'm wrong!

cohnt commented 1 year ago

The example in the original post in this thread still throws the same error for me.

(It's no longer immediately relevant, since I removed the faulty code that led to the self-loops, which is why I didn't notice it.)

mpetersen94 commented 1 year ago

Found the next issue: Gurobi barfs when you give a degenerate Lorentz Cone constraint. If you add a L2Norm cost on the loop edge when GCS converts it to a Lorentz Cone constraint, you get ℓ ≥ sqrt((x-x)² + (y-y)²) which Gurobi doesn't handle well. I've got a branch with a simple test that shows this failure on this branch. @hongkai-dai for what needs to change to address this.

hongkai-dai commented 1 year ago

Thanks a lot for detecting the problem and creating the unit test. I filed https://github.com/RobotLocomotion/drake/issues/18882 for the issue. Will work on it tomorrow.

mpetersen94 commented 1 year ago

The original problem that exposed this issue now passes as shown on this branch.

RobotLocomotion / drake