borglab / gtsam

GTSAM is a library of C++ classes that implement smoothing and mapping (SAM) in robotics and vision, using factor graphs and Bayes networks as the underlying computing paradigm rather than sparse matrices.
http://gtsam.org

ISAM2 Marginalization Issues #1101

Open gradyrw opened 2 years ago

gradyrw commented 2 years ago

Description

I have been trying to use ISAM2's marginalizeLeaves function and encountered a few issues, particularly on larger problems. I've tracked the issues down as best I can and I believe there are 3 different bugs in the marginalizeLeaves function which are causing problems:

  1. There is a missing nullptr check which is required in cases where a leaf is also a root (e.g. marginalizing the only node in a graph).

  2. The vertical block matrix of the clique's conditional is not updated properly in cases where the entire clique is not marginalized.

  3. The logic for splitting a clique assumes that keys are ordered in a way such that all the marginalized keys appear before any keys that are being kept. This is not always true.

I think there are simple solutions for the first two issues, but I'm not sure about the final one.

Steps to reproduce

I'm using this code to force the keys that I want to marginalize to be leaves and to then update ISAM2 and marginalize:

boost::optional<FastMap<Key, int> > createOrderingConstraints(const ISAM2& isam, const KeyVector& new_keys,
                                                              const KeySet& marginalizable_keys)
{
  if (marginalizable_keys.empty()) {
    return boost::none;
  } else {
    FastMap<Key, int> constrained_keys = FastMap<Key, int>();
    // Generate ordering constraints so that the marginalizable variables will be eliminated first
    // Set all existing and new variables to Group1
    for (const auto& [key, val] : isam.getDelta()) {
      constrained_keys.emplace(key, 1);
    }
    for (const auto& key : new_keys) {
      constrained_keys.emplace(key, 1);
    }
    // And then re-assign the marginalizable variables to Group0 so that they'll all be leaf nodes
    for (const auto& key : marginalizable_keys) {
      constrained_keys.at(key) = 0;
    }
    return constrained_keys;
  }
}

void markAffectedKeys(const Key& key, const ISAM2Clique::shared_ptr& root_clique, KeyList& additional_keys)
{
  std::stack<ISAM2Clique::shared_ptr> frontier;
  frontier.push(root_clique);
  // Basic DFS to find additional keys
  while (!frontier.empty()) {
    // Get the top of the stack
    const ISAM2Clique::shared_ptr clique = frontier.top();
    frontier.pop();
    // Check if we have more keys and children to add
    if (std::find(clique->conditional()->beginParents(), clique->conditional()->endParents(), key) !=
        clique->conditional()->endParents()) {
      for (Key i : clique->conditional()->frontals()) {
        additional_keys.push_back(i);
      }
      for (const ISAM2Clique::shared_ptr& child : clique->children) {
        frontier.push(child);
      }
    }
  }
}

ISAM2Result updateAndMarginalize(const NonlinearFactorGraph& new_factors, const Values& new_values,
                                 const KeySet& marginalizable_keys, ISAM2& isam)
{
  // Force ISAM2 to put marginalizable variables at the beginning
  const boost::optional<FastMap<Key, int> > ordering_constraints =
      createOrderingConstraints(isam, new_values.keys(), marginalizable_keys);

  // Mark additional keys between the marginalized keys and the leaves
  KeyList additional_marked_keys;
  for (Key key : marginalizable_keys) {
    ISAM2Clique::shared_ptr clique = isam[key];
    for (const ISAM2Clique::shared_ptr& child : clique->children) {
      markAffectedKeys(key, child, additional_marked_keys);
    }
  }

  // Update
  ISAM2Result result =
      isam.update(new_factors, new_values, FactorIndices{}, ordering_constraints, boost::none, additional_marked_keys);

  // And marginalize
  if (!marginalizable_keys.empty()) {
    FastList<Key> leaf_keys(marginalizable_keys.begin(), marginalizable_keys.end());
    isam.marginalizeLeaves(leaf_keys);
  }

  return result;
}

I've adapted this code from IncrementalFixedLagSmoother.cpp and it appears to work fine (I haven't encountered any errors from trying to marginalize non-leaf nodes). Next, I have the following helpers for creating an ISAM2 object and defining noise models for factors:

static const boost::shared_ptr<noiseModel::Isotropic> NOISE = noiseModel::Isotropic::Sigma(6, 1.0);

ISAM2 createISAM()
{
  ISAM2Params isam_params;
  isam_params.factorization = gtsam::ISAM2Params::CHOLESKY;
  isam_params.relinearizeSkip = 1;
  isam_params.findUnusedFactorSlots = false;
  isam_params.evaluateNonlinearError = false;
  return ISAM2(isam_params);
}

Issue 1: Missing nullptr check

To reproduce this error, just create a factor graph with a single variable and then try to marginalize that variable. Doing so results in a segfault:

TEST(ISAM2, MarginalizeEntireGraph)
{
  NonlinearFactorGraph factors;
  Values values;
  ISAM2 isam = createISAM();

  // Create a factor graph with one variable and a prior
  Pose3 root = Pose3::identity();
  Key root_key(0);
  values.insert(root_key, root);
  factors.addPrior(root_key, Pose3::identity(), NOISE);

  // Optimize the graph
  auto result = updateAndMarginalize(factors, values, {}, isam);
  auto estimate = isam.calculateBestEstimate();
  EXPECT_TRUE(estimate.size() == 1);

  // And remove the node from the graph
  KeySet marginalizable_keys({root_key});

  /********* This results in a segfault ********/
  updateAndMarginalize({}, {}, marginalizable_keys, isam);

  estimate = isam.calculateBestEstimate();
  EXPECT_TRUE(estimate.empty());
}

I've tracked this down to line 556 in ISAM2.cpp, where the parent of the clique is dereferenced without checking whether it is null:

// We do not need the marginal factors associated with this clique
// because their information is already incorporated in the new
// marginal factor.  So, now associate this marginal factor with the
// parent of this clique.
marginalFactors[clique->parent()->conditional()->front()].push_back(marginalFactor);

In this case the parent of the clique will be null since the node is a root. This is easy to fix: since we are getting rid of the entire tree, it is not necessary to keep track of any marginal factors, and simply adding a null-pointer check before pushing the marginal factor works:

if (clique->parent())
    marginalFactors[clique->parent()->conditional()->front()].push_back(marginalFactor);

Issue 2: Incorrect block matrix update

This issue is harder to trigger and only manifests for larger graphs that have undergone repeated marginalizations. I've tried to make the smallest possible example here. First, create a factor graph with pose variables laid out in a 2D grid. Each variable has a prior factor placed on it, and between factors connect each node to its neighbors:

Key idxToKey(const int i, const int j)
{
  return Key(i * GRID_DIM + j);
}

std::pair<NonlinearFactorGraph, Values> createGridGraph()
{
  NonlinearFactorGraph factors;
  Values values;

  // Create a grid of pose variables
  for (int i = 0; i < GRID_DIM; ++i) {
    for (int j = 0; j < GRID_DIM; ++j) {
      Pose3 pose = Pose3(Rot3::identity(), Point3(i, j, 0));
      Key key = idxToKey(i, j);
      // Place a prior
      factors.addPrior(key, Pose3(Rot3::identity(), Point3(i, j, 0)), NOISE);
      values.insert(key, pose);
      if (i > 0) {
        factors.emplace_shared<BetweenFactor<Pose3>>(idxToKey(i - 1, j), key, Pose3(Rot3::identity(), Point3(1, 0, 0)),
                                                     NOISE);
      }
      if (j > 0) {
        factors.emplace_shared<BetweenFactor<Pose3>>(idxToKey(i, j - 1), key, Pose3(Rot3::identity(), Point3(0, 1, 0)),
                                                     NOISE);
      }
    }
  }
  return { factors, values };
}

Then, one by one, we randomly select a variable, marginalize it, and calculate the best estimate:

TEST(ISAM2, GridMarginalization)
{
  ISAM2 isam = createISAM();
  auto [factors, values] = createGridGraph();

  // Optimize the graph
  auto result = updateAndMarginalize(factors, values, {}, isam);
  auto estimate = isam.calculateBestEstimate();

  // Get the list of keys
  std::vector<Key> key_list(GRID_DIM * GRID_DIM);
  std::iota(key_list.begin(), key_list.end(), 0);

  // Shuffle the keys -> we will marginalize the keys one by one in this order
  std::shuffle(key_list.begin(), key_list.end(), std::default_random_engine(1234));

  for (const auto& key : key_list) {
    gtsam::KeySet marginal_keys;
    marginal_keys.insert(key);
    result = updateAndMarginalize({}, {}, marginal_keys, isam);
    estimate = isam.calculateBestEstimate();
  }
}

Make sure that the seed is set to 1234; otherwise the results will differ. With that seed, this eventually throws an IndeterminantLinearSystem error, which it shouldn't: we have a prior on each variable, which should constrain the solution. Looking a little closer, the error is thrown while working near variable 5, right after key 25 is marginalized from the clique (25, 5, 3 : 2, 7, 11, 17, 22, 26, 33, 35, 43, 45, 54), where the keys before the colon are the clique's frontal variables and the keys after it form its separator.

If we look at what happens while we are marginalizing key 25, it looks like there are issues with how the conditional's matrix (cg->matrixObject()) is being updated. Lines 634-636 of ISAM2.cpp are where this update happens. Immediately before that update, the block matrix for the troublesome clique has the following state:

blockStart_: 1
rowStart_: 6
full():
    2.36847  0.00391703           0           0           0    0.368527  -0.0923723 -0.00937922           0     ...
          0     2.36788           0           0           0   -0.373996  0.00180666  -0.0663824           0     ...
          0           0      2.5815   -0.345869    0.348944           0           0           0  -0.0538384     ...
          0           0           0     2.07046    0.038592           0           0           0  -0.0147159     ...
          0           0           0           0     2.07189           0           0           0   -0.164523     ...
          0           0           0           0           0      1.9974   0.0228441    0.183647           0     ... 
          0           0           0           0           0           0     2.03959  -0.0246978           0     ...
          0           0           0           0           0           0           0     2.03342           0     ...
          0           0           0           0           0           0           0           0     2.24912     ...
          0           0           0           0           0           0           0           0           0     ...
          0           0           0           0           0           0           0           0           0     ...
          0           0           0           0           0           0           0           0           0     ...

We want to chop off the first 6 rows and the first 6 columns of this matrix (since the variable we are marginalizing is a Pose3, which has dimension 6). However, what happens instead is that the first 6 rows are chopped off while the columns remain the same. This is the state of the block matrix after line 636:

blockStart_: 1
rowStart_: 12
full():
         0          0          0          0          0          0    2.03959 -0.0246978          0      ...
         0          0          0          0          0          0          0    2.03342          0      ...
         0          0          0          0          0          0          0          0    2.24912      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...
         0          0          0          0          0          0          0          0          0      ...

which is a singular matrix and is the cause of the IndeterminantLinearSystem error. The issue is that while rowStart_ gets incremented, firstBlock_ only gets set via:

cg->matrixObject().firstBlock() = nToRemove;

on line 635. This isn't correct (although it happens to work if firstBlock() is zero). Instead, firstBlock() should be incremented by nToRemove. We can do this by changing line 635 to:

cg->matrixObject().firstBlock() += nToRemove;

which results in the following block matrix after the update:

blockStart_: 2
rowStart_: 12
     2.03959   -0.0246978            0            0            0     0.305831   -0.0839919 -0.000843686          ...
           0      2.03342            0            0            0    -0.411396  -0.00587566    -0.129779          ...
           0            0      2.24912    -0.282895     0.371505            0            0            0          ...
           0            0            0      1.72409    0.0556498            0            0            0          ...
           0            0            0            0      1.73737            0            0            0          ...
           0            0            0            0            0      1.65379    -0.249457   -0.0440415          ...
           0            0            0            0            0            0      2.07927   -0.0337653          ...
           0            0            0            0            0            0            0      2.08893          ...
           0            0            0            0            0            0            0            0          ...
           0            0            0            0            0            0            0            0          ...
           0            0            0            0            0            0            0            0          ...
           0            0            0            0            0            0            0            0          ...

which appears to be correct. After this change the IndeterminantLinearSystem errors are gone and this part of the test passes.
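
To see why the assignment only happens to work when firstBlock() starts at zero, here is a minimal standalone sketch. It assumes the VerticalBlockMatrix constructor that takes block dimensions and a row count, plus the rowStart()/firstBlock()/full() accessors from gtsam/base/VerticalBlockMatrix.h, and it only mimics the bookkeeping rather than the actual ISAM2 code path:

#include <gtsam/base/VerticalBlockMatrix.h>

#include <iostream>
#include <vector>

using namespace gtsam;

int main()
{
  // A toy clique with three 6-dof (Pose3-sized) frontal blocks: 18 rows, 3 column blocks.
  std::vector<DenseIndex> dims{ 6, 6, 6 };
  VerticalBlockMatrix Ab(dims, 18);
  Ab.full().setZero();

  // First marginalization within this clique: drop 6 rows and advance one block.
  Ab.rowStart() += 6;
  Ab.firstBlock() += 1;  // firstBlock() is now 1, like blockStart_ in the dump above

  // Second marginalization of one more 6-dof variable from the same clique.
  const DenseIndex nToRemove = 1;
  Ab.rowStart() += 6;
  // Buggy update: `Ab.firstBlock() = nToRemove;` would leave firstBlock() at 1 while
  // rowStart() is 12, so full() would still include the columns of the variable removed
  // in the first step, which are all zero in the remaining rows of the upper-triangular R.
  Ab.firstBlock() += nToRemove;  // correct: firstBlock() becomes 2 and the view stays consistent

  std::cout << "active view: " << Ab.full().rows() << " x " << Ab.full().cols() << std::endl;
  return 0;
}

With the increment, the active view printed at the end is 6 x 6; with the assignment it would be 6 x 12, matching the singular matrix shown above.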

Issue 3: Incorrect ordering assumption

The next issue comes when trying to marginalize the variable with key 3. This variable is in the clique:

(5,3,2,1 : 7,10,17,21,22,26,33,35,43,45,54).

After this marginalization, calling calculateBestEstimate gives the following error:

Invalid Key: 3
"Requested to update a VectorValues with another VectorValues that contains keys not present in the first."

The problem is that the variable with key 3 was not actually removed during the call to marginalizeLeaves. This check on line 631:

// Split the current clique
// Find the position of the last leaf key in this clique
DenseIndex nToRemove = 0;
while (leafKeys.exists(cg->keys()[nToRemove])) ++nToRemove;

sets nToRemove to zero, since cg->keys()[0] = 5 is not a key that we want to remove. It's simple enough to change this so that nToRemove is calculated correctly, but even if we set nToRemove = 1 the result will be that variable 5 is removed instead of variable 3 (or worse, parts of variable 5, or variable 5 together with parts of variable 3).

One possible way to fix this is to re-order keys in the clique as we are marginalizing them. I think that replacing line 631 with something like this could work:

DenseIndex nToRemove = 0;
for (int i = 0; i < cg->frontals().size(); ++i) {
    if (leafKeys.exists(cg->frontals()[i])) {
        if (i != nToRemove) {
            // Swap the key and matrix block at index i with the key 
            // and matrix block at index nToRemove
        }
        ++nToRemove;
    }
}

I've gotten this to work for the case where all variables have the same size; however, the general case is a little trickier. The problem is that when you swap matrix blocks you have to shuffle the blocks up or down in order to preserve the upper-triangular structure of the matrix. This involves copying data and could wind up being expensive.

I'm hoping that there might also be an easy way to re-order the keys/matrix in the conditional that I'm missing. When I made the modification to swap matrix blocks, the test was able to finish and the results all appeared correct. I'm happy to put up a PR with these changes, but would like some input on the best way to solve this last problem first.

Additional Details

This is the state of the Bayes Tree immediately before the failure described in issue 2 (attached image issue_2).

This is the state of the Bayes Tree immediately before the failure described in issue 3 (attached image issue_3).

Here is the full code for reproducing these results:

#include <gtest/gtest.h>
#include <gtsam/base/FastMap.h>
#include <gtsam/geometry/Pose3.h>
#include <gtsam/inference/Key.h>
#include <gtsam/nonlinear/ISAM2.h>
#include <gtsam/slam/BetweenFactor.h>

#include <boost/optional.hpp>
#include <stack>

using namespace gtsam;

static const boost::shared_ptr<noiseModel::Isotropic> NOISE = noiseModel::Isotropic::Sigma(6, 1.0);

ISAM2 createISAM()
{
  ISAM2Params isam_params;
  isam_params.factorization = ISAM2Params::CHOLESKY;
  isam_params.relinearizeSkip = 1;
  isam_params.findUnusedFactorSlots = false;
  isam_params.evaluateNonlinearError = false;
  return ISAM2(isam_params);
}

boost::optional<FastMap<Key, int>> createOrderingConstraints(const ISAM2& isam, const KeyVector& new_keys,
                                                             const KeySet& marginalizable_keys)
{
  if (marginalizable_keys.empty()) {
    return boost::none;
  } else {
    FastMap<Key, int> constrained_keys = FastMap<Key, int>();
    // Generate ordering constraints so that the marginalizable variables will be eliminated first
    // Set all existing and new variables to Group1
    for (const auto& [key, val] : isam.getDelta()) {
      constrained_keys.emplace(key, 1);
    }
    for (const auto& key : new_keys) {
      constrained_keys.emplace(key, 1);
    }
    // And then re-assign the marginalizable variables to Group0 so that they'll all be leaf nodes
    for (const auto& key : marginalizable_keys) {
      constrained_keys.at(key) = 0;
    }
    return constrained_keys;
  }
}

void markAffectedKeys(const Key& key, const ISAM2Clique::shared_ptr& root_clique, KeyList& additional_keys)
{
  std::stack<ISAM2Clique::shared_ptr> frontier;
  frontier.push(root_clique);
  // Basic DFS to find additional keys
  while (!frontier.empty()) {
    // Get the top of the stack
    const ISAM2Clique::shared_ptr clique = frontier.top();
    frontier.pop();
    // Check if we have more keys and children to add
    if (std::find(clique->conditional()->beginParents(), clique->conditional()->endParents(), key) !=
        clique->conditional()->endParents()) {
      for (Key i : clique->conditional()->frontals()) {
        additional_keys.push_back(i);
      }
      for (const ISAM2Clique::shared_ptr& child : clique->children) {
        frontier.push(child);
      }
    }
  }
}

ISAM2Result updateAndMarginalize(const NonlinearFactorGraph& new_factors, const Values& new_values,
                                 const KeySet& marginalizable_keys, ISAM2& isam)
{
  // Force ISAM2 to put marginalizable variables at the beginning
  const boost::optional<FastMap<Key, int>> ordering_constraints =
      createOrderingConstraints(isam, new_values.keys(), marginalizable_keys);

  // Mark additional keys between the marginalized keys and the leaves
  KeyList additional_marked_keys;
  for (Key key : marginalizable_keys) {
    ISAM2Clique::shared_ptr clique = isam[key];
    for (const ISAM2Clique::shared_ptr& child : clique->children) {
      markAffectedKeys(key, child, additional_marked_keys);
    }
  }

  // Update
  ISAM2Result result =
      isam.update(new_factors, new_values, FactorIndices{}, ordering_constraints, boost::none, additional_marked_keys);

  // And marginalize
  if (!marginalizable_keys.empty()) {
    FastList<Key> leaf_keys(marginalizable_keys.begin(), marginalizable_keys.end());
    isam.marginalizeLeaves(leaf_keys);
  }

  return result;
}

TEST(ISAM2, MarginalizeEntireGraph)
{
  NonlinearFactorGraph factors;
  Values values;
  ISAM2 isam = createISAM();

  // Create a factor graph with one variable and a prior
  Pose3 root = Pose3::identity();
  Key root_key(0);
  values.insert(root_key, root);
  factors.addPrior(root_key, Pose3::identity(), NOISE);

  // Optimize the graph
  auto result = updateAndMarginalize(factors, values, {}, isam);
  auto estimate = isam.calculateBestEstimate();
  EXPECT_TRUE(estimate.size() == 1);

  // And remove the node from the graph
  KeySet marginalizable_keys;
  marginalizable_keys.insert(root_key);

  /********* This results in a segfault ********/
  updateAndMarginalize({}, {}, marginalizable_keys, isam);

  estimate = isam.calculateBestEstimate();
  EXPECT_TRUE(estimate.empty());
}

static const int GRID_DIM = 10;

Key idxToKey(const int i, const int j)
{
  return Key(i * GRID_DIM + j);
}

std::pair<NonlinearFactorGraph, Values> createGridGraph()
{
  NonlinearFactorGraph factors;
  Values values;

  // Create a grid of pose variables
  for (int i = 0; i < GRID_DIM; ++i) {
    for (int j = 0; j < GRID_DIM; ++j) {
      Pose3 pose = Pose3(Rot3::identity(), Point3(i, j, 0));
      Key key = idxToKey(i, j);
      // Place a prior on each pose
      factors.addPrior(key, Pose3(Rot3::identity(), Point3(i, j, 0)), NOISE);
      values.insert(key, pose);
      if (i > 0) {
        factors.emplace_shared<BetweenFactor<Pose3>>(idxToKey(i - 1, j), key, Pose3(Rot3::identity(), Point3(1, 0, 0)),
                                                     NOISE);
      }
      if (j > 0) {
        factors.emplace_shared<BetweenFactor<Pose3>>(idxToKey(i, j - 1), key, Pose3(Rot3::identity(), Point3(0, 1, 0)),
                                                     NOISE);
      }
    }
  }
  return { factors, values };
}

TEST(ISAM2, GridMarginalization)
{
  ISAM2 isam = createISAM();
  auto [factors, values] = createGridGraph();

  // Optimize the graph
  auto result = updateAndMarginalize(factors, values, {}, isam);
  auto estimate = isam.calculateBestEstimate();

  // Get the list of keys
  std::vector<Key> key_list(GRID_DIM * GRID_DIM);
  std::iota(key_list.begin(), key_list.end(), 0);

  // Shuffle the keys -> we will marginalize the keys one by one in this order
  std::shuffle(key_list.begin(), key_list.end(), std::default_random_engine(1234));

  for (const auto& key : key_list) {
    gtsam::KeySet marginal_keys;
    marginal_keys.insert(key);
    result = updateAndMarginalize({}, {}, marginal_keys, isam);
    estimate = isam.calculateBestEstimate();
  }
}

int main(int argc, char** argv)
{
  testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}
dellaert commented 2 years ago

@gradyrw , as it happens we were looking at marginalization today, but it's been a while since I did a deep-dive - and the students that helped create this are all gone. I am prepared to take a deeper look if you could create some unit tests that fail. I just pushed a branch fix/iSAM2 which you could create a PR against from a fork, if you're willing.

gradyrw commented 2 years ago

> @gradyrw , as it happens we were looking at marginalization today, but it's been a while since I did a deep-dive - and the students that helped create this are all gone. I am prepared to take a deeper look if you could create some unit tests that fail. I just pushed a branch fix/iSAM2 which you could create a PR against from a fork, if you're willing.

Sounds good. I'll put up a PR with the failing tests soon.

gradyrw commented 2 years ago

I've had a chance to look into this again and have an update. It looks like issue 3 isn't actually a problem: there was a bug in the markAffectedKeys function that I was using to re-organize the tree, which was causing that problem to appear. This is good, since the first two issues are really easy to fix. I did find one additional issue: marginalizeLeaves ignores the findUnusedFactorSlots parameter, but this is also relatively easy to fix.

I'll try to make a PR with all of these fixes together in the next few days.

dellaert commented 2 years ago

great!

patripfr commented 2 years ago

@gradyrw I've also run into the situation you mentioned in issue 3. I end up with a clique P( v90 b90 x90 x88 v88 b88 | b206 v206 x206 ) where b88 v88 x88 are supposed to be marginalized out, but due to the ordering assumption they are not removed. You mentioned that you made your proposed solution work for the case where all variables have the same size; would you be willing to share the relevant lines?

gradyrw commented 2 years ago

> @gradyrw I've also run into the situation you mentioned in issue 3. I end up with a clique P( v90 b90 x90 x88 v88 b88 | b206 v206 x206 ) where b88 v88 x88 are supposed to be marginalized out, but due to the ordering assumption they are not removed. You mentioned that you made your proposed solution work for the case where all variables have the same size; would you be willing to share the relevant lines?

The problem turned out to be my usage of "markAffectedKeys". That function wasn't marking all of the variables necessary to enforce the ordering requirement: it marks all of the keys in sub-trees, but misses keys belonging to the same clique as the marginal variables. In your case, the keys {v90, b90, x90} need to get marked, but won't be. Modifying "markAffectedKeys" so that it handles this case fixes the issue. I don't think I did this quite correctly in the test code in #1172.
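
For reference, here is a hedged sketch of what I mean, written against the updateAndMarginalize helper from the top of this issue; it is not necessarily the exact change made in #1172:

  // Sketch: in updateAndMarginalize, additionally mark the other frontal variables of
  // the clique containing each marginalizable key, so the constrained re-elimination
  // also reorders that clique (e.g. v90, b90, x90 in the example above).
  KeyList additional_marked_keys;
  for (Key key : marginalizable_keys) {
    ISAM2Clique::shared_ptr clique = isam[key];
    for (Key frontal : clique->conditional()->frontals()) {
      if (!marginalizable_keys.count(frontal)) {
        additional_marked_keys.push_back(frontal);
      }
    }
    // As before, recurse into the children to catch dependent sub-trees
    for (const ISAM2Clique::shared_ptr& child : clique->children) {
      markAffectedKeys(key, child, additional_marked_keys);
    }
  }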

patripfr commented 2 years ago

@gradyrw I just tested the most recent fix/iSAM2 with your changes merged and the issue does not appear anymore. Thanks!

dellaert commented 2 years ago

Yay!

ProfFan commented 1 year ago

@dellaert It appears that we have not merged in the fix/iSAM2 branch; I'll open a PR

changxi188 commented 1 year ago

We have encountered the same issue 3, but it has not been fixed in Release 4.2. When can we expect this fix to be merged into Release 4.2?

ProfFan commented 1 year ago

@dellaert We need to cherry-pick this to the 4.2 tree

truher commented 3 weeks ago

Should this fix be available in the 4.2 release now? I'm seeing issues that might be related; see https://groups.google.com/g/gtsam-users/c/qAuXDKyR1ac/m/2k_M7iTjAAAJ

ProfFan commented 3 weeks ago

It's probably not; let me see what I can do.

truher commented 3 weeks ago

OK. Since I can work around the particular issues I'm seeing, it's not urgent.

ProfFan commented 3 weeks ago

#1796