Open gradyrw opened 2 years ago
@gradyrw , as it happens we were looking at marginalization today, but it's been a while since I did a deep-dive - and the students that helped create this are all gone. I am prepared to take a deeper look if you could create a some unit tests that fail. I just pushed a branch fix/iSAM2
which you could create a PR against from a fork, if you're willing.
@gradyrw , as it happens we were looking at marginalization today, but it's been a while since I did a deep-dive - and the students that helped create this are all gone. I am prepared to take a deeper look if you could create a some unit tests that fail. I just pushed a branch
fix/iSAM2
which you could create a PR against from a fork, if you're willing.
Sounds good. I'll put up a PR with the failing tests soon.
I've had a chance to look into this again and have an update. It looks like issue 3 that I brought up isn't actually a problem, there's a bug in the markAffectedKeys
function that I was using to re-organize the tree which was causing that problem to appear. This is good since the first two issues are really easy to fix. I did find one additional issue, marginalizeLeaves
ignores the findUnusedFactorSlots
parameter, however this is also relatively easy to fix.
I'll try to make a PR with all of these fixes together in the next few days.
great!
@gradyrw I've also ran into the situation you mentioned in issue 3. I end up with a clique P( v90 b90 x90 x88 v88 b88 | b206 v206 x206 )
where b88 v88 x88
are supposed to be marginalized out, but due to the ordering assumption they are not removed. You mentioned that you made your proposed solution work for the case where all variables have the same size, would you be willing to share the respective lines?
@gradyrw I've also ran into the situation you mentioned in issue 3. I end up with a clique
P( v90 b90 x90 x88 v88 b88 | b206 v206 x206 )
whereb88 v88 x88
are supposed to be marginalized out, but due to the ordering assumption they are not removed. You mentioned that you made your proposed solution work for the case where all variables have the same size, would you be willing to share the respective lines?
The problem turned out to be my usage of "markAffectedKeys". That function wasn't marking all of the necessary variables in order to enforce the ordering requirement. It marks all of the keys in subtrees, but misses keys belonging to the same clique as the marginal variables. In your case, the keys {v90, b90, x90}
need to get marked, but won't be. Modifying "markAffectedKeys" so that it handles this case fixes the issue. I think that I actually didn't quite do this correctly in the test code which does this in #1172
@gradyrw I just tested the most recent fix/iSAM2
with your changes merged and the issue does not appear anymore. Thanks!
Yay!
@dellaert Appears that we have not merged in the fix/iSAM2
branch, I'll open a PR
We have encountered the same Issue 3, but it has not been fixed in Release 4.2. When can we expect this issue to be merged into Release 4.2
@dellaert We need to cherry-pick this to the 4.2 tree
should this fix be available in the 4.2 release now? i'm seeing issues that might be related, see https://groups.google.com/g/gtsam-users/c/qAuXDKyR1ac/m/2k_M7iTjAAAJ
It's probably not, let me see what I can do
ok. since i can work around the particular issues i'm seeing, it's not urgent.
Description
I have been trying to use ISAM2's marginalizeLeaves function and encountered a few issues, particularly on larger problems. I've tracked the issues down as best I can and I believe there are 3 different bugs in the marginalizeLeaves function which are causing problems:
There is a missing
nullptr
check which is required in cases where a leaf is also a root (e.g. marginalizing the only node in a graph).The vertical block matrix of the clique's conditional is not getting updated properly in cases where the entire clique is not marginalized
The logic for splitting a clique assumes that keys are ordered in a way such that all the marginalized keys appear before any keys that are being kept. This is not always true.
I think there are simple solutions for the first two issues, but I'm not sure about the final one.
Steps to reproduce
I'm using this code to force the keys that I want to marginalize to be leaves and to then update ISAM2 and marginalize:
I've adapated this code from IncrementalFixedLagSmoother.cpp and it appears to work fine (I haven't encountered any errors from trying to marginalize non-leaf nodes). Next I've got the following helpers for creating an ISAM2 object and defining noise models for factors:
Issue 1: Missing nullptr check
To reproduce this error, just create a factor graph with a single variable and then try to marginalize that variable. Doing so results in a segfault:
I've tracked this down to line 556 in ISAM2.cpp, the parent of the clique is dereferenced without checking if it is null:
In this case the parent of the clique will be null since the node is a root. This is easy to fix, since we are getting rid of the entire tree it's not necessary to keep track of any marginal factors and just adding a null pointer check before pushing the marginal factor works:
Issue 2: Incorrect block matrix update
This issue is harder to trigger and only manifests for larger graphs that have undergone repeated marginalizations. I've tried to make the smallest possible example here. First, create a factor graph with pose variables layed out in a 2D grid. Each variable has a prior factor placed on it and then between factors connect each node to its neighbors:
Then, one by one, we randomly select a variable, marginalize it, and calculate the best estimate:
Make sure that the seed is set to 1234, otherwise results will differ. With that seed this eventually throws an
IndeterminantLinearSystem
error, which it shouldn't be doing. We have a prior on each variable which should constrain the solution. Looking at things a little closer, it throws theIndeterminantLinearSystem
error while working near variable 5, right after it marginalized key 25 from the clique:(25,5,3 : 2,7,11,17,22,26,33,35,43,45,54)
If we look at what happens while we are marginalizing key 25, it looks like there are issues with how the conditional's matrix is being updated (
cg->matrixObject()
). Lines 634-636 are where this update happens inISAM2.cpp
. Immediately before that update, the block matrix for the troublesome clique has the following state:We want to chop off the first 6 rows and first 6 columns of this matrix (since the variable we are marginalizing is a Pose3 type). However, what happens instead is that we chop off the first 6 rows while the columns remain the same. This is the state of the block matrix after line 636:
which is a singular matrix and is the cause of the
IndeterminantLinearSystem
error. The issue is that whilerowStart_
gets incremented,firstBlock_
only gets set via:on line 635. This isn't correct (although it works if firstBlock() is zero). Instead, firstBlock() should get incremented by
nToRemove
. We can do this be changing 635 to:which results in the following block matrix after the update:
which appears to be correct. After this change the
IndeterminantLinearSystem
errors are gone and this part of the test passes.Issue 3: Incorrect ordering assumption
The next issue comes when trying to marginalize the variable with key 3. This variable is in the clique:
(5,3,2,1 : 7,10,17,21,22,26,33,35,43,45,54).
After this marginalization call
calculateBestEstimate
gives the following error:The problem is that the variable with key 3 was not actually removed during the call to marginalizeLeaves. This check on line 631:
sets
nToRemove
as zero sincecg->keys()[0] = 5
is not a key that we want to remove. It's simple enough to change this so thatnToRemove
is calculated correctly, but even if we havenToRemove = 1
the result will be that variable 5 will be removed instead of variable 3 (or even worse, some parts of variable 5 could be removed or variable 5 and some parts of variable 3).One possible way to fix this is to re-order keys in the clique as we are marginalizing them. I think that replacing line 631 with something like this could work:
I've gotten this work for the case where all variables have the same size, however the general case is a little trickier. The problem is that when you swap matrix blocks you have to shuffle the blocks up or down in order to preserve the upper triangular structure of the matrix. This involves copying data and could wind up being expensive.
I'm hoping that there might also be an easy way to re-order the keys/matrix in the conditional that I'm missing. When I made the modification to swap matrix blocks, the test was able to finish and the results all appeared correct. I'm happy to put up a PR with these changes, but would like some input on the best way to solve this last problem first.
Additional Details
This is the state of the Bayes Tree immediately before the failure described in issue 2:
This is the state of the Bayes Tree immediately before the failure described in issue 3:
Here is the full code for reproducing these results: