Open ScottMcMichael opened 1 year ago
Thank you for the report! Is this reproducible with a simple graph that is just super long (like a very long chain of odometry)?
It can also be reproduced by a long chain of BetweenFactors. I updated the test file to set that up without using the data file.
I have a fix for this but it is much more complicated than the existing code: https://github.com/ScottMcMichael/gtsam/tree/fix_large_graph_serialization_stack_overflow
I have to replace the serialize code in BayesTree.h plus add some support code in the Clique classes. The new serialization code has to replicate some functionality from Boost but it is able to parse the trees non-recursively, eliminating the stack overflow when dealing with large trees. I can clean it up and create a PR but I would like someone to check my approach to make sure I am not overlooking something since I am not entirely sure how the cliques work.
Description
Serialization of an ISAM2 solver object reliably fails after the graph reaches a certain size.
Steps to reproduce
Download data file here: https://drive.google.com/file/d/1bVyEQc3ogHmfuGTL9fHVptIqakV_15Vs/view?usp=sharing
Compile and run the new gtest added here:
https://github.com/ScottMcMichael/gtsam/blob/isam_serialization_bug/gtsam/nonlinear/tests/testSerializationNonlinear.cpp
Either put the data file in the execution directory or manually set the path to the data file in the serialization test.
(executable is testSerializationNonlinear)
Expected behavior
The test will produce the following output:
Write: .//gt0.bin Segmentation fault (core dumped)
Environment
Ubuntu 20.04.5, default CMake build configuration
Additional information
The crash appears to be caused by a stack overflow, either from an infinite recursion or a very expansive recursion that needs to be replaced with another kind of loop. According to the stack trace the two lines of GTSAM code that the program keeps bouncing between are:
542768 gtsam::ISAM2Clique::serialize (ar=..., this=0x7ffa7c001dd0) at .../install/gtsam/include/gtsam/nonlinear/ISAM2Clique.h:154
and
542739 gtsam::BayesTreeCliqueBase<gtsam::ISAM2Clique, gtsam::GaussianFactorGraph>::serialize (ar=..., this=0x7ffa7c001
dd0) at .../install/gtsam/include/gtsam/inference/BayesTreeCliqueBase.h:212