Closed: chait-desai closed this issue 4 years ago.
Excellent!
Take a look at InitializePose3.h and see whether that helps you out. PS: this is more a message for the Google group, https://groups.google.com/forum/#!forum/gtsam-users
This is a well known problem of nonlinear optimization, not a GTSAM issue, so I'll close this now.
Thanks! Do you mind if I post this on the Google group to see if anyone has figured out the "right recipe" for those 3 datasets?
Of course.
@dellaert Since ceres-solver can optimize these g2o problems without any trouble, I think something could be wrong with GTSAM (at least with the SLAM examples).
OK, I will re-open. That is worrisome. Are you using the current develop, and a simple Gaussian noise model?
I used standard datasets and samples. Datasets: https://lucacarlone.mit.edu/datasets/
ceres-solver sample: https://github.com/ceres-solver/ceres-solver/tree/master/examples/slam/pose_graph_3d
gtsam sample: https://github.com/borglab/gtsam/blob/develop/examples/Pose3SLAMExample_g2o.cpp
Here is a code snippet:
gtsam::LevenbergMarquardtParams LmParameters;
gtsam::LevenbergMarquardtParams::SetCeresDefaults(&LmParameters);
//LmParameters.linearSolverType = gtsam::NonlinearOptimizerParams::MULTIFRONTAL_CHOLESKY;
LmParameters.linearSolverType = gtsam::NonlinearOptimizerParams::MULTIFRONTAL_QR;
LmParameters.setVerbosityLM("ERROR");
gtsam::PCGSolverParameters::shared_ptr Pcg =
    boost::make_shared<gtsam::PCGSolverParameters>();
Pcg->preconditioner_ =
    boost::make_shared<gtsam::BlockJacobiPreconditionerParameters>();
// Following is crucial:
Pcg->setEpsilon_abs(1e-10);
Pcg->setEpsilon_rel(1e-10);
LmParameters.iterativeParams = Pcg;
that I lifted from a test somewhere in GTSAM. It has worked great for me in the past on some huge problems where I had not previously had any luck. The quite small epsilons are not values I would have picked (the choice of QR over Cholesky was mine, in response to some numerical trouble). If Levenberg-Marquardt is in the mix, the SetCeresDefaults option with those epsilons might be worth evaluating.
Looks like I copied that from one of the BAL timing tests and this SFM PCG example.
@yetongumich will post some experiments we did shortly. In the meantime: were you able to converge from the pose initialization in the g2o file, or did you use a different initialization method?
@dellaert I used provided samples and datasets without any modifications. I did not change initialization.
@mikesheffler With SetCeresDefaults(), GTSAM works better: the error no longer increases. But the final error is only about 10 times smaller than the starting one, while Ceres reduces the error by a factor of about 1000 on the same problem.
Frank and I have made a Colab script that runs the datasets. You can view and run it through this link: https://colab.research.google.com/drive/1TvrrEdyLmLFSJmeLiBWFdDiSUvjwETuR
Thanks. The original GTSAM sample does not use the initial values from the g2o file; I did not expect that. https://github.com/borglab/gtsam/blob/b6c5bceae95d4512a1e839288b22ac73c3546131/examples/Pose3SLAMExample_g2o.cpp#L50 I will recalculate the datasets tomorrow.
Some screenshots below. But, these definitely do not converge with the values in the g2o files, so if Ceres does converge with those values something bad could be happening. Note below the first error value corresponds to the g2o values.
ceres results:

Dataset | Initial cost | Final cost | Minimizer iterations
---|---|---|---
grid3d | 7.217511e+07 | 4.215951e+04 | 30
torus3d | 1.886124e+06 | 2.255619e+04 | 78
parking-garage | 8.362723e+03 | 6.341883e-01 | 20
sphere_bignoise_vertex3 | 1.134837e+08 | 1.478348e+06 | 41
It's not clear to me why the initial error is different.
Yes, and why it is the same for parking-garage... :-/ @yetongumich any ideas??
So you mean that it's impossible to optimize the original g2o files without additional initialization? Ceres managed it for some datasets:
grid3D:
parking-garage:
torus:
rim:
sphere:
The chordal method looks really cool; I will read about it. I will also play with the ceres-solver parameters.
Indeed. I have a suspicion, which is that the issue is with the retract implementation: we use the Cayley retract in Rot3, and a decoupled (translation, Rot3::Cayley) retract in Pose3. This is done for efficiency, but it might reduce the basin of attraction. There are two compilation flags that govern this behavior, so if someone would be so inclined :-) it would be nice to see initial and final errors, convergence curves, and total times for the two settings of the POSE3_EXPMAP/ROT3_EXPMAP flags, with and without chordal. If they really don't help efficiency much, even in the chordal case, we might want to change their default or even remove Cayley.
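To make the Cayley-vs-exponential-map distinction concrete, here is a small numpy sketch (my own illustration, not GTSAM code). It compares the rotation angle produced by the true SO(3) exponential map and by a half-angle-scaled Cayley transform for the same tangent vector; the scaling convention is an assumption, chosen so both maps agree to second order near the origin.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix of w."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expmap(w):
    """True SO(3) exponential map (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    A = skew(w / theta)
    return np.eye(3) + np.sin(theta) * A + (1.0 - np.cos(theta)) * (A @ A)

def cayley(w):
    """Half-angle-scaled Cayley transform: (I - S)^{-1}(I + S) with S = skew(w)/2.
    Matches expmap to second order near w = 0, but not for large |w|."""
    S = skew(w) / 2.0
    return np.linalg.solve(np.eye(3) - S, np.eye(3) + S)

def angle(R):
    """Rotation angle recovered from the trace of R."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

w = np.array([1.2, 0.0, 0.0])
theta_exp = angle(expmap(w))   # 1.2: the exponential map is exact
theta_cay = angle(cayley(w))   # 2*atan(0.6), about 1.081: Cayley falls short
```

Equivalently, the Cayley local coordinates of a rotation by angle theta have norm 2*tan(theta/2) >= theta, which matches the "Cayley overestimates the true angle" observation later in the thread.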
Ok. I will test.
@dellaert Frank, what you wrote made me think of a statement that is on the page for the RPGO tool in Kimera from Luca's lab. They use
cmake .. -DGTSAM_POSE3_EXPMAP=ON -DGTSAM_ROT3_EXPMAP=ON
and comment:
> Note: The reason why we need EXPMAP is for the correct calculation of Jacobians. Enabling this and the `#define SLOW_BUT_CORRECT_BETWEENFACTOR` before any `#include <gtsam/slam/BetweenFactor.h>` are both important. Otherwise the default are some identity approximations for rotations, which works for some cases but fails for cases with manual loop closures, or artifacts. Note that sudo make check in gtsam will partially fail because some unittests assume the EXPMAP flags to be off.
Is this the potential issue you're thinking of? Is there anyone from MIT on here that might have recently gone through this exercise?
> `#define SLOW_BUT_CORRECT_BETWEENFACTOR` before any `#include <gtsam/slam/BetweenFactor.h>` are both important

This seems to be a ~~bug~~ design choice.
> `#define SLOW_BUT_CORRECT_BETWEENFACTOR` before any `#include <gtsam/slam/BetweenFactor.h>` are both important
>
> This seems to be a bug.
It's not. It's deliberate, but we might have to revise this, based on the data @victor1234 will provide :-)
> Is this the potential issue you're thinking of? Is there anyone from MIT on here that might have recently gone through this exercise?
It is! There is even an open PR about this. This issue could shed a lot of light on how to resolve this :-)
> It's not. It's deliberate, but we might have to revise this
@dellaert Oh I understand now...
I got a first result with the EXPMAP flags. Working on a visual representation. Is it possible to get a timestamp for every solver iteration?
No, we should really have hooks :-/ but you can call iterate yourself. I.e., roll your own version of NonlinearOptimizer::defaultOptimize, e.g., by subclassing LM.
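For reference, rolling your own loop could look like the sketch below. The iterate()/error() interface mirrors what the gtsam wrapper exposes, but the DummyOptimizer is a hypothetical stand-in so the snippet is self-contained; in real use you would construct a gtsam LM optimizer instead.

```python
import time

class DummyOptimizer:
    """Hypothetical stand-in for an LM optimizer; only the assumed
    iterate()/error() interface matters for the loop below."""
    def __init__(self, initial_error):
        self._error = initial_error
    def error(self):
        return self._error
    def iterate(self):
        self._error *= 0.5  # pretend each step halves the error

def optimize_with_timestamps(optimizer, max_iters=100, rel_tol=1e-5):
    """Roll-your-own defaultOptimize: record (elapsed seconds, error)
    after every iterate() call, starting with iteration 0."""
    t0 = time.monotonic()
    history = [(0.0, optimizer.error())]
    for _ in range(max_iters):
        previous = optimizer.error()
        optimizer.iterate()
        history.append((time.monotonic() - t0, optimizer.error()))
        if abs(previous - optimizer.error()) < rel_tol * previous:
            break
    return history

hist = optimize_with_timestamps(DummyOptimizer(1000.0))
```

The returned history is then trivial to plot with time on the x-axis.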
> No, we should really have hooks :-/
I also found the need for this... would you accept proposals on this on a separate PR?
Of course :-)
I recompiled GTSAM with expmap and quaternions:
params.setVerbosity("ERROR")
LevenbergMarquardtParams::SetCeresDefaults(&params)
Otherwise the original code, no changes.
solver | original | chordal |
---|---|---|
ceres | 547 | 114 |
gtsam | 229.56 | 39 |
gtsam-expmap | 1377 | 158 |
- gtsam-grid3D-expmap-chordal-39.log
- gtsam-grid3D-expmap-1377.log
- ceres-grid3D-chordal-158.32.log
- ceres-grid3D-547.51.log
- gtsam-grid3D-229.56.log
- gtsam-grid3d-chordal-114.log
Whoa!
Is this all with the same initial estimate for the first graph?
First figure without any initialization, second one with chordal
Great to see these graphs. I think chordal graphs are less of interest, given these. It seems the issue is really convergence far from the minimum, for which our Cayley map is really bad, apparently. And expmap also gets stuck somewhere.
Will you post the graphs for the other datasets as well?
PS, one way to produce and share graphs is using the really cool site comet.ml, but that only works if your top-level loop is python :-)
Yes, I can post if it's needed. Do you know why the initial error matches Ceres only for expmap? And even for good initial values (chordal), Ceres provides a better result.
Ok. I will check comet.ml. Thanks
Absolutely, please post. I think you should call error and plot that as iteration "0" before calling iterate the first time. That should match, unless the prior is off. Comet.ml will take some time, so please post from your current pipeline if you can. Comet.ml would be cool to have as a script executed automatically as part of a build.
OK, iteration 0 is the initial error on the plots.
The error metric for Ceres is different, so "better" is not really true per se. Ceres uses a metric derived from the quaternion, whereas we use geodesic error. Cayley is an approximation of that which is only good close to the origin. An interesting experiment would be to use exactly the same factor as in the Ceres code. PS, when you post new graphs, could you make the x-axis time? I suspect Ceres will be faster, too; it would be good to know how much faster :-)
To get iteration timestamps I need to create my own solver like you described. Will do.
Does it make sense to use iSAM2 for this task? Will this solver split the task into subgraphs?
If you’re creating your own solver, then definitely do it in python with comet.ml, and it will do the timestamps automatically.
PS the work you are doing here is great. If you’re interested we can expand this into a blog post (and even a small paper after that), and in fact provide some options to GTSAM users for pose optimization. There are several different ways to optimize that we have not touched upon yet (Frobenius norm, e.g.) and there really has not been a systematic exploration of the convergence behavior. I suspect that the bottom-line is it won’t matter too much if you use a good initialization strategy, like chordal.
I need a good PGO solver, so I am ready to do any work for that. But I'm new to GTSAM and have some questions about how it works. Is it appropriate to ask you? Is the issue tracker format suitable for this?
Yes, send me an email at dellaert@gmail.com and we'll set up a meeting
@ProfFan and I did some more work on this, see https://colab.research.google.com/drive/1E9Q-BZAIjL_YPjJbkEPXAresQFwuqcsQ
In summary, Cayley overestimates the true angle $\theta$ of the axis-angle representation:
The objective function we use in GTSAM is half the sum of squares of the error vector, which for the true exponential map corresponds to $\theta^2/2$. But, of course, if we use Cayley then the error will always be higher than that, and in fact will approach $\infty$ as $\theta$ approaches $\pi$.
Ceres uses $2\sin^2(\theta/2)$ as the objective function, which is equal to $1-\cos(\theta)$.
We compare all of them below.
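The three objectives can also be compared numerically. A small sketch follows; the Cayley form $(2\tan(\theta/2))^2/2$ is my reading of "half the squared norm of the Cayley local coordinates", so treat that as an assumption:

```python
import math

def err_expmap(theta):
    # geodesic (exponential-map) objective: theta^2 / 2
    return theta ** 2 / 2

def err_cayley(theta):
    # Cayley local coordinates have norm 2*tan(theta/2), so the
    # objective blows up as theta approaches pi
    return (2 * math.tan(theta / 2)) ** 2 / 2

def err_ceres(theta):
    # 2*sin^2(theta/2) == 1 - cos(theta): bounded above by 2
    return 1 - math.cos(theta)

for theta in (0.01, 0.5, 1.5, 3.0):
    print(f"{theta:4.2f}  {err_expmap(theta):12.6f}  "
          f"{err_cayley(theta):12.6f}  {err_ceres(theta):12.6f}")
```

All three agree to second order near the origin; they differ dramatically as theta approaches pi, which is exactly the regime of a bad initialization.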
The above is the story for the error, but the compilation flags also interact in perhaps non-trivial ways to decide what Pose3::retract means:
POSE3_EXPMAP | QUATERNION | ROT3_EXPMAP | Pose3::retract | Rot3::retract
---|---|---|---|---
T | N/A | N/A | True SE(3) exponential map |
F | T | N/A | Decoupled SE(3) retract | Quaternion exponential map
F | F | T | Decoupled SE(3) retract | SO(3) exponential map
F | F | F | Decoupled SE(3) retract | Cayley approximation for SO(3)
Note Cayley is not available for the Quaternion compilation path, so the setting of ROT3_EXPMAP has to be TRUE.
GTSAM uses the last row by default. Ceres uses the second row, I believe: quaternion exponential map for rotation, and just vector addition for translation - applied separately.
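To illustrate the table's distinction at the identity, here is a numpy sketch comparing the full SE(3) exponential map with a decoupled retract that applies the SO(3) exponential to the rotation and plain vector addition to the translation. This is my own illustration of the two conventions, not GTSAM's actual Pose3 code; in particular, how the decoupled retract composes with a non-identity pose is not shown.

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """SO(3) exponential map via Rodrigues' formula."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    A = skew(w / theta)
    return np.eye(3) + np.sin(theta) * A + (1.0 - np.cos(theta)) * (A @ A)

def se3_exp(xi):
    """Full SE(3) exponential map of xi = (omega, v): the rotation is
    coupled into the translation through the left Jacobian V."""
    w, v = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    R = so3_exp(w)
    if theta < 1e-12:
        return R, v.copy()
    W = skew(w)
    V = (np.eye(3)
         + (1.0 - np.cos(theta)) / theta**2 * W
         + (theta - np.sin(theta)) / theta**3 * (W @ W))
    return R, V @ v

def decoupled_retract(xi):
    """Decoupled retract at the identity: rotation and translation
    are updated independently."""
    return so3_exp(xi[:3]), xi[3:].copy()

xi = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])  # 1 rad about z, unit step in x
R_full, t_full = se3_exp(xi)
R_dec, t_dec = decoupled_retract(xi)
# rotations agree, but t_full = [sin(1), 1-cos(1), 0] while t_dec = [1, 0, 0]
```

The two agree to first order (and exactly when omega = 0), which is why both are valid retractions; the question raised in the thread is how the difference affects the basin of convergence.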
Has anyone studied (in a paper) what's the convergence basin of the different options? I've been wondering for years whether the decoupled SE(3) retract (which I feel must be correct as well as the exact log/exp maps), has any negative implications in terms of convergence :-)
Yes - this is what I’m proposing to examine in a bit more detail here. @victor1234 has not gotten back to me yet by email, though :-) @jlblancoc if you want to be involved let me know? I think we can also compare different Pose3 factor types: there is no reason that we have to use the localCoordinates variant as the error as well: these can be decoupled. Esp. Frobenius norm and (1-cos) error would be interesting to see.
This is the cost function description and code for ceres-solver: description http://ceres-solver.org/nnls_tutorial.html#other-examples (example 11), code https://github.com/ceres-solver/ceres-solver/blob/57cf20aa5d3c1b2f25d255814f4fff5260db81c6/examples/slam/pose_graph_3d/pose_graph_3d_error_term.h#L42
If I set params.setVerbosityLM("SUMMARY") I get output with timestamps. I understand that when the error increases, the iteration counter does not change. But why does the table header sometimes appear again? Should I just ignore it and count all iterations sequentially? And what does the success column mean?
Initial error: 9.52809e+07, values: 8000
iter cost cost_change lambda success iter_time
0 3.118173e+08 -2.17e+08 1.00e-05 1 6.91e+00
iter cost cost_change lambda success iter_time
0 3.286838e+08 -2.33e+08 1.00e-04 1 6.42e+00
iter cost cost_change lambda success iter_time
0 3.184375e+08 -2.23e+08 1.00e-03 1 6.37e+00
iter cost cost_change lambda success iter_time
0 1.488340e+08 -5.36e+07 1.00e-02 1 6.33e+00
iter cost cost_change lambda success iter_time
0 6.195424e+07 3.33e+07 1.00e-01 1 6.44e+00
1 6.789750e+07 -5.94e+06 1.00e-02 1 7.08e+00
1 1.876466e+07 4.32e+07 1.00e-01 1 6.30e+00
2 1.441776e+07 4.35e+06 1.00e-02 1 6.47e+00
3 7.075893e+06 7.34e+06 1.00e-03 1 6.37e+00
4 4.325054e+06 2.75e+06 1.00e-04 1 6.44e+00
5 1.777713e+06 2.55e+06 1.00e-05 1 6.35e+00
6 9.038552e+05 8.74e+05 1.00e-06 1 6.39e+00
7 7.000553e+05 2.04e+05 1.00e-07 1 6.56e+00
8 6.155552e+05 8.45e+04 1.00e-08 1 6.59e+00
9 5.916989e+05 2.39e+04 1.00e-09 1 6.48e+00
10 5.618014e+05 2.99e+04 1.00e-10 1 6.48e+00
11 5.467778e+05 1.50e+04 1.00e-11 1 6.34e+00
12 5.212531e+05 2.55e+04 1.00e-12 1 7.18e+00
13 5.132730e+05 7.98e+03 1.00e-13 1 6.42e+00
14 5.037389e+05 9.53e+03 1.00e-14 1 6.29e+00
15 4.992285e+05 4.51e+03 1.00e-15 1 6.27e+00
16 4.888992e+05 1.03e+04 1.00e-16 1 6.35e+00
17 4.777444e+05 1.12e+04 1.00e-17 1 6.23e+00
18 4.694427e+05 8.30e+03 1.00e-18 1 6.31e+00
19 4.668781e+05 2.56e+03 1.00e-19 1 7.02e+00
20 4.610915e+05 5.79e+03 1.00e-20 1 6.35e+00
21 4.535097e+05 7.58e+03 1.00e-21 1 6.34e+00
22 4.419965e+05 1.15e+04 1.00e-22 1 6.37e+00
23 4.342941e+05 7.70e+03 1.00e-23 1 6.32e+00
24 4.284835e+05 5.81e+03 1.00e-24 1 6.27e+00
25 4.202964e+05 8.19e+03 1.00e-25 1 6.63e+00
26 4.076103e+05 1.27e+04 1.00e-26 1 6.95e+00
27 3.911727e+05 1.64e+04 1.00e-27 1 6.29e+00
28 3.757958e+05 1.54e+04 1.00e-28 1 6.23e+00
29 3.614277e+05 1.44e+04 1.00e-29 1 6.25e+00
30 3.523937e+05 9.03e+03 1.00e-30 1 6.77e+00
31 3.418022e+05 1.06e+04 1.00e-31 1 6.35e+00
32 3.343921e+05 7.41e+03 1.00e-32 1 6.31e+00
33 3.286182e+05 5.77e+03 1.00e-33 1 6.32e+00
34 3.257668e+05 2.85e+03 1.00e-34 1 6.26e+00
35 3.212957e+05 4.47e+03 1.00e-35 1 6.64e+00
36 3.204919e+05 8.04e+02 1.00e-36 1 6.37e+00
37 3.193675e+05 1.12e+03 1.00e-37 1 6.33e+00
38 3.179684e+05 1.40e+03 1.00e-38 1 6.38e+00
39 3.170004e+05 9.68e+02 1.00e-39 1 6.43e+00
40 3.168226e+05 1.78e+02 1.00e-40 1 6.30e+00
41 3.170451e+05 -2.23e+02 1.00e-41 1 6.41e+00
41 3.170451e+05 -2.23e+02 1.00e-40 1 6.29e+00
41 3.170451e+05 -2.23e+02 1.00e-39 1 6.85e+00
41 3.170451e+05 -2.23e+02 1.00e-38 1 6.29e+00
41 3.170451e+05 -2.23e+02 1.00e-37 1 6.32e+00
41 3.170451e+05 -2.23e+02 1.00e-36 1 6.96e+00
41 3.170451e+05 -2.23e+02 1.00e-35 1 6.23e+00
41 3.170451e+05 -2.23e+02 1.00e-34 1 6.44e+00
41 3.170451e+05 -2.23e+02 1.00e-33 1 6.39e+00
41 3.170451e+05 -2.23e+02 1.00e-32 1 6.28e+00
41 3.170451e+05 -2.23e+02 1.00e-31 1 6.31e+00
41 3.170451e+05 -2.23e+02 1.00e-30 1 6.31e+00
41 3.170451e+05 -2.23e+02 1.00e-29 1 6.29e+00
41 3.170451e+05 -2.23e+02 1.00e-28 1 6.73e+00
41 3.170451e+05 -2.23e+02 1.00e-27 1 6.42e+00
41 3.170451e+05 -2.23e+02 1.00e-26 1 6.33e+00
41 3.170451e+05 -2.23e+02 1.00e-25 1 6.45e+00
41 3.170451e+05 -2.23e+02 1.00e-24 1 6.48e+00
41 3.170451e+05 -2.23e+02 1.00e-23 1 6.36e+00
41 3.170451e+05 -2.23e+02 1.00e-22 1 6.99e+00
41 3.170451e+05 -2.23e+02 1.00e-21 1 6.66e+00
41 3.170451e+05 -2.23e+02 1.00e-20 1 6.35e+00
41 3.170451e+05 -2.23e+02 1.00e-19 1 6.46e+00
41 3.170451e+05 -2.23e+02 1.00e-18 1 6.33e+00
41 3.170451e+05 -2.23e+02 1.00e-17 1 6.70e+00
41 3.170451e+05 -2.23e+02 1.00e-16 1 6.46e+00
41 3.170451e+05 -2.23e+02 1.00e-15 1 6.32e+00
41 3.170451e+05 -2.23e+02 1.00e-14 1 6.46e+00
41 3.170451e+05 -2.23e+02 1.00e-13 1 6.64e+00
41 3.170451e+05 -2.23e+02 1.00e-12 1 6.54e+00
41 3.170451e+05 -2.23e+02 1.00e-11 1 6.64e+00
41 3.170451e+05 -2.23e+02 1.00e-10 1 6.92e+00
41 3.170451e+05 -2.23e+02 1.00e-09 1 6.32e+00
41 3.170451e+05 -2.23e+02 1.00e-08 1 6.32e+00
41 3.170451e+05 -2.23e+02 1.00e-07 1 6.94e+00
41 3.170451e+05 -2.23e+02 1.00e-06 1 6.47e+00
41 3.170451e+05 -2.23e+02 1.00e-05 1 6.25e+00
41 3.170451e+05 -2.23e+02 1.00e-04 1 6.38e+00
41 3.170453e+05 -2.23e+02 1.00e-03 1 6.65e+00
41 3.170461e+05 -2.24e+02 1.00e-02 1 6.64e+00
41 3.170495e+05 -2.27e+02 1.00e-01 1 6.57e+00
41 3.170555e+05 -2.33e+02 1.00e+00 1 6.87e+00
41 3.170339e+05 -2.11e+02 1.00e+01 1 6.69e+00
41 3.168520e+05 -2.94e+01 1.00e+02 1 6.92e+00
41 3.166480e+05 1.75e+02 1.00e+03 1 6.35e+00
42 3.167173e+05 -6.93e+01 1.00e+02 1 6.27e+00
42 3.165536e+05 9.44e+01 1.00e+03 1 6.96e+00
43 3.166009e+05 -4.73e+01 1.00e+02 1 6.24e+00
43 3.164882e+05 6.54e+01 1.00e+03 1 6.33e+00
44 3.164915e+05 -3.29e+00 1.00e+02 1 6.30e+00
44 3.164317e+05 5.65e+01 1.00e+03 1 6.34e+00
45 3.163865e+05 4.52e+01 1.00e+02 1 6.31e+00
46 3.161813e+05 2.05e+02 1.00e+01 1 6.80e+00
47 3.158449e+05 3.36e+02 1.00e+00 1 6.36e+00
48 3.156355e+05 2.09e+02 1.00e-01 1 6.37e+00
49 3.156558e+05 -2.04e+01 1.00e-02 1 6.36e+00
49 3.156612e+05 -2.57e+01 1.00e-01 1 6.35e+00
49 3.156691e+05 -3.36e+01 1.00e+00 1 6.26e+00
49 3.156711e+05 -3.56e+01 1.00e+01 1 6.68e+00
49 3.156445e+05 -9.00e+00 1.00e+02 1 6.38e+00
49 3.156126e+05 2.28e+01 1.00e+03 1 6.32e+00
50 3.156550e+05 -4.23e+01 1.00e+02 1 6.34e+00
50 3.156108e+05 1.80e+00 1.00e+03 1 6.32e+00
Optimization complete
initial error=9.52809e+07
final error=315611
I think the plan should be to call iterate in python, and get the error explicitly after each call. Parsing the output seems clunky.
I split optimize() into iterate() calls: https://colab.research.google.com/drive/1w8hQ1v-HtXna--BQHbFQSHqf2-mEcAVF I will deal with comet.ml tomorrow.
This is interesting. Based on various comments and observations, I reran Levenberg-Marquardt on these datasets under the following settings (C++):

i) POSE3_EXPMAP/ROT3_EXPMAP compiler flags: OFF, chordal initialization: OFF
ii) POSE3_EXPMAP/ROT3_EXPMAP compiler flags: ON, chordal initialization: OFF
iii) POSE3_EXPMAP/ROT3_EXPMAP compiler flags: OFF, chordal initialization: ON
iv) POSE3_EXPMAP/ROT3_EXPMAP compiler flags: ON, chordal initialization: ON

When I visualize the results at convergence:
- ii) looks better than i)
- iii) and iv) both look better than ii)
- iii) and iv) are nearly identical in terms of the quality of the result

From what I notice, initialization based on chordal relaxation seems crucial to converging to meaningful results on these datasets. I'll share more detailed statistics about the solver itself when I get a chance to.
Thanks
Description
I've been playing with GTSAM for doing SE(3) pose graph optimization. As a starting point, I tried GTSAM on some benchmark datasets for PGO from this site: https://lucacarlone.mit.edu/datasets/ Unfortunately, I haven't been able to achieve convergence to any meaningful results on Sphere, Torus and Cube; it works well for Parking Garage and Cubicle. The issue seems to be convergence to a really bad local minimum. I've tried combining the various nonlinear optimizers (Levenberg-Marquardt, Gauss-Newton and Dogleg) with the linear solvers (Cholesky and QR) that come with GTSAM, but the optimizer always converges to a bad minimum in a handful of iterations (typically < 10), either because the absolute or the relative error change was less than the threshold (which I believe is around 1e-7). I've also played around with damping strategies for LM to see if that helps (not setting the damping factor to a fixed value, allowing the off-diagonals to also be damped, etc.), without any luck. Has anyone been able to achieve meaningful results on these datasets? If so, I would appreciate it if you could share the relevant code snippet. Thanks.
Steps to reproduce
Expected behavior
Environment
Ubuntu 16.04 LTS
Additional information