NikolausDemmel / rootba

Square Root Bundle Adjustment for Large-Scale Reconstruction
BSD 3-Clause "New" or "Revised" License
279 stars 38 forks

Some questions about the implementation and comparison to other solvers #5

Closed hitdshu closed 2 years ago

hitdshu commented 2 years ago

Hi,

Thanks for your very excellent work. I have some questions about the current implementation and the performance.

  1. rootba/Ceres use the Jacobian's column-wise squared sums to scale the problem data before solving. But g2o and srrg2_solver do not seem to scale at all. Is this scaling necessary, and to what extent?

  2. It is understood that trust-region (dogleg) type algorithms are more efficient than LM, as they only need to factor the big sparse matrix once per iteration. However, almost all solvers use LM in their default settings, and for Ceres the dogleg method was actually slower than LM in my experiments. Could you please give some reasons for this?

  3. In a recent paper, https://github.com/srrg-sapienza/srrg2_solver, the authors test several solvers, and one can conclude from their experiments that Ceres is very inefficient. With a single thread, rootba might have similar performance to Ceres and hence might be inefficient as well... It would be great if you could provide some clues...

Thanks again for your excellent work on rootba, basalt, etc.

Have a nice day, Deshun

NikolausDemmel commented 2 years ago

Thanks for your interest and thoughtful questions. I don't have very clear answers, unfortunately, but you can find my thoughts below.

1. rootba/Ceres use the Jacobian's column-wise squared sums to scale the problem data before solving. But g2o and srrg2_solver do not seem to scale at all. Is this scaling necessary, and to what extent?

Hard to say in general. By scaling the Jacobian columns you should improve (reduce) the condition number of the linear system you are solving. To my understanding, a large condition number can lead to numerical inaccuracies, but it depends on the exact scenario: if your problem is badly conditioned, very large, and maybe you even compute with reduced floating-point precision, scaling can have more of an impact. If the Jacobian columns all have norm 1, then the Hessian diagonal is all ones, so scaling can also affect the relative weighting of the different dimensions in the LM damping, depending on the damping scheme you use.
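To illustrate the conditioning effect, here is a minimal NumPy sketch (a toy, not rootba's or Ceres' actual code) of dividing each Jacobian column by its norm, in the spirit of Ceres' `jacobi_scaling` option, so the scaled Gauss-Newton Hessian has a unit diagonal:

```python
import numpy as np

# Toy Jacobian whose columns have wildly different norms, mimicking
# parameter blocks with very different units (e.g. rotations vs. focal length).
rng = np.random.default_rng(0)
J = rng.standard_normal((100, 6)) * np.array([1.0, 1.0, 1e-4, 1e-4, 1e4, 1e4])

# Column scaling: divide each column by its norm, so that the diagonal of
# the scaled Hessian D J^T J D is all ones.
D = 1.0 / np.linalg.norm(J, axis=0)
Js = J * D

H, Hs = J.T @ J, Js.T @ Js
print(np.linalg.cond(H), np.linalg.cond(Hs))  # scaled system is far better conditioned
```

With the scale factors above, the unscaled Hessian's condition number is around 1e16 (near the limit of double precision), while the scaled one stays small, which is exactly the regime where scaling starts to matter for accuracy.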

In rootba we found for large BA problems that Jacobian scaling improves the accuracy of the results (the final cost achieved), and the additional runtime cost is minor in this case. Both our implementation and Ceres have an option to disable it, so you can also experiment yourself. On the other hand, in our follow-up work rootvo we didn't see a benefit and in the end removed Jacobian scaling.

2. It is understood that trust-region (dogleg) type algorithms are more efficient than LM, as they only need to factor the big sparse matrix once per iteration. However, almost all solvers use LM in their default settings, and for Ceres the dogleg method was actually slower than LM in my experiments. Could you please give some reasons for this?

I haven't really experimented with dogleg myself. To my understanding it's an alternative to the damping in LM for computing the update from the GN step and the gradient direction. My knowledge is superficial here, but what I remember is not so much that it's more efficient than LM (both solve one big sparse linear system per outer iteration, IIRC), but that it may in some cases lead to a better descent direction, so it might find better optima / get stuck in local minima less often.
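For reference, the classic (textbook) dogleg rule for combining those two directions inside a trust region can be sketched as follows. This is a generic version, not the exact logic used by Ceres or rootba:

```python
import numpy as np

def dogleg_step(H, g, radius):
    """Classic dogleg: blend the Gauss-Newton step and the steepest-descent
    (Cauchy) step inside a trust region of the given radius."""
    p_gn = -np.linalg.solve(H, g)                 # full Gauss-Newton step
    if np.linalg.norm(p_gn) <= radius:
        return p_gn                               # GN step fits in the region
    p_c = -(g @ g) / (g @ H @ g) * g              # unconstrained Cauchy point
    if np.linalg.norm(p_c) >= radius:
        return -radius * g / np.linalg.norm(g)    # clipped steepest descent
    # Walk along the "leg" from p_c towards p_gn until hitting the boundary:
    d = p_gn - p_c
    a, b, c = d @ d, 2 * p_c @ d, p_c @ p_c - radius**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return p_c + tau * d
```

Note that the GN system `H p = -g` is factored once, and the trust-region radius only selects how the step is assembled, which is the source of the "one factorization per iteration" argument; LM, by contrast, may re-solve with a different damping after a rejected step.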

3. In a recent paper, https://github.com/srrg-sapienza/srrg2_solver, the authors test several solvers, and one can conclude from their experiments that Ceres is very inefficient. With a single thread, rootba might have similar performance to Ceres and hence might be inefficient as well... It would be great if you could provide some clues...

I hadn't seen this; it looks interesting, thanks for pointing it out. At a glance, they seem to focus on slightly different problem types and not BA; at least I didn't notice any BA problems in the evaluation.

Ceres is a very mature general-purpose solver with many, many options, and depending on the problem type certain options will be MUCH more efficient than others. Ceres is very flexible but doesn't specialize in any of these scenarios, so it doesn't surprise me that a dedicated solver can outperform it (just as a dedicated BA solver can outperform Ceres on BA problems). For example, while I'm sure srrg2 is very efficient for the use cases it was designed for, it doesn't seem to implement the Schur complement, so I can't imagine it would perform well on large-scale BA problems. There is a Ceres-Schur option in the paper, but even there Ceres has different configurations that perform very differently depending on the problem structure and size: explicit Schur complement with dense or sparse storage of the reduced camera system (RCS), with an iterative (PCG) or direct (Cholesky) linear solver, or even PCG with an implicit Schur complement.

Moreover, we found that when comparing solver runtime and accuracy, implementation details such as the damping strategy, the inner iteration count (for PCG), and the abort criteria matter a lot. So without knowing these details it's hard to tell how representative an evaluation is for the general case (which is also why in our implementation we try to follow Ceres as closely as possible in these details).

But yes, I believe the rootba solver is particularly strong at exploiting multi-core parallelism compared to Schur-complement-based solvers, and you can see that not just against Ceres but also against our own implementation of SC. More experimentation in this direction would certainly be interesting.

hitdshu commented 2 years ago

Thanks very much for your timely reply, even during your holiday...

I think I've got the right answers for the first two questions.

You're right, Ceres is much more powerful than other solvers and has tons of options to tune for better performance. And rootba is a very interesting and powerful method.

Happy new year, Deshun