ceres-solver / ceres-solver

A large scale non-linear optimization library
http://ceres-solver.org/

Performance decreasing in CERES Solver v2.2.0 #1102

Closed esaumar closed 1 week ago

esaumar commented 1 month ago

Hi! I’m trying to upgrade from CERES 1.14.0 to CERES 2.2.0 (including an Ubuntu and PCL upgrade), but I’m seeing worse performance: processing time increases by roughly 12-13%. I’m using some dependencies like PCL, OpenMVG, and OpenCV. I upgraded to CERES 2.0.0 without upgrading Ubuntu and it behaves similarly to my baseline (see below). For reference, I previously created issue #1063.

Baseline using CERES 1.14.0

  • Ubuntu 18.04
  • PCL 1.8.0
  • OpenCV 4.1.1
  • OpenMVG 1.2

Upgrading to CERES 2.0.0

  • Ubuntu 18.04
  • PCL 1.8.0
  • OpenCV 4.1.1
  • OpenMVG 1.2

Upgrading to CERES 2.2.0

  • Ubuntu 22.04
  • PCL 1.9.1
  • OpenCV 4.1.1
  • OpenMVG 1.2

I upgraded to Ubuntu 22.04 because CERES 2.2.0 requires C++17.

These are the options I’m using:

ceres::Solver::Options options;
options.max_num_iterations = 40;
options.preconditioner_type = ceres::JACOBI;
options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;
options.linear_solver_type = ceres::SPARSE_SCHUR;
options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;

And I’m using AutoDiffCostFunction.

Can you give me a hint of what could be happening in this case?

sandwichmaker commented 1 month ago

This is not enough information. Many different things or libraries could have changed.

At a minimum, please report the output of Summary::FullReport() for the same problem so we can see where the time is going.

esaumar commented 1 month ago

Sure @sandwichmaker. Thanks for your answer. Btw, this is a problem of optimizing a 3D point cloud obtained by an SfM engine. Here you have the output for the same problem.

Baseline using CERES 1.14.0

Refining tile with 26765 points 
Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-openmp-no_tbb)

                                     Original                  Reduced
Parameter blocks                        26765                    26765
Parameters                              26765                    26765
Residual blocks                         53530                    53530
Residuals                             2087670                  2087670

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC               7327,19438
Schur structure                         d,1,1                    d,d,d

Cost:
Initial                          5.282318e+08
Final                            4.079526e+08
Change                           1.202792e+08

Minimizer iterations                       21
Successful steps                           21
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.204571

  Residual only evaluation           4.870745 (20)
  Jacobian & residual evaluation     6.838206 (21)
  Linear solver                      1.726577 (20)
Minimizer                           14.012495

Postprocessor                        0.004029
Total                               14.221096

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)

Upgrading to CERES 2.0.0

Refining tile with 26765 points 

Solver Summary (v 2.0.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-no_openmp)

                                     Original                  Reduced
Parameter blocks                        26765                    26765
Parameters                              26765                    26765
Residual blocks                         53530                    53530
Residuals                             2087670                  2087670

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC               7327,19438
Schur structure                         d,1,1                    d,d,d

Cost:
Initial                          5.282318e+08
Final                            4.079526e+08
Change                           1.202792e+08

Minimizer iterations                       21
Successful steps                           21
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.184625

  Residual only evaluation           5.006680 (20)
  Jacobian & residual evaluation     7.065109 (21)
  Linear solver                      1.721030 (20)
Minimizer                           14.369807

Postprocessor                        0.005544
Total                               14.559977

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)

Upgrading to CERES 2.2.0

Refining tile with 26605 points 

Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)

                                     Original                  Reduced
Parameter blocks                        26605                    26605
Parameters                              26605                    26605
Residual blocks                         53210                    53210
Residuals                             2075190                  2075190

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
Sparse linear algebra library    SUITE_SPARSE + AMD 

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC               7296,19309
Schur structure                         d,1,1                    d,d,d

Cost:
Initial                          5.221692e+08
Final                            3.998656e+08
Change                           1.223036e+08

Minimizer iterations                       21
Successful steps                           21
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.180470

  Residual only evaluation           4.888542 (20)
  Jacobian & residual evaluation    10.950721 (21)
  Linear solver                      1.442545 (20)
Minimizer                           17.889433

Postprocessor                        0.004414
Total                               18.074318

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)

I created another case where I used CERES Solver 2.0.0 on Ubuntu 22.04.

Upgrading to CERES 2.0.0 (Ubuntu 22.04)

  • Ubuntu 22.04
  • PCL 1.9.1
  • OpenCV 4.1.1
  • OpenMVG 1.2

Refining tile with 26599 points 

Solver Summary (v 2.0.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-cxsparse-(3.2.0)-eigensparse-no_openmp)

                                     Original                  Reduced
Parameter blocks                        26599                    26599
Parameters                              26599                    26599
Residual blocks                         53198                    53198
Residuals                             2074722                  2074722

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     1                        1
Linear solver ordering              AUTOMATIC               7276,19323
Schur structure                         d,1,1                    d,d,d

Cost:
Initial                          5.243282e+08
Final                            4.030508e+08
Change                           1.212773e+08

Minimizer iterations                       21
Successful steps                           21
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.157499

  Residual only evaluation           5.084708 (20)
  Jacobian & residual evaluation    12.151767 (21)
  Linear solver                      1.567164 (20)
Minimizer                           19.334076

Postprocessor                        0.003865
Total                               19.495441

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)

P.S. The maximum number of iterations is 20, not 40.

sandwichmaker commented 1 month ago

Thanks, this is informative. All of the time increase seems to be in the Jacobian evaluation, which is indeed surprising.

Would it be possible for you to also report results for Ceres 2.2.0?

Ceres 2.0.0 is multiple years old now.

sandwichmaker commented 1 month ago

Sorry, you have already provided what I asked for. I scrolled too fast. Jacobian evaluation slowing down is not something I would have guessed. Let me see if I can replicate this on my end.
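To put rough numbers on the Jacobian slowdown, here is a minimal sketch in plain C++ (no Ceres dependency) that compares the "Jacobian & residual evaluation" totals from the four reports above against the 1.14.0 baseline. The helper names `PerCallSeconds`, `PercentChange` and `PrintJacobianComparison` are made up for illustration. The tiles also differ slightly in size between runs (26765, 26605 and 26599 points), so the comparison is only approximate.

```cpp
#include <cassert>
#include <cstdio>

// Average wall time of one evaluation, given the total time and the call
// count that the solver summary prints in parentheses.
double PerCallSeconds(double total_seconds, int calls) {
  assert(calls > 0);
  return total_seconds / calls;
}

// Relative change of `value` with respect to `baseline`, in percent.
double PercentChange(double baseline, double value) {
  assert(baseline > 0.0);
  return 100.0 * (value - baseline) / baseline;
}

// Prints one row per run. The totals are copied from the
// "Jacobian & residual evaluation" lines of the reports (21 calls each).
void PrintJacobianComparison() {
  struct Run {
    const char* label;
    double jacobian_seconds;
  };
  const Run runs[] = {
      {"1.14.0 (Ubuntu 18.04)", 6.838206},
      {"2.0.0  (Ubuntu 18.04)", 7.065109},
      {"2.2.0  (Ubuntu 22.04)", 10.950721},
      {"2.0.0  (Ubuntu 22.04)", 12.151767},
  };
  const int kCalls = 21;
  const double baseline = runs[0].jacobian_seconds;
  for (const Run& run : runs) {
    std::printf("%-24s %.3f s/call  %+6.1f%%\n", run.label,
                PerCallSeconds(run.jacobian_seconds, kCalls),
                PercentChange(baseline, run.jacobian_seconds));
  }
}
```

Calling `PrintJacobianComparison()` from `main` prints one line per run. With these inputs the Jacobian time is roughly 3%, 60% and 78% above the baseline for the last three runs.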

sandwichmaker commented 1 month ago

So here is my attempt at replicating your results on my Mac. I ran the bundle adjuster for just one iteration to see how long it takes to evaluate the Jacobian and residuals. I used bundle_adjuster with problem-1778-993923-pre.txt.

1.14.0

bundle_adjuster  --input=/Users/sameeragarwal/Downloads/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1 -ordering user

iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    1.72e+00    2.74e+00
   1  1.435514e+07    2.42e+08    3.18e+14   7.27e+05   9.51e-01  3.00e+04        5    2.97e+00    5.71e+00

Solver Summary (v 1.14.0-eigen-(3.4.0)-lapack-cxsparse-(4.4.1)-eigensparse-no_openmp-no_tbb)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         1.019843

  Residual only evaluation           0.207327 (1)
  Jacobian & residual evaluation     2.385012 (2)
  Linear solver                      1.412146 (1)
Minimizer                            4.711149

Postprocessor                        0.051827
Total                                5.782819

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

2.2.0

bundle_adjuster --input=/Users/sameeragarwal/Downloads/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    1.91e+00    2.87e+00
   1  1.435514e+07    2.42e+08    3.18e+14   0.00e+00   9.51e-01  3.00e+04        5    2.91e+00    5.78e+00

Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-metis-(5.1.0)-acceleratesparse-eigensparse)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.960292

  Residual only evaluation           0.241840 (1)
  Jacobian & residual evaluation     2.428891 (2)
  Linear solver                      1.380378 (1)
Minimizer                            4.848096

Postprocessor                        0.055693
Total                                5.864081

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

HEAD

bundle_adjuster --input=/Users/sameeragarwal/Downloads/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    1.68e+00    2.61e+00
   1  1.435514e+07    2.42e+08    3.18e+14   0.00e+00   9.51e-01  3.00e+04        5    2.84e+00    5.45e+00

Solver Summary (v 2.3.0-eigen-(3.4.0)-lapack-suitesparse-(7.8.1)-metis-(5.1.0)-acceleratesparse-eigensparse)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         0.928962

  Residual only evaluation           0.213914 (1)
  Jacobian & residual evaluation     2.281093 (2)
  Linear solver                      1.401529 (1)
Minimizer                            4.548268

Postprocessor                        0.050395
Total                                5.527625

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

The relevant times are:

Version   Residual only evaluation   Jacobian & residual evaluation
1.14.0    0.207327 (1)               2.385012 (2)
2.2.0     0.241840 (1)               2.428891 (2)
HEAD      0.213914 (1)               2.281093 (2)

I am not seeing much of a variation. There is some up and down but I do not see any significant changes that look like the changes you are seeing.

This makes me wonder whether the performance of the AutoDiffCostFunction you are using has changed across versions of Ceres.
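One way to check this would be to time the cost functor in isolation, outside the solver. The sketch below uses only the standard library. `SecondsPerCall`, `ToyResidual` and `TimeToyResidual` are hypothetical stand-ins, not Ceres APIs. In a real test the timed callable would instead wrap a call to the cost function's `Evaluate` method with non-null Jacobian pointers, and the same source would be built against each Ceres version with the same compiler flags.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a user-defined functor. It is templated on T in
// the style Ceres expects, but here it is only ever evaluated with double.
struct ToyResidual {
  template <typename T>
  bool operator()(const T* const x, T* residual) const {
    residual[0] = T(10.0) - x[0];
    return true;
  }
};

// Runs `callable` `repetitions` times and returns the mean wall time per
// call in seconds, measured with a monotonic clock.
template <typename Callable>
double SecondsPerCall(Callable&& callable, int repetitions) {
  assert(repetitions > 0);
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i) {
    callable();
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count() / repetitions;
}

// Times the toy functor. The volatile sink keeps the compiler from
// optimizing the evaluation away entirely.
double TimeToyResidual(int repetitions) {
  ToyResidual functor;
  double x = 3.0;
  volatile double sink = 0.0;
  return SecondsPerCall(
      [&] {
        double residual = 0.0;
        functor(&x, &residual);
        sink = sink + residual;
      },
      repetitions);
}
```

Comparing the per-call time from such a harness across builds would show whether the functor itself got slower or whether the difference comes from elsewhere in the pipeline.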

esaumar commented 1 month ago

I also replicated what you did with some of my cases and I got the same behavior as you. Actually, with multiple threads, 2.2.0 performs better. Thanks for your help @sandwichmaker. I'll take a look at the functors I'm using for AutoDiffCostFunction.

Baseline using CERES 1.14.0

Single thread

bundle_adjuster  --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1 -ordering user
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    3.02e+00    5.96e+00
   1  1.435514e+07    2.42e+08    3.18e+14   7.27e+05   9.51e-01  3.00e+04        5    6.11e+00    1.21e+01

Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-openmp-no_tbb)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         2.944793

  Residual only evaluation           0.580917 (1)
  Jacobian & residual evaluation     4.827219 (2)
  Linear solver                      2.654686 (1)
Minimizer                            9.215678

Postprocessor                        0.201411
Total                               12.361882

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

Multithread

bundle_adjuster  --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 8 -ordering user
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    1.27e+00    4.32e+00
   1  1.435514e+07    2.42e+08    3.18e+14   7.27e+05   9.51e-01  3.00e+04        5    3.96e+00    8.28e+00

Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-openmp-no_tbb)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     8                        8
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         3.043149

  Residual only evaluation           0.139151 (1)
  Jacobian & residual evaluation     1.342911 (2)
  Linear solver                      2.686629 (1)
Minimizer                            5.326932

Postprocessor                        0.204278
Total                                8.574359

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

Upgrading to CERES 2.2.0

Single thread

bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    3.33e+00    6.20e+00
   1  1.435514e+07    2.42e+08    3.18e+14   0.00e+00   9.51e-01  3.00e+04        5    6.30e+00    1.25e+01

Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         2.865450

  Residual only evaluation           0.454072 (1)
  Jacobian & residual evaluation     5.528297 (2)
  Linear solver                      2.609310 (1)
Minimizer                            9.728248

Postprocessor                        0.206709
Total                               12.800407

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

Multithread

bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 8
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    8.62e-01    3.86e+00
   1  1.435514e+07    2.42e+08    3.18e+14   0.00e+00   9.51e-01  3.00e+04        5    2.56e+00    6.42e+00

Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     8                        8
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         2.994667

  Residual only evaluation           0.106352 (1)
  Jacobian & residual evaluation     1.010054 (2)
  Linear solver                      1.621585 (1)
Minimizer                            3.514487

Postprocessor                        0.209547
Total                                6.718701

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

sandwichmaker commented 1 month ago

Yes, multi-threaded performance has improved substantially, but I noticed that single-threaded residual and Jacobian performance did get worse in your experiments.

Was that a fluke or is that reliably reproducible?

Are you able to build 1.14 on the same version of Ubuntu with the same toolchain?

esaumar commented 1 month ago

It is reliably reproducible. Let me share with you some data. I made 5 runs for every case and this is what I got. Anyway, I'll try to build 1.14.0 on the same version of Ubuntu and will share the data with you.

P.S. I added some side notes to each case regarding the CERES Solver version used by OpenMVG, just in case it is helpful. I'm not sure whether having multiple installed versions might create a conflict, but the libraries linked to my project in CMake are correct. The Baseline is the case where only one CERES version was compiled.

Baseline using CERES 1.14.0

  Residual only evaluation           0.603661 (1)
  Jacobian & residual evaluation     4.900541 (2)

  Residual only evaluation           0.596503 (1)
  Jacobian & residual evaluation     4.914723 (2)

  Residual only evaluation           0.600968 (1)
  Jacobian & residual evaluation     4.957335 (2)

  Residual only evaluation           0.598945 (1)
  Jacobian & residual evaluation     4.905056 (2)

  Residual only evaluation           0.598227 (1)
  Jacobian & residual evaluation     4.883886 (2)

Upgrading to CERES 2.2.0



Here you have the results for the other cases that I shared in the previous messages.

Upgrading to CERES 2.0.0 (Ubuntu 18.04)

Upgrading to CERES 2.0.0 (Ubuntu 22.04)

esaumar commented 1 month ago

@sandwichmaker, I ran the tests you suggested. I didn't install OpenMVG and only installed the CERES versions on Ubuntu 22.04 (one independent Docker image for each version). PCL and OpenCV are the same versions in both cases (even though we are not using them for these tests). I still see that CERES 2.2.0 takes more time in the Jacobian & residual evaluation with a single thread.

Btw, I'm using

  • cmake 3.16.9
  • eigen 3.3.4
  • glog 0.4.0
  • gflags 2.2.2
  • SuiteSparse 5.10.1
  • BLAS 3.10.0
  • LAPACK 3.10.0

Case 1 Baseline using CERES 1.14.0

  • Ubuntu 22.04
  • PCL 1.9.1
  • OpenCV 4.1.1

  Residual only evaluation           0.454186 (1)
  Jacobian & residual evaluation     4.441791 (2)

  Residual only evaluation           0.457184 (1)
  Jacobian & residual evaluation     4.374026 (2)

  Residual only evaluation           0.447578 (1)
  Jacobian & residual evaluation     4.305764 (2)

  Residual only evaluation           0.444952 (1)
  Jacobian & residual evaluation     4.536934 (2)

  Residual only evaluation           0.444257 (1)
  Jacobian & residual evaluation     4.380742 (2)

Case 2 Upgrading to CERES 2.2.0

  • Ubuntu 22.04
  • PCL 1.9.1
  • OpenCV 4.1.1

  Residual only evaluation           0.460441 (1)
  Jacobian & residual evaluation     5.557999 (2)

  Residual only evaluation           0.453982 (1)
  Jacobian & residual evaluation     5.531522 (2)

  Residual only evaluation           0.455714 (1)
  Jacobian & residual evaluation     5.585474 (2)

  Residual only evaluation           0.469980 (1)
  Jacobian & residual evaluation     5.582852 (2)

  Residual only evaluation           0.463965 (1)
  Jacobian & residual evaluation     5.595356 (2)

sandwichmaker commented 1 month ago

Okay, that's an apples-to-apples comparison. I am assuming these numbers are from examples/bundle_adjuster?

esaumar commented 1 month ago

Yes, I ran bundle_adjuster like this:

Case 1 Baseline using CERES 1.14.0

bundle_adjuster  --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1 -ordering user
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    2.79e+00    6.00e+00
   1  1.435514e+07    2.42e+08    3.18e+14   7.27e+05   9.51e-01  3.00e+04        5    5.74e+00    1.17e+01

Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-cxsparse-(3.2.0)-eigensparse-openmp-no_tbb)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         3.211934

  Residual only evaluation           0.442265 (1)
  Jacobian & residual evaluation     4.376956 (2)
  Linear solver                      2.660337 (1)
Minimizer                            8.619866

Postprocessor                        0.203518
Total                               12.035319

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

Case 2 Upgrading to CERES 2.2.0

bundle_adjuster  --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur  -num_threads 1               
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  2.563973e+08    0.00e+00    3.19e+15   0.00e+00   0.00e+00  1.00e+04        0    3.34e+00    6.36e+00
   1  1.435514e+07    2.42e+08    3.18e+14   0.00e+00   9.51e-01  3.00e+04        5    6.30e+00    1.27e+01

Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)

                                     Original                  Reduced
Parameter blocks                       995701                   995701
Parameters                            2997771                  2997771
Residual blocks                       5001946                  5001946
Residuals                            10003892                 10003892

Minimizer                        TRUST_REGION
Trust region strategy     LEVENBERG_MARQUARDT
                                        Given                     Used
Linear solver                 ITERATIVE_SCHUR          ITERATIVE_SCHUR
Preconditioner                         JACOBI                   JACOBI
Threads                                     1                        1
Linear solver ordering            993923,1778              993923,1778
Schur structure                         2,3,9                    2,3,9

Cost:
Initial                          2.563973e+08
Final                            1.435514e+07
Change                           2.420421e+08

Minimizer iterations                        2
Successful steps                            2
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                         3.021223

  Residual only evaluation           0.452501 (1)
  Jacobian & residual evaluation     5.535506 (2)
  Linear solver                      2.611305 (1)
Minimizer                            9.736808

Postprocessor                        0.209909
Total                               12.967940

Termination:                   NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)

sandwichmaker commented 1 month ago

Okay I have a Linux box, let me try and replicate the numbers on my end and then try and bisect to see what's going on. I will keep you posted on my progress.

sandwichmaker commented 1 month ago

Okay, so I can't get a 25% increase like you are seeing, but I can see 10%.

On Debian Rodete with GCC 13.2, Eigen 3.4.0

Ceres Solver 1.14.0

./bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user

  Residual only evaluation           0.589166 (1)
  Jacobian & residual evaluation     5.313726 (2)

  Residual only evaluation           0.590412 (1)
  Jacobian & residual evaluation     5.345748 (2)

Ceres 2.0.0

./bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user

  Residual only evaluation           0.620252 (1)
  Jacobian & residual evaluation     6.008752 (2)

  Residual only evaluation           0.615519 (1)
  Jacobian & residual evaluation     6.033850 (2)

with Ceres Solver 2.2.0

/bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -linear_solver_ordering=user

  Residual only evaluation           0.633734 (1)
  Jacobian & residual evaluation     5.867421 (2)

  Residual only evaluation           0.658629 (1)
  Jacobian & residual evaluation     5.951328 (2)

and at HEAD today

/bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -linear_solver_ordering=user

 Residual only evaluation           0.576504 (1)
 Jacobian & residual evaluation     5.863398 (2)

 Residual only evaluation           0.581811 (1)
 Jacobian & residual evaluation     5.864341 (2)

So something happened between 1.14.0 and 2.0.0 where things went bad, and then things improved a bit but never quite got back to where they were at 1.14.0.

sandwichmaker commented 1 month ago

So the offending commit seems to be 8904fa48, where we have

  Residual only evaluation           0.595762 (1)
  Jacobian & residual evaluation     5.947797 (2)

The commit right before this is 18a464d4, where we have

  Residual only evaluation           0.603536 (1)
  Jacobian & residual evaluation     4.597426 (2)

8904fa48 changes how the Jets are initialized. The commit message is a bit misleading because it seems to be talking about inlining, but if you look at what's really going on:

https://github.com/ceres-solver/ceres-solver/commit/8904fa48#diff-8df0dec0a75c9055799c37682ae51518c6bb256769567873eef3898bfd25a2a7

The function Make1stOrderPerturbation, which used to have a statically sized loop, is unrolled using template metaprogramming instead, the expectation being that the compiler should be able to inline the whole computation and optimize it.

The fact that this is making performance worse seems to indicate that this is not happening. Now I am not sure if this is a GCC thing or whether it happens on Clang/LLVM also, since when I run the same two commits on my Mac, where the default compiler is Clang, I get

8904fa48 
  Residual only evaluation           0.213228 (1)
  Jacobian & residual evaluation     1.976800 (2)

18a464d4 
  Residual only evaluation           0.230083 (1)
  Jacobian & residual evaluation     2.024981 (2)

which does not indicate any difference. So I am starting to wonder if the worse performance is GCC specific.

I was unable to figure out how to get the Clang toolchain to work on my Linux box with these versions of Ceres, but since we can replicate these performance problems with GCC, we should see if we can fix them.

sandwichmaker commented 1 month ago

Okay I was able to use Google's internal clang based toolchain as well as my mac's clang to verify that reverting this CL has no effect if Clang/LLVM is used as the compiler. Performance remains the same. However with GCC performance does become better. autodiff_benchmarks does go up and down some even with clang.

I also tried just modifying the existing template based implementation and forcing inlining and that did not do anything, so this really is an optimization pass/some kind of inlining difference between GCC and Clang.

I think we should treat this as a GCC bug/missed optimization.

sandwichmaker commented 1 month ago

Some more debugging. Previously I was using Google's internal Clang-based toolchain, but now I can use the Clang-based toolchain on Debian, and the results are curious. In the following, HEAD means Ceres Solver at HEAD today and HEAD + change means that I reverted the change in https://github.com/ceres-solver/ceres-solver/commit/8904fa4887ed7b3e6d110ad5a98efbc2df48595e.

gcc HEAD
  Residual only evaluation           0.581872 (1)
  Jacobian & residual evaluation     5.909532 (2)

gcc HEAD + change 
  Residual only evaluation           0.586720 (1)
  Jacobian & residual evaluation     4.791280 (2)

cmake ../ -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_FLAGS="-stdlib=libc++"

clang HEAD
  Residual only evaluation           0.560918 (1)
  Jacobian & residual evaluation     7.849475 (2)

clang HEAD + change 
  Residual only evaluation           0.571185 (1)
  Jacobian & residual evaluation     7.861784 (2)

As you can see, clang, while having much worse performance, seems not to be affected by this code change at all. Now if I add -march=x86-64-v3 to the compiler flags, then we get

gcc HEAD
  Residual only evaluation           0.558229 (1)
  Jacobian & residual evaluation     4.343600 (2)

gcc HEAD + change
  Residual only evaluation           0.565759 (1)
  Jacobian & residual evaluation     3.064159 (2)

clang HEAD 
  Residual only evaluation           0.563153 (1)
  Jacobian & residual evaluation     3.367755 (2)

clang HEAD + change
  Residual only evaluation           0.555043 (1)
  Jacobian & residual evaluation     3.382892 (2)

GCC's performance gets better and is further improved by reverting this code path, and CLANG gets better but is unaffected by the code change.

I need to take a closer look at our autodiff benchmarks to see what is going on with them.

esaumar commented 1 month ago

@sandwichmaker, thanks for your support and for running all those tests. On my end, I ran another kind of apples-to-apples comparison: I ran the bundle_adjuster binary with CERES 2.0.0 on different Ubuntu versions (18.04 and 22.04).

The results are also curious. The bundle_adjuster example took more time on Ubuntu 18.04, but my optimization problem took more time on Ubuntu 22.04. This is probably also related to the gcc version. I'll try to run more tests on my end.

bundle_adjuster

CERES 2.0.0 (Ubuntu 18.04)

bundle_adjuster --input=/home/Data/Projector/Experiment/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user

 Residual only evaluation           0.460584 (1)
  Jacobian & residual evaluation     6.957227 (2)

  Residual only evaluation           0.461226 (1)
  Jacobian & residual evaluation     6.968588 (2)

 Residual only evaluation           0.463042 (1)
  Jacobian & residual evaluation     6.960957 (2)

CERES 2.0.0 (Ubuntu 22.04)

bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user

 Residual only evaluation           0.454232 (1)
  Jacobian & residual evaluation     5.003663 (2)

 Residual only evaluation           0.447596 (1)
  Jacobian & residual evaluation     4.996817 (2)

  Residual only evaluation           0.468115 (1)
  Jacobian & residual evaluation     5.180313 (2)

My optimization problem (I took the numbers from my previous message)

CERES 2.0.0 (Ubuntu 18.04)

 Residual only evaluation           5.006680 (20)
  Jacobian & residual evaluation     7.065109 (21)

CERES 2.0.0 (Ubuntu 22.04)

  Residual only evaluation           5.084708 (20)
  Jacobian & residual evaluation    12.151767 (21)

sandwichmaker commented 1 month ago

Okay, how about you try reverting that commit in your local version of autodiff.h and run the same tests again?

esaumar commented 1 month ago

@sandwichmaker, I ran multiple tests reverting the commit and also using the compiler flag -march=x86-64-v3. For the tests I used CERES 2.0.0 on Ubuntu 22.04. I ran the bundle_adjuster example and my optimization problem. In your repo's example, the Jacobian & residual evaluation time was reduced significantly both by reverting the commit and by adding the compiler flag. With my project, on the other hand, nothing is conclusive at the moment: reverting the commit seems to reduce the Jacobian & residual evaluation time a bit, but not even close to what I got on Ubuntu 18.04.

bundle_adjuster

CERES 2.0.0

Ubuntu 18.04
   Residual only evaluation           0.460584 (1)
   Jacobian & residual evaluation     6.957227 (2)

Ubuntu 22.04
   Residual only evaluation           0.454232 (1)
   Jacobian & residual evaluation     5.003663 (2)

Ubuntu 22.04 + reverted commit
   Residual only evaluation           0.444284 (1)
   Jacobian & residual evaluation     3.865306 (2)

Ubuntu 22.04 + reverted commit + march=x86-64-v3
   Residual only evaluation           0.440761 (1)
   Jacobian & residual evaluation     2.817200 (2)

Ubuntu 22.04 + march=x86-64-v3
   Residual only evaluation           0.432983 (1)
   Jacobian & residual evaluation     3.192775 (2)

My optimization problem

CERES 2.0.0

Ubuntu 18.04
   Residual only evaluation           5.006680 (20)
   Jacobian & residual evaluation     7.065109 (21)

Ubuntu 22.04
   Residual only evaluation           5.084708 (20)
   Jacobian & residual evaluation    12.151767 (21)

Ubuntu 22.04 + reverted commit
   Residual only evaluation           4.773392 (20)
   Jacobian & residual evaluation    11.242205 (21)

Ubuntu 22.04 + reverted commit + march=x86-64-v3
   Residual only evaluation           4.914351 (20)
   Jacobian & residual evaluation    11.501043 (21)

Ubuntu 22.04 + march=x86-64-v3
   Residual only evaluation           5.520689 (20)
   Jacobian & residual evaluation    12.702707 (21)

I ran another set of tests, compiling my project with -march=x86-64-v3 as well

Ubuntu 22.04 + reverted commit + march=x86-64-v3
   Residual only evaluation           4.992464 (20)
   Jacobian & residual evaluation    11.706325 (21)

Ubuntu 22.04 + march=x86-64-v3
   Residual only evaluation           4.711005 (20)
   Jacobian & residual evaluation    11.148952 (21)

I'll try to run some tests using CLANG as well.

sandwichmaker commented 1 month ago

So now I am quite thoroughly confused. This regression seems to be more about changes to the compiler between Ubuntu 18.04 and Ubuntu 22.04 than anything to do with Ceres, no? At least for your project.

esaumar commented 1 month ago

Yeah, that might be the case. Probably it is missing some compiler flag in Ubuntu 22.04 or something similar.

esaumar commented 1 month ago

I already ran multiple tests with CLANG as the compiler. It seems that using CLANG reduces the processing time on your examples in some cases, but reverting the commit is the change that has the biggest impact on performance. In my project, it behaves similarly to using gcc.

I guess I'll look for the differences in the cmake files generated when compiling with gcc on Ubuntu 18.04 and 22.04 (for both CERES and my project). I will let you know if I find any outstanding difference. Thanks!

bundle_adjuster

CERES 2.0.0

Ubuntu 18.04 with gcc
  Residual only evaluation           0.460584 (1)
  Jacobian & residual evaluation     6.957227 (2)

Ubuntu 18.04 with CLANG
  Residual only evaluation           0.436545 (1)
  Jacobian & residual evaluation     3.901897 (2)
*********************************************************
Ubuntu 22.04 with gcc
  Residual only evaluation           0.454232 (1)
  Jacobian & residual evaluation     5.003663 (2)

Ubuntu 22.04 with CLANG
  Residual only evaluation           0.426150 (1)
  Jacobian & residual evaluation     3.923545 (2)

*********************************************************
Ubuntu 22.04 + reverted commit with gcc
  Residual only evaluation           0.444284 (1)
  Jacobian & residual evaluation     3.865306 (2)

Ubuntu 22.04 + reverted commit with CLANG
  Residual only evaluation           0.419273 (1)
  Jacobian & residual evaluation     3.833912 (2)
*********************************************************
Ubuntu 22.04 + reverted commit + march=x86-64-v3 with gcc
  Residual only evaluation           0.440761 (1)
  Jacobian & residual evaluation     2.817200 (2)

Ubuntu 22.04 + reverted commit + march=x86-64-v3 with CLANG
  Residual only evaluation           0.413516 (1)
  Jacobian & residual evaluation     2.722491 (2)
*********************************************************
Ubuntu 22.04 + march=x86-64-v3 with gcc
  Residual only evaluation           0.432983 (1)
  Jacobian & residual evaluation     3.192775 (2)

Ubuntu 22.04 + march=x86-64-v3 with CLANG
  Residual only evaluation           0.408792 (1)
  Jacobian & residual evaluation     2.758322 (2)

My optimization problem

CERES 2.0.0

Ubuntu 18.04 with gcc
   Residual only evaluation           5.006680 (20)
   Jacobian & residual evaluation     7.065109 (21)

Ubuntu 18.04 with CLANG
   Residual only evaluation           5.039767 (20)
   Jacobian & residual evaluation     7.082926 (21)
*********************************************************
Ubuntu 22.04 with gcc
   Residual only evaluation           5.084708 (20)
   Jacobian & residual evaluation    12.151767 (21)

Ubuntu 22.04 with CLANG
   Residual only evaluation           5.388123 (20)
   Jacobian & residual evaluation    12.389536 (21)
*********************************************************
Ubuntu 22.04 + reverted commit with gcc
   Residual only evaluation           4.773392 (20)
   Jacobian & residual evaluation    11.242205 (21)

Ubuntu 22.04 + reverted commit with CLANG
   Residual only evaluation           5.009430 (20)
   Jacobian & residual evaluation    11.754678 (21)
*********************************************************
Ubuntu 22.04 + reverted commit + march=x86-64-v3 with gcc
   Residual only evaluation           4.914351 (20)
   Jacobian & residual evaluation    11.501043 (21)

Ubuntu 22.04 + reverted commit + march=x86-64-v3 with CLANG
   Residual only evaluation           4.931949 (20)
   Jacobian & residual evaluation    11.644230 (21)
*********************************************************
Ubuntu 22.04 + march=x86-64-v3 with gcc
   Residual only evaluation           5.520689 (20)
   Jacobian & residual evaluation    12.702707 (21)

Ubuntu 22.04 + march=x86-64-v3 with CLANG
   Residual only evaluation           5.259030 (20)
   Jacobian & residual evaluation    12.487500 (21)

esaumar commented 1 week ago

Hi @sandwichmaker, I'm closing this issue.

I finally got the behavior I was expecting. Instead of Ubuntu 22.04 I used Ubuntu 20.04, and the timing for the whole process of my project (not only the tile of 26765 points I showed) was around 5% faster than my baselines, which are Ceres 1.14.0 on Ubuntu 18.04 and Ceres 2.0.0 on Ubuntu 18.04.

By the way, I think the performance differences might be related to the CXX_FLAGS used when compiling my project. I ran a small investigation comparing the flags on Ubuntu 18.04 and 22.04, and there might be something related to the SSE flags, but I didn't get any conclusive result. The investigation was taking so long that I decided to use Ubuntu 20.04 instead.

Anyway, below I show the timing for the different configurations. Thanks for all the support!

My optimization problem

CERES 1.14.0

Ubuntu 18.04
  Residual only evaluation           4.870745 (20)
  Jacobian & residual evaluation     6.838206 (21)

CERES 2.0.0

Ubuntu 18.04
   Residual only evaluation           5.006680 (20)
   Jacobian & residual evaluation     7.065109 (21)

Ubuntu 22.04
   Residual only evaluation           5.084708 (20)
   Jacobian & residual evaluation    12.151767 (21)

CERES 2.2.0

Ubuntu 20.04
   Residual only evaluation           4.963751 (20)
   Jacobian & residual evaluation     6.512813 (21)
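
To compare configurations without reading the numbers by eye, the two evaluation lines can be pulled out of each report and turned into a relative change. This is a rough sketch, assuming the lines keep the `label  seconds (calls)` layout shown above; `parse_eval_times` and `relative_change` are hypothetical helper names, not part of Ceres. With the Ceres 2.0.0 numbers above, the Jacobian time on Ubuntu 22.04 comes out roughly 72% higher than on Ubuntu 18.04.

```python
import re

# Matches lines such as:
#   "   Jacobian & residual evaluation    12.151767 (21)"
_LINE = re.compile(
    r"^\s*(Residual only evaluation|Jacobian & residual evaluation)"
    r"\s+([0-9.]+)\s+\((\d+)\)\s*$"
)


def parse_eval_times(report_text):
    """Return {label: (total_seconds, call_count)} for the evaluation lines."""
    times = {}
    for line in report_text.splitlines():
        match = _LINE.match(line)
        if match:
            times[match.group(1)] = (float(match.group(2)), int(match.group(3)))
    return times


def relative_change(baseline, candidate):
    """Fractional change of candidate versus baseline (0.10 means 10% slower)."""
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    # Values copied from the Ceres 2.0.0 timings in this comment.
    ubuntu_1804 = parse_eval_times(
        "   Residual only evaluation           5.006680 (20)\n"
        "   Jacobian & residual evaluation     7.065109 (21)\n"
    )
    ubuntu_2204 = parse_eval_times(
        "   Residual only evaluation           5.084708 (20)\n"
        "   Jacobian & residual evaluation    12.151767 (21)\n"
    )
    for label in ubuntu_1804:
        change = relative_change(ubuntu_1804[label][0], ubuntu_2204[label][0])
        print(f"{label}: {change:+.1%}")
```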

sandwichmaker commented 6 days ago

Nice work. Now if we only knew the cause of this mess.
