This is not enough information. Many different things, including other libraries, could have changed.
At minimum, please report the output of Summary::FullReport() for the same problem so we can see where the time is going.
On Fri, Sep 20, 2024, 12:18 PM esaumar @.***> wrote:
Hi! I’m trying to upgrade from CERES 1.14.0 to CERES 2.2.0 (including an Ubuntu and PCL upgrade), but I’m seeing worse performance: processing time increases by about 12-13%. I’m using some dependencies such as PCL, OpenMVG, and OpenCV. I also upgraded to CERES 2.0.0 without upgrading Ubuntu, and it behaves similarly to my baseline (see below). For reference, I previously created issue #1063 https://github.com/ceres-solver/ceres-solver/issues/1063.
Baseline using CERES 1.14.0
- Ubuntu 18.04
- PCL 1.8.0
- OpenCV 4.1.1
- OpenMVG 1.2
Upgrading to CERES 2.0.0
- Ubuntu 18.04
- PCL 1.8.0
- OpenCV 4.1.1
- OpenMVG 1.2
Upgrading to CERES 2.2.0
- Ubuntu 22.04
- PCL 1.9.1
- OpenCV 4.1.1
- OpenMVG 1.2
I upgraded to Ubuntu 22.04 due to the requirement of using C++17 for CERES 2.2.0.
These are the options I’m using
ceres::Solver::Options options;
options.max_num_iterations = 40;
options.preconditioner_type = ceres::JACOBI;
options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;
options.linear_solver_type = ceres::SPARSE_SCHUR;
options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;
And I’m using AutoDiffCostFunction
Can you give me a hint of what could be happening in this case?
Sure, @sandwichmaker. Thanks for your answer. Btw, this problem optimizes a 3D point cloud obtained from an SfM engine. Here is the output for the same problem.
Baseline using CERES 1.14.0
Refining tile with 26765 points
Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-openmp-no_tbb)
Original Reduced
Parameter blocks 26765 26765
Parameters 26765 26765
Residual blocks 53530 53530
Residuals 2087670 2087670
Minimizer TRUST_REGION
Sparse linear algebra library SUITE_SPARSE
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver SPARSE_SCHUR SPARSE_SCHUR
Threads 1 1
Linear solver ordering AUTOMATIC 7327,19438
Schur structure d,1,1 d,d,d
Cost:
Initial 5.282318e+08
Final 4.079526e+08
Change 1.202792e+08
Minimizer iterations 21
Successful steps 21
Unsuccessful steps 0
Time (in seconds):
Preprocessor 0.204571
Residual only evaluation 4.870745 (20)
Jacobian & residual evaluation 6.838206 (21)
Linear solver 1.726577 (20)
Minimizer 14.012495
Postprocessor 0.004029
Total 14.221096
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)
Upgrading to CERES 2.0.0
Refining tile with 26765 points
Solver Summary (v 2.0.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-no_openmp)
Original Reduced
Parameter blocks 26765 26765
Parameters 26765 26765
Residual blocks 53530 53530
Residuals 2087670 2087670
Minimizer TRUST_REGION
Sparse linear algebra library SUITE_SPARSE
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver SPARSE_SCHUR SPARSE_SCHUR
Threads 1 1
Linear solver ordering AUTOMATIC 7327,19438
Schur structure d,1,1 d,d,d
Cost:
Initial 5.282318e+08
Final 4.079526e+08
Change 1.202792e+08
Minimizer iterations 21
Successful steps 21
Unsuccessful steps 0
Time (in seconds):
Preprocessor 0.184625
Residual only evaluation 5.006680 (20)
Jacobian & residual evaluation 7.065109 (21)
Linear solver 1.721030 (20)
Minimizer 14.369807
Postprocessor 0.005544
Total 14.559977
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)
Upgrading to CERES 2.2.0
Refining tile with 26605 points
Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)
Original Reduced
Parameter blocks 26605 26605
Parameters 26605 26605
Residual blocks 53210 53210
Residuals 2075190 2075190
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Sparse linear algebra library SUITE_SPARSE + AMD
Given Used
Linear solver SPARSE_SCHUR SPARSE_SCHUR
Threads 1 1
Linear solver ordering AUTOMATIC 7296,19309
Schur structure d,1,1 d,d,d
Cost:
Initial 5.221692e+08
Final 3.998656e+08
Change 1.223036e+08
Minimizer iterations 21
Successful steps 21
Unsuccessful steps 0
Time (in seconds):
Preprocessor 0.180470
Residual only evaluation 4.888542 (20)
Jacobian & residual evaluation 10.950721 (21)
Linear solver 1.442545 (20)
Minimizer 17.889433
Postprocessor 0.004414
Total 18.074318
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)
I created another case where I used CERES Solver 2.0.0 on Ubuntu 22.04.
Upgrading to CERES 2.0.0 (Ubuntu 22.04)
- Ubuntu 22.04
- PCL 1.9.1
- OpenCV 4.1.1
- OpenMVG 1.2
Refining tile with 26599 points
Solver Summary (v 2.0.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-cxsparse-(3.2.0)-eigensparse-no_openmp)
Original Reduced
Parameter blocks 26599 26599
Parameters 26599 26599
Residual blocks 53198 53198
Residuals 2074722 2074722
Minimizer TRUST_REGION
Sparse linear algebra library SUITE_SPARSE
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver SPARSE_SCHUR SPARSE_SCHUR
Threads 1 1
Linear solver ordering AUTOMATIC 7276,19323
Schur structure d,1,1 d,d,d
Cost:
Initial 5.243282e+08
Final 4.030508e+08
Change 1.212773e+08
Minimizer iterations 21
Successful steps 21
Unsuccessful steps 0
Time (in seconds):
Preprocessor 0.157499
Residual only evaluation 5.084708 (20)
Jacobian & residual evaluation 12.151767 (21)
Linear solver 1.567164 (20)
Minimizer 19.334076
Postprocessor 0.003865
Total 19.495441
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 20.)
P.S. The maximum number of iterations is 20, not 40.
Thanks, this is informative. All of the time increase seems to be in the Jacobian evaluation, which is indeed surprising.
Would it be possible for you to also report results for Ceres 2.2.0?
Ceres 2.0.0 is multiple years old now.
Sorry, you have already provided what I asked for; I scrolled too fast. A Jacobian evaluation slowdown is not something I would have guessed. Let me see if I can replicate this on my end.
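To make the "all the time is in the Jacobian evaluation" reading concrete, here is a minimal sketch that diffs the per-phase timings of the 1.14.0 and 2.2.0 reports above. The struct and function names (PhaseTime, ReportedPhases, DeltaSeconds, DeltaPercent) are made up for illustration and are not part of Ceres; the numbers are copied from the two summaries. Note that the two runs refined slightly different tiles (26765 vs 26605 points), so the comparison is only approximate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the "Time (in seconds)" section, for two different builds.
// (Hypothetical helper type, not a Ceres type.)
struct PhaseTime {
  std::string name;
  double before;  // seconds reported by the Ceres 1.14.0 run
  double after;   // seconds reported by the Ceres 2.2.0 run
};

// Timings copied from the 1.14.0 and 2.2.0 summaries posted above.
inline std::vector<PhaseTime> ReportedPhases() {
  return {
      {"Residual only evaluation", 4.870745, 4.888542},
      {"Jacobian & residual evaluation", 6.838206, 10.950721},
      {"Linear solver", 1.726577, 1.442545},
      {"Total", 14.221096, 18.074318},
  };
}

// Absolute change in seconds; positive means the newer build is slower.
inline double DeltaSeconds(const PhaseTime& p) { return p.after - p.before; }

// Change relative to the older timing, in percent.
inline double DeltaPercent(const PhaseTime& p) {
  assert(p.before > 0.0);
  return 100.0 * (p.after - p.before) / p.before;
}
```

Looping over ReportedPhases() and printing DeltaSeconds for each row shows the Jacobian row growing by roughly 4.1 s while the total grows by roughly 3.9 s and the linear solver actually gets a bit faster, which is consistent with the slowdown being confined to Jacobian evaluation.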
So here is my attempt at replicating your results on my Mac. I ran the bundle adjuster for just one iteration to see how long it takes to evaluate the Jacobian/residuals. I used bundle_adjuster with problem-1778-993923-pre.txt.
1.14.0
bundle_adjuster --input=/Users/sameeragarwal/Downloads/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1 -ordering user
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 1.72e+00 2.74e+00
1 1.435514e+07 2.42e+08 3.18e+14 7.27e+05 9.51e-01 3.00e+04 5 2.97e+00 5.71e+00
Solver Summary (v 1.14.0-eigen-(3.4.0)-lapack-cxsparse-(4.4.1)-eigensparse-no_openmp-no_tbb)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 1.019843
Residual only evaluation 0.207327 (1)
Jacobian & residual evaluation 2.385012 (2)
Linear solver 1.412146 (1)
Minimizer 4.711149
Postprocessor 0.051827
Total 5.782819
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
2.2.0
bundle_adjuster --input=/Users/sameeragarwal/Downloads/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 1.91e+00 2.87e+00
1 1.435514e+07 2.42e+08 3.18e+14 0.00e+00 9.51e-01 3.00e+04 5 2.91e+00 5.78e+00
Solver Summary (v 2.2.0-eigen-(3.4.0)-lapack-metis-(5.1.0)-acceleratesparse-eigensparse)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 0.960292
Residual only evaluation 0.241840 (1)
Jacobian & residual evaluation 2.428891 (2)
Linear solver 1.380378 (1)
Minimizer 4.848096
Postprocessor 0.055693
Total 5.864081
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
HEAD
bundle_adjuster --input=/Users/sameeragarwal/Downloads/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 1.68e+00 2.61e+00
1 1.435514e+07 2.42e+08 3.18e+14 0.00e+00 9.51e-01 3.00e+04 5 2.84e+00 5.45e+00
Solver Summary (v 2.3.0-eigen-(3.4.0)-lapack-suitesparse-(7.8.1)-metis-(5.1.0)-acceleratesparse-eigensparse)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 0.928962
Residual only evaluation 0.213914 (1)
Jacobian & residual evaluation 2.281093 (2)
Linear solver 1.401529 (1)
Minimizer 4.548268
Postprocessor 0.050395
Total 5.527625
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
The relevant times are
1.14.0 Residual 0.207327 (1) Jacobian & residual 2.385012 (2)
2.2.0 Residual 0.241840 (1) Jacobian & residual 2.428891 (2)
HEAD Residual 0.213914 (1) Jacobian & residual 2.281093 (2)
I am not seeing much variation. There is some up and down, but nothing that looks like the changes you are seeing.
This makes me wonder whether the performance of the AutoDiffCostFunction you are using has changed across versions of Ceres.
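For anyone collecting these numbers across many runs, a small parser for the timing rows may save some copy and paste. This is only a sketch that assumes the "label seconds (calls)" layout seen in the reports in this thread; TimingLine and ParseTimingLine are hypothetical names, not Ceres API, and the layout could differ in other Ceres versions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One "<label> <seconds> (<calls>)" row from the "Time (in seconds)" section.
struct TimingLine {
  std::string label;
  double seconds = 0.0;
  int calls = 0;
};

// Parses a row such as "Jacobian & residual evaluation 2.385012 (2)".
// Returns false (leaving *out untouched) when the line does not end in
// "<seconds> (<calls>)", for example "Preprocessor 1.019843".
inline bool ParseTimingLine(const std::string& line, TimingLine* out) {
  assert(out != nullptr);
  std::istringstream in(line);
  std::vector<std::string> tokens;
  for (std::string tok; in >> tok;) tokens.push_back(tok);
  if (tokens.size() < 3) return false;

  const std::string& calls_tok = tokens.back();
  if (calls_tok.size() < 3 || calls_tok.front() != '(' ||
      calls_tok.back() != ')') {
    return false;
  }

  TimingLine parsed;
  try {
    parsed.calls = std::stoi(calls_tok.substr(1, calls_tok.size() - 2));
    parsed.seconds = std::stod(tokens[tokens.size() - 2]);
  } catch (const std::logic_error&) {  // invalid_argument or out_of_range
    return false;
  }
  // Everything before the last two tokens is the label.
  for (std::size_t i = 0; i + 2 < tokens.size(); ++i) {
    if (i > 0) parsed.label += ' ';
    parsed.label += tokens[i];
  }
  *out = parsed;
  return true;
}
```

Feeding each line of a saved report through ParseTimingLine and keeping the rows whose label is "Jacobian & residual evaluation" gives the per-version figures summarized above without retyping them.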
I also replicated what you did with some of my cases and I got the same behavior as you. Actually, with multiple threads 2.2.0 performs better. Thanks for your help, @sandwichmaker. I'll take a look at the functors I'm using for AutoDiffCostFunction.
Baseline using CERES 1.14.0
Single thread
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1 -ordering user
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 3.02e+00 5.96e+00
1 1.435514e+07 2.42e+08 3.18e+14 7.27e+05 9.51e-01 3.00e+04 5 6.11e+00 1.21e+01
Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-openmp-no_tbb)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 2.944793
Residual only evaluation 0.580917 (1)
Jacobian & residual evaluation 4.827219 (2)
Linear solver 2.654686 (1)
Minimizer 9.215678
Postprocessor 0.201411
Total 12.361882
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
Multithread
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 8 -ordering user
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 1.27e+00 4.32e+00
1 1.435514e+07 2.42e+08 3.18e+14 7.27e+05 9.51e-01 3.00e+04 5 3.96e+00 8.28e+00
Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.1.2)-cxsparse-(3.1.9)-eigensparse-openmp-no_tbb)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 8 8
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 3.043149
Residual only evaluation 0.139151 (1)
Jacobian & residual evaluation 1.342911 (2)
Linear solver 2.686629 (1)
Minimizer 5.326932
Postprocessor 0.204278
Total 8.574359
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
Upgrading to CERES 2.2.0
Single thread
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 3.33e+00 6.20e+00
1 1.435514e+07 2.42e+08 3.18e+14 0.00e+00 9.51e-01 3.00e+04 5 6.30e+00 1.25e+01
Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 2.865450
Residual only evaluation 0.454072 (1)
Jacobian & residual evaluation 5.528297 (2)
Linear solver 2.609310 (1)
Minimizer 9.728248
Postprocessor 0.206709
Total 12.800407
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
Multithread
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 8
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 8.62e-01 3.86e+00
1 1.435514e+07 2.42e+08 3.18e+14 0.00e+00 9.51e-01 3.00e+04 5 2.56e+00 6.42e+00
Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 8 8
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 2.994667
Residual only evaluation 0.106352 (1)
Jacobian & residual evaluation 1.010054 (2)
Linear solver 1.621585 (1)
Minimizer 3.514487
Postprocessor 0.209547
Total 6.718701
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
Yes, multi-threaded performance has improved substantially, but I noticed that single-threaded Jacobian & residual evaluation did get worse in your experiments.
Was that a fluke, or is it reliably reproducible?
Are you able to build 1.14 on the same version of Ubuntu with the same toolchain?
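The multi-threaded improvement can be quantified from the "Jacobian & residual evaluation" rows of the four bundle_adjuster runs above. The sketch below computes speedup and parallel efficiency; the function and constant names are illustrative assumptions, and the timings are copied from those runs (1 thread vs 8 threads).

```cpp
#include <cassert>

// Speedup of a multi-threaded run relative to the single-threaded run.
inline double Speedup(double single_thread_seconds,
                      double multi_thread_seconds) {
  assert(multi_thread_seconds > 0.0);
  return single_thread_seconds / multi_thread_seconds;
}

// Parallel efficiency: speedup divided by thread count (1.0 = perfect scaling).
inline double Efficiency(double speedup, int num_threads) {
  assert(num_threads > 0);
  return speedup / num_threads;
}

// "Jacobian & residual evaluation" seconds from the runs above.
constexpr double kJacobian114OneThread = 4.827219;     // Ceres 1.14.0, 1 thread
constexpr double kJacobian114EightThreads = 1.342911;  // Ceres 1.14.0, 8 threads
constexpr double kJacobian220OneThread = 5.528297;     // Ceres 2.2.0, 1 thread
constexpr double kJacobian220EightThreads = 1.010054;  // Ceres 2.2.0, 8 threads
```

With these inputs, Speedup gives roughly 3.6x for 1.14.0 and roughly 5.5x for 2.2.0 on 8 threads, so 2.2.0 starts from a slower single-threaded evaluation but scales noticeably better.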
It is reliably reproducible. Let me share some data with you: I made 5 runs for every case, and this is what I got. I'll also try to build 1.14.0 on the same version of Ubuntu and share that data with you.
P.S. I added side notes to each case about the CERES Solver version used by OpenMVG, in case that is helpful. I'm not sure whether having multiple installed versions could create a conflict, but the libraries linked for my project in CMake are correct. The baseline is the case where only one CERES version was compiled.
Baseline using CERES 1.14.0
Residual only evaluation 0.603661 (1)
Jacobian & residual evaluation 4.900541 (2)
Residual only evaluation 0.596503 (1)
Jacobian & residual evaluation 4.914723 (2)
Residual only evaluation 0.600968 (1)
Jacobian & residual evaluation 4.957335 (2)
Residual only evaluation 0.598945 (1)
Jacobian & residual evaluation 4.905056 (2)
Residual only evaluation 0.598227 (1)
Jacobian & residual evaluation 4.883886 (2)
Upgrading to CERES 2.2.0
OpenMVG 1.2 Side note: OpenMVG uses its internal CERES 1.11.0
Residual only evaluation 0.459004 (1)
Jacobian & residual evaluation 5.584864 (2)
Residual only evaluation 0.462662 (1)
Jacobian & residual evaluation 5.612269 (2)
Residual only evaluation 0.456783 (1)
Jacobian & residual evaluation 5.600853 (2)
Residual only evaluation 0.457550 (1)
Jacobian & residual evaluation 5.612944 (2)
Residual only evaluation 0.469969 (1)
Jacobian & residual evaluation 5.596276 (2)
Here you have the results for the other cases that I shared in the previous messages.
Upgrading to CERES 2.0.0 (Ubuntu 18.04)
OpenMVG 1.2 Side note: OpenMVG uses another installed (external) CERES 1.14.0
Residual only evaluation 0.472687 (1)
Jacobian & residual evaluation 7.042422 (2)
Residual only evaluation 0.460220 (1)
Jacobian & residual evaluation 7.024178 (2)
Residual only evaluation 0.458277 (1)
Jacobian & residual evaluation 6.979913 (2)
Residual only evaluation 0.465458 (1)
Jacobian & residual evaluation 7.011935 (2)
Residual only evaluation 0.480360 (1)
Jacobian & residual evaluation 7.149875 (2)
Upgrading to CERES 2.0.0 (Ubuntu 22.04)
OpenMVG 1.2 Side note: OpenMVG uses another installed (external) CERES 1.14.0
Residual only evaluation 0.452606 (1)
Jacobian & residual evaluation 5.105451 (2)
Residual only evaluation 0.456624 (1)
Jacobian & residual evaluation 5.012458 (2)
Residual only evaluation 0.450904 (1)
Jacobian & residual evaluation 4.987809 (2)
Residual only evaluation 0.460692 (1)
Jacobian & residual evaluation 5.018714 (2)
Residual only evaluation 0.450405 (1)
Jacobian & residual evaluation 5.017193 (2)
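One way to check that the gap is larger than run-to-run noise is to compare the mean and sample standard deviation of the five runs. The sketch below does this for the "Jacobian & residual evaluation" timings of the 1.14.0 baseline and the 2.2.0 case listed above. The helper names are illustrative, and these two cases still differ in Ubuntu version, so this only measures repeatability, not the cause.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Arithmetic mean of a non-empty sample.
inline double Mean(const std::vector<double>& xs) {
  assert(!xs.empty());
  double sum = 0.0;
  for (double x : xs) sum += x;
  return sum / static_cast<double>(xs.size());
}

// Sample standard deviation (divides by n - 1).
inline double SampleStdDev(const std::vector<double>& xs) {
  assert(xs.size() >= 2);
  const double m = Mean(xs);
  double sum_sq = 0.0;
  for (double x : xs) sum_sq += (x - m) * (x - m);
  return std::sqrt(sum_sq / static_cast<double>(xs.size() - 1));
}

// "Jacobian & residual evaluation" seconds, five runs each, copied from above.
inline std::vector<double> BaselineRuns114() {
  return {4.900541, 4.914723, 4.957335, 4.905056, 4.883886};
}
inline std::vector<double> UpgradedRuns220() {
  return {5.584864, 5.612269, 5.600853, 5.612944, 5.596276};
}
```

The means come out near 4.91 s and 5.60 s, about 14% apart, while both standard deviations are only a few hundredths of a second, so the difference between these two cases is far outside the measurement noise.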
@sandwichmaker, I ran the tests you suggested. I didn't install OpenMVG and only installed the CERES versions on Ubuntu 22.04 (one independent Docker image for each version). PCL and OpenCV are the same versions in both cases (even though we are not using them for these tests). I still see that CERES 2.2.0 takes more time in the Jacobian & residual evaluation with a single thread.
Btw, I'm using
- cmake 3.16.9
- eigen 3.3.4
- glog 0.4.0
- gflags 2.2.2
- SuiteSparse 5.10.1
- BLAS 3.10.0
- LAPACK 3.10.0
Case 1 Baseline using CERES 1.14.0
- Ubuntu 22.04
- PCL 1.9.1
- OpenCV 4.1.1
Residual only evaluation 0.454186 (1)
Jacobian & residual evaluation 4.441791 (2)
Residual only evaluation 0.457184 (1)
Jacobian & residual evaluation 4.374026 (2)
Residual only evaluation 0.447578 (1)
Jacobian & residual evaluation 4.305764 (2)
Residual only evaluation 0.444952 (1)
Jacobian & residual evaluation 4.536934 (2)
Residual only evaluation 0.444257 (1)
Jacobian & residual evaluation 4.380742 (2)
Case 2 Upgrading to CERES 2.2.0
- Ubuntu 22.04
- PCL 1.9.1
- OpenCV 4.1.1
Residual only evaluation 0.460441 (1)
Jacobian & residual evaluation 5.557999 (2)
Residual only evaluation 0.453982 (1)
Jacobian & residual evaluation 5.531522 (2)
Residual only evaluation 0.455714 (1)
Jacobian & residual evaluation 5.585474 (2)
Residual only evaluation 0.469980 (1)
Jacobian & residual evaluation 5.582852 (2)
Residual only evaluation 0.463965 (1)
Jacobian & residual evaluation 5.595356 (2)
Okay, that's an apples-to-apples comparison. I am assuming these numbers are from examples/bundle_adjuster?
Yes, I ran bundle_adjuster like this:
Case 1 Baseline using CERES 1.14.0
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1 -ordering user
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 2.79e+00 6.00e+00
1 1.435514e+07 2.42e+08 3.18e+14 7.27e+05 9.51e-01 3.00e+04 5 5.74e+00 1.17e+01
Solver Summary (v 1.14.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-cxsparse-(3.2.0)-eigensparse-openmp-no_tbb)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 3.211934
Residual only evaluation 0.442265 (1)
Jacobian & residual evaluation 4.376956 (2)
Linear solver 2.660337 (1)
Minimizer 8.619866
Postprocessor 0.203518
Total 12.035319
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
Case 2 Upgrading to CERES 2.2.0
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --num_iterations=1 -linear_solver iterative_schur -num_threads 1
iter cost cost_change |gradient| |step| tr_ratio tr_radius ls_iter iter_time total_time
0 2.563973e+08 0.00e+00 3.19e+15 0.00e+00 0.00e+00 1.00e+04 0 3.34e+00 6.36e+00
1 1.435514e+07 2.42e+08 3.18e+14 0.00e+00 9.51e-01 3.00e+04 5 6.30e+00 1.27e+01
Solver Summary (v 2.2.0-eigen-(3.3.4)-lapack-suitesparse-(5.10.1)-metis-(5.1.0)-eigensparse)
Original Reduced
Parameter blocks 995701 995701
Parameters 2997771 2997771
Residual blocks 5001946 5001946
Residuals 10003892 10003892
Minimizer TRUST_REGION
Trust region strategy LEVENBERG_MARQUARDT
Given Used
Linear solver ITERATIVE_SCHUR ITERATIVE_SCHUR
Preconditioner JACOBI JACOBI
Threads 1 1
Linear solver ordering 993923,1778 993923,1778
Schur structure 2,3,9 2,3,9
Cost:
Initial 2.563973e+08
Final 1.435514e+07
Change 2.420421e+08
Minimizer iterations 2
Successful steps 2
Unsuccessful steps 0
Time (in seconds):
Preprocessor 3.021223
Residual only evaluation 0.452501 (1)
Jacobian & residual evaluation 5.535506 (2)
Linear solver 2.611305 (1)
Minimizer 9.736808
Postprocessor 0.209909
Total 12.967940
Termination: NO_CONVERGENCE (Maximum number of iterations reached. Number of iterations: 1.)
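For comparing reports like the two above, it can help to pull the timing rows out programmatically. The following is a minimal sketch, assuming each row has the shape `<label> <seconds> (<calls>)` as in the output shown here; `TimingEntry`, `ParseTimingLine` and `SecondsPerCall` are hypothetical helper names, not part of Ceres.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// One parsed row of the "Time (in seconds)" table (hypothetical helper type).
struct TimingEntry {
  std::string label;  // e.g. "Jacobian & residual evaluation"
  double seconds;     // total time reported for this phase
  int calls;          // evaluation count taken from the "(N)" suffix
};

// Parses a line such as "Jacobian & residual evaluation 5.535506 (2)".
// Returns false (leaving *out untouched) for lines that do not match.
bool ParseTimingLine(const std::string& line, TimingEntry* out) {
  assert(out != nullptr);
  const std::string ws = " \t";
  const std::size_t open = line.rfind('(');
  const std::size_t close = line.rfind(')');
  if (open == std::string::npos || close == std::string::npos || close < open) {
    return false;
  }
  // Everything before '(' should be "<label> <seconds>", possibly padded.
  std::string head = line.substr(0, open);
  const std::size_t first = head.find_first_not_of(ws);
  if (first == std::string::npos) return false;
  const std::size_t last = head.find_last_not_of(ws);
  head = head.substr(first, last - first + 1);
  const std::size_t split = head.find_last_of(ws);
  if (split == std::string::npos) return false;
  const std::size_t label_end = head.find_last_not_of(ws, split);
  if (label_end == std::string::npos) return false;

  TimingEntry parsed;
  try {
    parsed.seconds = std::stod(head.substr(split + 1));
    parsed.calls = std::stoi(line.substr(open + 1, close - open - 1));
  } catch (const std::exception&) {
    return false;  // not a numeric row, e.g. the "Termination:" line
  }
  parsed.label = head.substr(0, label_end + 1);
  *out = parsed;
  return true;
}

// Average time per evaluation; calls must be positive.
double SecondsPerCall(const TimingEntry& e) {
  assert(e.calls > 0);
  return e.seconds / e.calls;
}
```

Applied to the two reports above, this gives roughly 2.19 s per Jacobian evaluation for 1.14.0 (4.376956 / 2) and roughly 2.77 s for 2.2.0 (5.535506 / 2).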
Okay I have a Linux box, let me try and replicate the numbers on my end and then try and bisect to see what's going on. I will keep you posted on my progress.
Okay, so I can't get a 25% increase like you are seeing, but I can see 10%.
On Debian Rodete with GCC 13.2, Eigen 3.4.0
Ceres Solver 1.14.0
./bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user
Residual only evaluation 0.589166 (1)
Jacobian & residual evaluation 5.313726 (2)
Residual only evaluation 0.590412 (1)
Jacobian & residual evaluation 5.345748 (2)
Ceres 2.0.0
./bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user
Residual only evaluation 0.620252 (1)
Jacobian & residual evaluation 6.008752 (2)
Residual only evaluation 0.615519 (1)
Jacobian & residual evaluation 6.033850 (2)
with Ceres Solver 2.2.0
./bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -linear_solver_ordering=user
Residual only evaluation 0.633734 (1)
Jacobian & residual evaluation 5.867421 (2)
Residual only evaluation 0.658629 (1)
Jacobian & residual evaluation 5.951328 (2)
and at HEAD today
./bin/bundle_adjuster --input=${HOME}/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -linear_solver_ordering=user
Residual only evaluation 0.576504 (1)
Jacobian & residual evaluation 5.863398 (2)
Residual only evaluation 0.581811 (1)
Jacobian & residual evaluation 5.864341 (2)
So something happened between 1.14.0 and 2.0.0 where things went bad, and then things improved a bit but never quite got back to where they were at 1.14.0.
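The 25% and 10% figures are relative changes in the "Jacobian & residual evaluation" time between 1.14.0 and 2.2.0. A small sketch of that arithmetic, with `PercentChange` as a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent.
// Positive values mean `after` is slower (larger) than `before`.
double PercentChange(double before, double after) {
  assert(before != 0.0);
  return (after - before) / before * 100.0;
}
```

With the numbers in this thread, `PercentChange(4.376956, 5.535506)` is about 26.5 (the Ubuntu 22.04 Docker runs) and `PercentChange(5.313726, 5.867421)` is about 10.4 (the Debian/GCC 13.2 runs).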
So the offending commit seems to be 8904fa48, where we have
Residual only evaluation 0.595762 (1)
Jacobian & residual evaluation 5.947797 (2)
The commit right before this is 18a464d4, where we have
Residual only evaluation 0.603536 (1)
Jacobian & residual evaluation 4.597426 (2)
8904fa48 changes how the Jets are initialized. The commit message is a bit misleading because it seems to be talking about inlining, but what is really going on is this:
The function Make1stOrderPerturbation, which used to have a statically sized loop, is unrolled using template meta-programming instead, the expectation being that the compiler should be able to inline the whole computation and optimize it.
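To make the two styles concrete, here is a simplified sketch. It is not the actual Ceres code: `MiniJet`, `InitJetsLoop` and `InitJetsUnrolled` are hypothetical stand-ins for the Jet type and for the before/after forms of the initialization, kept only to show a fixed-bound loop next to a recursive template expansion that does the same work.

```cpp
#include <array>
#include <cassert>

// Hypothetical minimal dual number: a value plus N partial derivatives.
template <int N>
struct MiniJet {
  double a = 0.0;
  std::array<double, N> v{};
};

// Style 1: a plain loop whose trip count is a compile-time constant.
// Sets dst[j] = src[j] + epsilon_(Offset + j) for j in [0, Count).
template <int Offset, int Count, int N>
void InitJetsLoop(const double* src, MiniJet<N>* dst) {
  static_assert(Offset >= 0 && Offset + Count <= N, "index out of range");
  for (int j = 0; j < Count; ++j) {
    dst[j].a = src[j];
    dst[j].v.fill(0.0);
    dst[j].v[Offset + j] = 1.0;
  }
}

// Style 2: the same work expressed as a recursive template expansion,
// one instantiation per element, terminated by a partial specialization.
template <int J, int Offset, int Count, int N>
struct InitJetsUnrolledImpl {
  static void Apply(const double* src, MiniJet<N>* dst) {
    dst[J].a = src[J];
    dst[J].v.fill(0.0);
    dst[J].v[Offset + J] = 1.0;
    InitJetsUnrolledImpl<J + 1, Offset, Count, N>::Apply(src, dst);
  }
};

// Base case: J == Count, nothing left to initialize.
template <int Offset, int Count, int N>
struct InitJetsUnrolledImpl<Count, Offset, Count, N> {
  static void Apply(const double*, MiniJet<N>*) {}
};

template <int Offset, int Count, int N>
void InitJetsUnrolled(const double* src, MiniJet<N>* dst) {
  static_assert(Offset >= 0 && Offset + Count <= N, "index out of range");
  InitJetsUnrolledImpl<0, Offset, Count, N>::Apply(src, dst);
}
```

Both functions produce identical results; whether they compile to equally fast code depends on how the compiler inlines and optimizes each form, which is the open question here.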
The fact that this makes performance worse seems to indicate that this is not happening. Now I am not sure whether this is a GCC thing or whether it also happens with Clang/LLVM, since when I run the same two commits on my Mac, where the default compiler is clang, I get
8904fa48
Residual only evaluation 0.213228 (1)
Jacobian & residual evaluation 1.976800 (2)
18a464d4
Residual only evaluation 0.230083 (1)
Jacobian & residual evaluation 2.024981 (2)
which does not indicate any difference. So I am starting to wonder if the worse performance is GCC specific.
I was unable to figure out how to get the clang toolchain to work on my Linux box with these versions of Ceres, but since we can replicate these performance problems with GCC we should see if we can fix them.
Okay, I was able to use Google's internal clang-based toolchain as well as my Mac's clang to verify that reverting this CL has no effect if Clang/LLVM is used as the compiler: performance remains the same. With GCC, however, performance does get better. autodiff_benchmarks does go up and down some even with clang.
I also tried just modifying the existing template-based implementation and forcing inlining, and that did not do anything, so this really is a difference in an optimization pass or in inlining between GCC and Clang.
I think we should treat this as a GCC bug/missed optimization.
Some more debugging. Previously I was using google's internal clang based toolchain, but now I can use the clang based toolchain on debian and the results are curious. In the following HEAD means ceres solver at HEAD today and HEAD + change means that I reverted the change in https://github.com/ceres-solver/ceres-solver/commit/8904fa4887ed7b3e6d110ad5a98efbc2df48595e.
gcc HEAD
Residual only evaluation 0.581872 (1)
Jacobian & residual evaluation 5.909532 (2)
gcc HEAD + change
Residual only evaluation 0.586720 (1)
Jacobian & residual evaluation 4.791280 (2)
cmake ../ -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_FLAGS="-stdlib=libc++"
clang HEAD
Residual only evaluation 0.560918 (1)
Jacobian & residual evaluation 7.849475 (2)
clang HEAD + change
Residual only evaluation 0.571185 (1)
Jacobian & residual evaluation 7.861784 (2)
As you can see, clang, while having much worse performance, seems not to be affected by this code change at all. Now if I add -march=x86-64-v3 to the compiler flags then we get
gcc HEAD
Residual only evaluation 0.558229 (1)
Jacobian & residual evaluation 4.343600 (2)
gcc HEAD + change
Residual only evaluation 0.565759 (1)
Jacobian & residual evaluation 3.064159 (2)
clang HEAD
Residual only evaluation 0.563153 (1)
Jacobian & residual evaluation 3.367755 (2)
clang HEAD + change
Residual only evaluation 0.555043 (1)
Jacobian & residual evaluation 3.382892 (2)
GCC's performance gets better and is further improved by reverting this code path, and CLANG gets better but is unaffected by the code change.
I need to take a closer look at our autodiff benchmarks to see what is going on with them.
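Because these timings shift so much with -march, it can be useful to confirm which SIMD extensions a given build was actually compiled for. A minimal sketch, assuming GCC or Clang on x86-64, where the predefined macros `__SSE2__`, `__AVX__`, `__AVX2__` and `__FMA__` reflect the enabled instruction sets (the x86-64-v3 level includes AVX, AVX2 and FMA); the `kHas*` constants and `DescribeSimd` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Compile-time view of the SIMD extensions enabled for this translation unit.
#ifdef __SSE2__
constexpr bool kHasSse2 = true;
#else
constexpr bool kHasSse2 = false;
#endif

#ifdef __AVX__
constexpr bool kHasAvx = true;
#else
constexpr bool kHasAvx = false;
#endif

#ifdef __AVX2__
constexpr bool kHasAvx2 = true;
#else
constexpr bool kHasAvx2 = false;
#endif

#ifdef __FMA__
constexpr bool kHasFma = true;
#else
constexpr bool kHasFma = false;
#endif

// Returns a one-line summary such as "SSE2=1 AVX=0 AVX2=0 FMA=0".
std::string DescribeSimd() {
  std::string out;
  out += std::string("SSE2=") + (kHasSse2 ? "1" : "0");
  out += std::string(" AVX=") + (kHasAvx ? "1" : "0");
  out += std::string(" AVX2=") + (kHasAvx2 ? "1" : "0");
  out += std::string(" FMA=") + (kHasFma ? "1" : "0");
  return out;
}
```

Printing `DescribeSimd()` from both the Ceres build and the application build is one way to check that the two were compiled with matching flags; a build with -march=x86-64-v3 should report AVX, AVX2 and FMA as enabled.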
@sandwichmaker, thanks for your support and for running all those tests.
On my end, I ran another kind of apples-to-apples comparison. I ran the bundle_adjuster binary with CERES 2.0.0 but using different Ubuntu versions (18.04 and 22.04).
The results are also curious. For the bundle_adjuster example it took more time with Ubuntu 18.04, but for my optimization problem it took more time with Ubuntu 22.04. This is probably also related to the gcc version. I'll try to run more tests on my end.
bundle_adjuster
CERES 2.0.0 (Ubuntu 18.04)
bundle_adjuster --input=/home/Data/Projector/Experiment/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user
Residual only evaluation 0.460584 (1)
Jacobian & residual evaluation 6.957227 (2)
Residual only evaluation 0.461226 (1)
Jacobian & residual evaluation 6.968588 (2)
Residual only evaluation 0.463042 (1)
Jacobian & residual evaluation 6.960957 (2)
CERES 2.0.0 (Ubuntu 22.04)
bundle_adjuster --input=/home/Data/Experiment/problem-1778-993923-pre.txt --linear_solver=iterative_schur -num_iterations=1 -num_threads 1 -ordering=user
Residual only evaluation 0.454232 (1)
Jacobian & residual evaluation 5.003663 (2)
Residual only evaluation 0.447596 (1)
Jacobian & residual evaluation 4.996817 (2)
Residual only evaluation 0.468115 (1)
Jacobian & residual evaluation 5.180313 (2)
My optimization problem (I took the numbers from my previous message)
CERES 2.0.0 (Ubuntu 18.04)
Residual only evaluation 5.006680 (20)
Jacobian & residual evaluation 7.065109 (21)
CERES 2.0.0 (Ubuntu 22.04)
Residual only evaluation 5.084708 (20)
Jacobian & residual evaluation 12.151767 (21)
Okay, how about you try reverting that commit in your local version of autodiff.h and running the same tests again?
@sandwichmaker, I ran multiple tests reverting the commit and also using the compiler flag -march=x86-64-v3. For the tests I used CERES 2.0.0 on Ubuntu 22.04. I ran the bundle_adjuster example and my optimization problem. In your repo example, reverting the commit and adding the compiler flag reduced the Jacobian & residual evaluation time significantly. In my project, on the other hand, nothing is conclusive at the moment: reverting the commit seems to reduce the Jacobian & residual evaluation time a bit, but it is not even close to what I got on Ubuntu 18.04.
bundle_adjuster
CERES 2.0.0
Ubuntu 18.04
Residual only evaluation 0.460584 (1)
Jacobian & residual evaluation 6.957227 (2)
Ubuntu 22.04
Residual only evaluation 0.454232 (1)
Jacobian & residual evaluation 5.003663 (2)
Ubuntu 22.04 + reverted commit
Residual only evaluation 0.444284 (1)
Jacobian & residual evaluation 3.865306 (2)
Ubuntu 22.04 + reverted commit + march=x86-64-v3
Residual only evaluation 0.440761 (1)
Jacobian & residual evaluation 2.817200 (2)
Ubuntu 22.04 + march=x86-64-v3
Residual only evaluation 0.432983 (1)
Jacobian & residual evaluation 3.192775 (2)
My optimization problem
CERES 2.0.0
Ubuntu 18.04
Residual only evaluation 5.006680 (20)
Jacobian & residual evaluation 7.065109 (21)
Ubuntu 22.04
Residual only evaluation 5.084708 (20)
Jacobian & residual evaluation 12.151767 (21)
Ubuntu 22.04 + reverted commit
Residual only evaluation 4.773392 (20)
Jacobian & residual evaluation 11.242205 (21)
Ubuntu 22.04 + reverted commit + march=x86-64-v3
Residual only evaluation 4.914351 (20)
Jacobian & residual evaluation 11.501043 (21)
Ubuntu 22.04 + march=x86-64-v3
Residual only evaluation 5.520689 (20)
Jacobian & residual evaluation 12.702707 (21)
I ran more tests, compiling my project with -march=x86-64-v3 as well:
Ubuntu 22.04 + reverted commit + march=x86-64-v3
Residual only evaluation 4.992464 (20)
Jacobian & residual evaluation 11.706325 (21)
Ubuntu 22.04 + march=x86-64-v3
Residual only evaluation 4.711005 (20)
Jacobian & residual evaluation 11.148952 (21)
I'll try to run some tests using CLANG as well.
So now I am quite thoroughly confused. This regression seems to be more about changes to the compiler between Ubuntu 18.04 and Ubuntu 22.04 than anything to do with Ceres, no? At least for your project.
Yeah, that might be the case. Probably it is missing some compiler flag in Ubuntu 22.04 or something similar.
I already ran multiple tests with CLANG as the compiler. Using CLANG reduces the processing time on your examples in some cases, but reverting the commit is the change with the biggest impact on performance. In my project, CLANG behaves similarly to gcc.
I guess I'll look for the differences in the cmake files generated when compiling with gcc in Ubuntu 18.04 and 22.04 (in both, CERES and my project). Will let you know if I find any outstanding difference. Thanks!
bundle_adjuster
CERES 2.0.0
Ubuntu 18.04 with gcc
Residual only evaluation 0.460584 (1)
Jacobian & residual evaluation 6.957227 (2)
Ubuntu 18.04 with CLANG
Residual only evaluation 0.436545 (1)
Jacobian & residual evaluation 3.901897 (2)
*********************************************************
Ubuntu 22.04 with gcc
Residual only evaluation 0.454232 (1)
Jacobian & residual evaluation 5.003663 (2)
Ubuntu 22.04 with CLANG
Residual only evaluation 0.426150 (1)
Jacobian & residual evaluation 3.923545 (2)
*********************************************************
Ubuntu 22.04 + reverted commit with gcc
Residual only evaluation 0.444284 (1)
Jacobian & residual evaluation 3.865306 (2)
Ubuntu 22.04 + reverted commit with CLANG
Residual only evaluation 0.419273 (1)
Jacobian & residual evaluation 3.833912 (2)
*********************************************************
Ubuntu 22.04 + reverted commit + march=x86-64-v3 with gcc
Residual only evaluation 0.440761 (1)
Jacobian & residual evaluation 2.817200 (2)
Ubuntu 22.04 + reverted commit + march=x86-64-v3 with CLANG
Residual only evaluation 0.413516 (1)
Jacobian & residual evaluation 2.722491 (2)
*********************************************************
Ubuntu 22.04 + march=x86-64-v3 with gcc
Residual only evaluation 0.432983 (1)
Jacobian & residual evaluation 3.192775 (2)
Ubuntu 22.04 + march=x86-64-v3 with CLANG
Residual only evaluation 0.408792 (1)
Jacobian & residual evaluation 2.758322 (2)
My optimization problem
CERES 2.0.0
Ubuntu 18.04 with gcc
Residual only evaluation 5.006680 (20)
Jacobian & residual evaluation 7.065109 (21)
Ubuntu 18.04 with CLANG
Residual only evaluation 5.039767 (20)
Jacobian & residual evaluation 7.082926 (21)
*********************************************************
Ubuntu 22.04 with gcc
Residual only evaluation 5.084708 (20)
Jacobian & residual evaluation 12.151767 (21)
Ubuntu 22.04 with CLANG
Residual only evaluation 5.388123 (20)
Jacobian & residual evaluation 12.389536 (21)
*********************************************************
Ubuntu 22.04 + reverted commit with gcc
Residual only evaluation 4.773392 (20)
Jacobian & residual evaluation 11.242205 (21)
Ubuntu 22.04 + reverted commit with CLANG
Residual only evaluation 5.009430 (20)
Jacobian & residual evaluation 11.754678 (21)
*********************************************************
Ubuntu 22.04 + reverted commit + march=x86-64-v3 with gcc
Residual only evaluation 4.914351 (20)
Jacobian & residual evaluation 11.501043 (21)
Ubuntu 22.04 + reverted commit + march=x86-64-v3 with CLANG
Residual only evaluation 4.931949 (20)
Jacobian & residual evaluation 11.644230 (21)
*********************************************************
Ubuntu 22.04 + march=x86-64-v3 with gcc
Residual only evaluation 5.520689 (20)
Jacobian & residual evaluation 12.702707 (21)
Ubuntu 22.04 + march=x86-64-v3 with CLANG
Residual only evaluation 5.259030 (20)
Jacobian & residual evaluation 12.487500 (21)
Hi @sandwichmaker, I'm closing this issue.
I was finally able to have the behavior I was expecting. Instead of using Ubuntu 22.04 I used Ubuntu 20.04 and the timing for the whole process of my project (not only the tile of 26765 points I showed) was around 5% faster than my baselines which are Ceres 1.14.0 on Ubuntu 18.04 and Ceres 2.0.0 on Ubuntu 18.04.
Btw, I think the performance differences might be related to the CXX_FLAGS used when compiling my project. I ran a small investigation comparing the flags on different Ubuntu versions (18.04 and 22.04), and there might be something related to the SSE flags, but I didn't get any conclusive result. The investigation was taking so long that I decided to use Ubuntu 20.04 instead.
Anyway, below I show the timing for the different configurations. Thanks for all the support!
My optimization problem
CERES 1.14.0
Ubuntu 18.04
Residual only evaluation 4.870745 (20)
Jacobian & residual evaluation 6.838206 (21)
CERES 2.0.0
Ubuntu 18.04
Residual only evaluation 5.006680 (20)
Jacobian & residual evaluation 7.065109 (21)
Ubuntu 22.04
Residual only evaluation 5.084708 (20)
Jacobian & residual evaluation 12.151767 (21)
CERES 2.2.0
Ubuntu 20.04
Residual only evaluation 4.963751 (20)
Jacobian & residual evaluation 6.512813 (21)
Nice work. Now if we only knew the cause of this mess.