MRtrix3 / mrtrix3

MRtrix3 provides a set of tools to perform various advanced diffusion MRI analyses, including constrained spherical deconvolution (CSD), probabilistic tractography, track-density imaging, and apparent fibre density.
http://www.mrtrix.org
Mozilla Public License 2.0

Improving build times #2876

Closed: daljit46 closed this issue 4 months ago

daljit46 commented 5 months ago

MRtrix3 can take a long time to build. I often use a Windows laptop (10th-generation Intel Core i5, quad-core) to test changes on MSYS2, and compile times there are in the 20 to 25 minute ballpark. The situation is much better on a MacBook with an M2 Pro, where I can compile the project in just under 2 minutes. Nonetheless, I think it would be worth trying to improve compilation times.

Possibly the most taxing factors on build times are:

To examine this more carefully, I carried out a build analysis using ClangBuildAnalyzer and here's the output:

**** Time summary:
Compilation (725 times):
  Parsing (frontend):         3927.2 s
  Codegen & opts (backend):   2658.8 s

**** Files that took longest to parse (compiler frontend):
 56624 ms: mrtrix3/build/cmd/CMakeFiles/mrregister.dir/mrregister.cpp.o
 46102 ms: mrtrix3/build/cmd/CMakeFiles/mrtransform.dir/mrtransform.cpp.o
 39887 ms: mrtrix3/build/src/CMakeFiles/mrtrix-headless.dir/registration/transform/initialiser_helpers.cpp.o
 35432 ms: mrtrix3/build/cmd/CMakeFiles/dwidenoise.dir/dwidenoise.cpp.o
 34612 ms: mrtrix3/build/src/CMakeFiles/mrtrix-headless.dir/registration/transform/rigid.cpp.o
 33769 ms: mrtrix3/build/core/CMakeFiles/mrtrix-core.dir/math/average_space.cpp.o
 33294 ms: mrtrix3/build/src/CMakeFiles/mrtrix-headless.dir/registration/transform/affine.cpp.o
 32190 ms: mrtrix3/build/cmd/CMakeFiles/mrmetric.dir/mrmetric.cpp.o
 30567 ms: mrtrix3/build/src/CMakeFiles/mrtrix-headless.dir/registration/nonlinear.cpp.o
 29932 ms: mrtrix3/build/cmd/CMakeFiles/transformcalc.dir/transformcalc.cpp.o

**** Files that took longest to codegen (compiler backend):
256668 ms: mrtrix3/build/cmd/CMakeFiles/mrregister.dir/mrregister.cpp.o
 87793 ms: mrtrix3/build/cmd/CMakeFiles/mrtransform.dir/mrtransform.cpp.o
 69324 ms: mrtrix3/build/cmd/CMakeFiles/dwidenoise.dir/dwidenoise.cpp.o
 62017 ms: mrtrix3/build/cmd/CMakeFiles/tckmap.dir/tckmap.cpp.o
 54304 ms: mrtrix3/build/cmd/CMakeFiles/mrmath.dir/mrmath.cpp.o
 45067 ms: mrtrix3/build/cmd/CMakeFiles/mrconvert.dir/mrconvert.cpp.o
 44092 ms: mrtrix3/build/core/CMakeFiles/mrtrix-core.dir/math/stats/glm.cpp.o
 41614 ms: mrtrix3/build/cmd/CMakeFiles/mrmetric.dir/mrmetric.cpp.o
 38975 ms: mrtrix3/build/cmd/CMakeFiles/tckgen.dir/tckgen.cpp.o
 35974 ms: mrtrix3/build/cmd/CMakeFiles/mrfilter.dir/mrfilter.cpp.o

**** Templates that took longest to instantiate:
194266 ms: Eigen::ReturnByValue<Eigen::MatrixSquareRootReturnValue<Eigen::Matri... (11 times, avg 17660 ms)
194265 ms: Eigen::MatrixSquareRootReturnValue<Eigen::Matrix<double, 4, 4, 0>>::... (11 times, avg 17660 ms)
194243 ms: Eigen::internal::matrix_sqrt_compute<Eigen::Matrix<double, 4, 4, 0>,... (11 times, avg 17658 ms)
177340 ms: Eigen::Matrix<double, 4, 4, 0>::Matrix<Eigen::ReturnByValue<Eigen::M... (10 times, avg 17734 ms)
160110 ms: Eigen::DenseBase<Eigen::ReturnByValue<Eigen::MatrixSquareRootReturnV... (9 times, avg 17790 ms)
160083 ms: Eigen::PlainObjectBase<Eigen::Matrix<double, 4, 4, 0>>::_init1<Eigen... (9 times, avg 17787 ms)
139277 ms: MR::Math::condition_number<Eigen::Matrix<double, -1, -1, 0>> (29 times, avg 4802 ms)
137700 ms: MR::DWI::compute_SH2amp_mapping<Eigen::Matrix<double, -1, -1, 0>> (28 times, avg 4917 ms)
136220 ms: Eigen::JacobiSVD<Eigen::Matrix<double, -1, -1, 0>, 2>::JacobiSVD (29 times, avg 4697 ms)
135787 ms: Eigen::JacobiSVD<Eigen::Matrix<double, -1, -1, 0>, 2>::compute (29 times, avg 4682 ms)
120996 ms: Eigen::internal::qr_preconditioner_impl<Eigen::Matrix<double, -1, -1... (29 times, avg 4172 ms)
119522 ms: MR::Thread::run<MR::Thread::(anonymous namespace)::__Multi<PerThread>> (770 times, avg 155 ms)
119348 ms: MR::Thread::(anonymous namespace)::__run<MR::Thread::(anonymous name... (770 times, avg 154 ms)
119247 ms: MR::Thread::(anonymous namespace)::__multi_thread<PerThread>::__mult... (770 times, avg 154 ms)
108107 ms: Eigen::matrix_sqrt_quasi_triangular<Eigen::Matrix<double, 4, 4, 0>, ... (11 times, avg 9827 ms)
104155 ms: std::async<void (PerThread::*)(), PerThread *> (770 times, avg 135 ms)
 83983 ms: Eigen::Transform<double, 3, 18, 0>::inverse (136 times, avg 617 ms)
 81855 ms: Eigen::RealSchur<Eigen::Matrix<double, 4, 4, 0>>::RealSchur<Eigen::M... (11 times, avg 7441 ms)
 81703 ms: Eigen::RealSchur<Eigen::Matrix<double, 4, 4, 0>>::compute<Eigen::Mat... (11 times, avg 7427 ms)
 71772 ms: Eigen::internal::matrix_sqrt_quasi_triangular_diagonal<Eigen::Matrix... (11 times, avg 6524 ms)
 71756 ms: Eigen::internal::matrix_sqrt_quasi_triangular_2x2_diagonal_block<Eig... (11 times, avg 6523 ms)
 65982 ms: Eigen::internal::apply_block_householder_on_the_left<Eigen::Block<Ei... (31 times, avg 2128 ms)
 65151 ms: Eigen::HouseholderSequence<Eigen::Matrix<double, -1, -1, 0>, Eigen::... (30 times, avg 2171 ms)
 64471 ms: Eigen::EigenSolver<Eigen::Matrix<double, 2, 2, 0>>::EigenSolver<Eige... (11 times, avg 5861 ms)
 64259 ms: Eigen::EigenSolver<Eigen::Matrix<double, 2, 2, 0>>::compute<Eigen::M... (11 times, avg 5841 ms)
 61093 ms: Eigen::RealSchur<Eigen::Matrix<double, 2, 2, 0>>::compute<Eigen::Mat... (11 times, avg 5553 ms)
 60962 ms: std::make_shared<std::__future_base::_Async_state_impl<std::thread::... (770 times, avg 79 ms)
 60622 ms: std::shared_ptr<std::__future_base::_Async_state_impl<std::thread::_... (770 times, avg 78 ms)
 60264 ms: std::__shared_ptr<std::__future_base::_Async_state_impl<std::thread:... (770 times, avg 78 ms)
 59482 ms: std::__shared_count<__gnu_cxx::_S_atomic>::__shared_count<std::__fut... (770 times, avg 77 ms)

**** Template sets that took longest to instantiate:
554506 ms: Eigen::MatrixBase<$> (26180 times, avg 21 ms)
492539 ms: Eigen::internal::call_assignment_no_alias<$> (7321 times, avg 67 ms)
482848 ms: Eigen::internal::Assignment<$>::run (7294 times, avg 66 ms)
379821 ms: Eigen::Matrix<$>::Matrix<$> (2624 times, avg 144 ms)
344731 ms: Eigen::DenseBase<$> (28020 times, avg 12 ms)
302601 ms: Eigen::PlainObjectBase<$>::_init1<$> (1067 times, avg 283 ms)
274615 ms: Eigen::internal::call_assignment<$> (6077 times, avg 45 ms)
245116 ms: Eigen::internal::call_dense_assignment_loop<$> (8562 times, avg 28 ms)
230122 ms: MR::(anonymous namespace)::ThreadedLoopRunOuter<$>::run<$> (748 times, avg 307 ms)
221489 ms: Eigen::PlainObjectBase<$>::_set_noalias<$> (2004 times, avg 110 ms)
221401 ms: MR::(anonymous namespace)::ThreadedLoopRunOuter<$>::run_outer<$> (770 times, avg 287 ms)
219146 ms: Eigen::ReturnByValue<$>::evalTo<$> (23 times, avg 9528 ms)
207481 ms: Eigen::MatrixSquareRootReturnValue<$>::evalTo<$> (13 times, avg 15960 ms)
207457 ms: Eigen::internal::matrix_sqrt_compute<$>::run<$> (13 times, avg 15958 ms)
188646 ms: Eigen::Block<$> (7226 times, avg 26 ms)
185177 ms: Eigen::BlockImpl<$> (7226 times, avg 25 ms)
182741 ms: Eigen::internal::BlockImpl_dense<$> (7226 times, avg 25 ms)
182054 ms: Eigen::internal::generic_product_impl<$>::evalTo<$> (921 times, avg 197 ms)
176622 ms: Eigen::MapBase<$> (7962 times, avg 22 ms)
166066 ms: Eigen::JacobiSVD<$>::JacobiSVD (65 times, avg 2554 ms)
165568 ms: Eigen::JacobiSVD<$>::compute (65 times, avg 2547 ms)
165331 ms: Eigen::DenseBase<$>::eval (134 times, avg 1233 ms)
165154 ms: Eigen::internal::generic_product_impl<$>::scaleAndAddTo<$> (622 times, avg 265 ms)
159545 ms: Eigen::RealSchur<$>::compute<$> (25 times, avg 6381 ms)
141864 ms: MR::Thread::run<$> (899 times, avg 157 ms)
141550 ms: MR::threaded_copy_with_progress_message<$> (441 times, avg 320 ms)
139277 ms: MR::Math::condition_number<$> (29 times, avg 4802 ms)
137700 ms: MR::DWI::compute_SH2amp_mapping<$> (28 times, avg 4917 ms)
136471 ms: Eigen::internal::dense_assignment_loop<$>::run (8367 times, avg 16 ms)
134707 ms: Eigen::internal::generic_product_impl_base<$>::scaleAndAddTo<$> (540 times, avg 249 ms)

**** Functions that took longest to compile:
  7321 ms: void MR::Registration::NonLinear::run<MR::Registration::Transform::R... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
  7158 ms: void MR::Registration::NonLinear::run<MR::Registration::Transform::A... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
  4074 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/tckmap.cpp)
  3375 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/fixel2voxel.cpp)
  3164 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
  2819 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrmath.cpp)
  1978 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/fixelcfestats.cpp)
  1645 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrtransform.cpp)
  1214 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/fixel2peaks.cpp)
  1187 ms: void MR::File::MGH::read_other<MR::File::GZ>(MR::Header&, MR::File::... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/formats/mgz.cpp)
  1171 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/fixelcorrespondence.cpp)
  1139 ms: MR::GUI::MRView::Tool::Connectome::Connectome(MR::GUI::MRView::Tool:... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/src/gui/mrview/tool/connectome/connectome.cpp)
  1134 ms: void MR::File::MGH::read_other<std::basic_ifstream<char, std::char_t... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/formats/mgh.cpp)
  1067 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/tck2fixel.cpp)
  1049 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/connectome2tck.cpp)
  1031 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/warp2metric.cpp)
  1014 ms: MR::File::Dicom::dicom_to_mapper(MR::Header&, std::vector<std::share... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/file/dicom/mapper.cpp)
   987 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/fixelfilter.cpp)
   961 ms: void MR::Filter::ZClean::operator()<MR::Image<float>, MR::Image<floa... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrfilter.cpp)
   904 ms: void MR::Registration::Linear::run_masked<MR::Registration::Metric::... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
   893 ms: void MR::Registration::Linear::run_masked<MR::Registration::Metric::... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
   839 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrclusterstats.cpp)
   839 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/fixelreorient.cpp)
   829 ms: void MR::Registration::Linear::run_masked<MR::Registration::Metric::... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
   792 ms: usage() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrcalc.cpp)
   763 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrgrid.cpp)
   758 ms: run() (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/connectomestats.cpp)
   734 ms: void MR::Registration::Linear::run_masked<MR::Registration::Metric::... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
   724 ms: void MR::Registration::Linear::run_masked<MR::Registration::Metric::... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/cmd/mrregister.cpp)
   720 ms: MR::GUI::MRView::Mode::Volume::Shader::fragment_shader_source[abi:cx... (/home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/src/gui/mrview/mode/volume.cpp)

**** Function sets that took longest to compile / optimize:
 36856 ms: void MR::(anonymous namespace)::ThreadedLoopRunOuter<$>::run<$>(MR::... (252 times, avg 146 ms)
 17035 ms: MR::Thread::(anonymous namespace)::__multi_thread<$>::wait() (628 times, avg 27 ms)
 16099 ms: MR::Image<$>::Image(std::shared_ptr<$> const&, std::vector<$> const&) (252 times, avg 63 ms)
 15103 ms: void std::__introsort_loop<$>(__gnu_cxx::__normal_iterator<$>, __gnu... (551 times, avg 27 ms)
 12862 ms: MR::Image<$>::~Image() (252 times, avg 51 ms)
  9577 ms: void MR::(anonymous namespace)::ThreadedLoopRunOuter<$>::run_outer<$... (545 times, avg 17 ms)
  8250 ms: MR::Image<$>::with_direct_io(std::vector<$>) (34 times, avg 242 ms)
  7321 ms: void MR::Registration::NonLinear::run<$>(MR::Registration::Transform... (1 times, avg 7321 ms)
  7158 ms: void MR::Registration::NonLinear::run<$>(MR::Registration::Transform... (1 times, avg 7158 ms)
  7150 ms: std::future<$> std::async<$>(std::launch, MR::(anonymous namespace):... (545 times, avg 13 ms)
  6960 ms: Eigen::Matrix<$> MR::File::Matrix::load_matrix<$>(std::__cxx11::basi... (56 times, avg 124 ms)
  6776 ms: std::vector<$> MR::Stride::order<$>(MR::(anonymous namespace)::TmpIm... (252 times, avg 26 ms)
  6301 ms: MR::Thread::(anonymous namespace)::__multi_thread<$>::~__multi_threa... (628 times, avg 10 ms)
  6294 ms: std::__future_base::_Async_state_impl<$>::_M_run() (698 times, avg 9 ms)
  6165 ms: Eigen::internal::dense_assignment_loop<$>::run(Eigen::internal::gene... (212 times, avg 29 ms)
  5754 ms: std::vector<$>::~vector() (1817 times, avg 3 ms)
  3711 ms: void std::vector<$>::_M_realloc_insert<$>(__gnu_cxx::__normal_iterat... (276 times, avg 13 ms)
  3499 ms: std::vector<$> MR::Stride::order<$>(MR::Image<$> const&, unsigned lo... (140 times, avg 24 ms)
  3295 ms: Eigen::internal::general_matrix_vector_product<$>::run(long, long, E... (73 times, avg 45 ms)
  3167 ms: void Eigen::internal::make_block_householder_triangular_factor<$>(Ei... (29 times, avg 109 ms)
  3143 ms: void Eigen::internal::make_block_householder_triangular_factor<$>(Ei... (27 times, avg 116 ms)
  3125 ms: std::_Function_handler<$>::_M_invoke(std::_Any_data const&) (698 times, avg 4 ms)
  2978 ms: std::__future_base::_Async_state_impl<$>::~_Async_state_impl() (698 times, avg 4 ms)
  2905 ms: std::_Head_base<$>::_Head_base(std::_Head_base<$> const&) (351 times, avg 8 ms)
  2899 ms: void MR::threaded_copy_with_progress_message<$>(std::__cxx11::basic_... (17 times, avg 170 ms)
  2798 ms: MR::Math::GradientDescentBB<$>::init(std::ostream&) (16 times, avg 174 ms)
  2795 ms: void MR::threaded_copy<$>(MR::Image<$>&, MR::Image<$>&, unsigned lon... (29 times, avg 96 ms)
  2757 ms: std::_Head_base<$>::_Head_base<$>(MR::Image<$>&) (342 times, avg 8 ms)
  2748 ms: void Eigen::MatrixBase<$>::applyHouseholderOnTheRight<$>(Eigen::Matr... (34 times, avg 80 ms)
  2674 ms: Eigen::JacobiSVD<$>::compute(Eigen::Matrix<$> const&, unsigned int) (21 times, avg 127 ms)

**** Expensive headers:
672845 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/mrtrix.h (included 346 times, avg 1944 ms), included via:
  69x: command.h app.h cmdline_option.h 
  16x: app.h cmdline_option.h 
  15x: loop.h image_helpers.h datatype.h cmdline_option.h 
  9x: threaded_loop.h loop.h image_helpers.h datatype.h cmdline_option.h 
  5x: nifti_utils.h header.h app.h cmdline_option.h 
  4x: header.h app.h cmdline_option.h 
  ...

628735 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/exception.h (included 348 times, avg 1806 ms), included via:
  69x: command.h app.h cmdline_option.h mrtrix.h 
  15x: loop.h image_helpers.h datatype.h cmdline_option.h mrtrix.h 
  15x: app.h cmdline_option.h mrtrix.h 
  9x: threaded_loop.h loop.h image_helpers.h datatype.h cmdline_option.h mrtrix.h 
  5x: nifti_utils.h header.h app.h cmdline_option.h mrtrix.h 
  4x: header.h app.h cmdline_option.h mrtrix.h 
  ...

596310 ms: /usr/include/eigen3/Eigen/Geometry (included 358 times, avg 1665 ms), included via:
  68x: command.h app.h cmdline_option.h mrtrix.h exception.h types.h 
  15x: loop.h image_helpers.h datatype.h cmdline_option.h mrtrix.h exception.h types.h 
  15x: app.h cmdline_option.h mrtrix.h exception.h types.h 
  9x: threaded_loop.h iterator.h types.h 
  5x: nifti_utils.h header.h app.h cmdline_option.h mrtrix.h exception.h types.h 
  4x: threaded_copy.h threaded_loop.h iterator.h types.h 
  ...

548740 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/src/gui/opengl/gl.h (included 73 times, avg 7516 ms), included via:
  4x: file.h 
  3x: gui.h 
  2x: color_button.h 
  2x: dicom.h 
  2x: item.h undoentry.h 
  1x: cylinder.h 
  ...

508430 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/app.h (included 323 times, avg 1574 ms), included via:
  70x: command.h 
  18x: <direct include>
  17x: loop.h progressbar.h 
  10x: threaded_loop.h loop.h progressbar.h 
  7x: utils.h 
  6x: nifti_utils.h header.h 
  ...

243315 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/datatype.h (included 288 times, avg 844 ms), included via:
  20x: header.h 
  17x: image.h copy.h loop.h image_helpers.h 
  16x: loop.h image_helpers.h 
  10x: list.h header.h 
  9x: threaded_loop.h loop.h image_helpers.h 
  8x: file.h file_base.h properties.h roi.h image.h copy.h loop.h image_helpers.h 
  ...

212697 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/command.h (included 111 times, avg 1916 ms), included via:
  111x: <direct include>

194912 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/algo/loop.h (included 212 times, avg 919 ms), included via:
  36x: image.h copy.h 
  21x: <direct include>
  10x: threaded_loop.h 
  9x: file.h file_base.h properties.h roi.h image.h copy.h 
  7x: properties.h roi.h image.h copy.h 
  6x: threaded_copy.h threaded_loop.h 
  ...

192188 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/image.h (included 211 times, avg 910 ms), included via:
  68x: <direct include>
  9x: file.h file_base.h properties.h roi.h 
  7x: properties.h roi.h 
  6x: window.h gui_image.h 
  6x: helpers.h 
  4x: base.h base.h window.h gui_image.h 
  ...

168263 ms: /home/runner/work/mrtrix-build-analysis/mrtrix-build-analysis/mrtrix3/core/header.h (included 259 times, avg 649 ms), included via:
  35x: image.h mrtrix_utils.h 
  26x: <direct include>
  20x: gradient.h 
  11x: list.h 
  10x: nifti_utils.h 
  8x: file.h file_base.h properties.h roi.h image.h mrtrix_utils.h 
  ...

  done in 4.3s.

We can see that:

I think it would be worth discussing what can be done to improve the situation (e.g. precompiled headers, splitting .cpp files, reducing templated code, explicit template instantiation).

Lestropie commented 5 months ago
daljit46 commented 5 months ago

We definitely don't want to be reducing the actual utilisation of templates. The computation speed afforded by their use outweighs the detriment to compilation performance.

I agree that if performance concerns justify it, then templates can be a very good choice (they also provide compile-time safety, which is always preferable to run-time safety). However, I would say that the majority of code in any project is not performance-sensitive. Templates can be a huge pain, not just because of build times but also because of readability, maintainability, binary size and error messages. By the way, my concern here was more with reducing the number of template instantiations than with the use of templates themselves (although using fewer templates is one way of achieving that).

If there are candidate files currently implemented as header-only that could be precompiled without loss of execution performance, then I'm all for it. E.g. I know from work elsewhere that there are all sorts of functions in core/fixel/helpers.h defined INLINE that are not at all performance-related. Or maybe, for example, interpolators could be explicitly instantiated for the full set of possible input image / adaptor types.

I'm not sure we have enough "stable" code to justify this. However, #2877 takes an alternative approach that is similar in spirit: precompiled headers.

daljit46 commented 4 months ago

This can be closed now that #2877 has been merged, which improves the situation quite a bit. Further gains (especially for incremental builds) may be possible by cleaning up unnecessary header includes, but that's a somewhat tangential issue and something I hope to try at some point in the future.

For reference, ClangBuildAnalyzer analysis now shows significantly improved parsing (~55% lower) and codegen (~20% lower) times:

**** Time summary:
Compilation (787 times):
  Parsing (frontend):         1798.5 s
  Codegen & opts (backend):   2184.3 s