PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/

Optimizing PCL compile {time, memory usage} #3414

Open kunaltyagi opened 4 years ago

kunaltyagi commented 4 years ago

Starting point for people who want to help optimize PCL compile times

Your Environment

Context

It might be possible to make PCL compile faster and with less memory by using the time-tracing support (-ftime-trace) introduced in clang 9 to find the hot spots.

Code to Reproduce

Warning: Please have plenty of spare disk space ready. Also, don't use -Werror

I got a 2.4 GB tar of the json files only. (Technically you should only need the 1.2 GB data.json, but some "editors" can't edit such huge files)

Time to dig into the flamegraph (using chrome://tracing or https://www.speedscope.app/)

Synopsis

Analyzing build trace from 'data.json'...
**** Time summary:
Compilation (693 times):
  Parsing (frontend):         5539.6 s
  Codegen & opts (backend):   4989.6 s

**** Files that took longest to parse (compiler frontend):
 46071 ms: /surface/CMakeFiles/pcl_surface.dir/src/mls.cpp.json
 26592 ms: /apps/CMakeFiles/pcl_stereo_ground_segmentation.dir/src/stereo_ground_segmentation.cpp.json
 21329 ms: /features/CMakeFiles/pcl_features.dir/src/integral_image_normal.cpp.json
 20816 ms: /recognition/CMakeFiles/pcl_recognition.dir/src/cg/geometric_consistency.cpp.json
 20525 ms: /apps/CMakeFiles/pcl_feature_matching.dir/src/feature_matching.cpp.json
 20410 ms: /recognition/CMakeFiles/pcl_recognition.dir/src/face_detection/rf_face_detector_trainer.cpp.json
 20230 ms: /tools/CMakeFiles/pcl_icp2d.dir/icp2d.cpp.json
 19747 ms: /apps/CMakeFiles/pcl_apps.dir/src/dominant_plane_segmentation.cpp.json
 19296 ms: /registration/CMakeFiles/pcl_registration.dir/src/lum.cpp.json
 19019 ms: /features/CMakeFiles/pcl_features.dir/src/multiscale_feature_persistence.cpp.json

**** Files that took longest to codegen (compiler backend):
326917 ms: /surface/CMakeFiles/pcl_surface.dir/src/mls.cpp.json
188798 ms: /sample_consensus/CMakeFiles/pcl_sample_consensus.dir/src/sac_model_cylinder.cpp.json
185092 ms: /sample_consensus/CMakeFiles/pcl_sample_consensus.dir/src/sac_model_cone.cpp.json
150004 ms: /recognition/CMakeFiles/pcl_recognition.dir/src/cg/geometric_consistency.cpp.json
134510 ms: /features/CMakeFiles/pcl_features.dir/src/multiscale_feature_persistence.cpp.json
133519 ms: /features/CMakeFiles/pcl_features.dir/src/integral_image_normal.cpp.json
123903 ms: /segmentation/CMakeFiles/pcl_segmentation.dir/src/organized_multi_plane_segmentation.cpp.json
 92971 ms: /features/CMakeFiles/pcl_features.dir/src/shot.cpp.json
 81983 ms: /features/CMakeFiles/pcl_features.dir/src/board.cpp.json
 79440 ms: /segmentation/CMakeFiles/pcl_segmentation.dir/src/sac_segmentation.cpp.json

**** Templates that took longest to instantiate:
190434 ms: pcl::transformBetween2CoordinateSystems<double> (345 times, avg 551 ms)
172737 ms: pcl::transformPlane<double> (690 times, avg 250 ms)
160057 ms: pcl::transformBetween2CoordinateSystems<float> (345 times, avg 463 ms)
157329 ms: Eigen::Hyperplane<double, 3, 0>::transform<0> (345 times, avg 456 ms)
156963 ms: pcl::transformPlane<float> (690 times, avg 227 ms)
144275 ms: Eigen::Hyperplane<float, 3, 0>::transform<0> (345 times, avg 418 ms)
142325 ms: Eigen::Hyperplane<double, 3, 0>::transform<Eigen::Block<const Eigen:... (345 times, avg 412 ms)
129967 ms: Eigen::Hyperplane<float, 3, 0>::transform<Eigen::Block<const Eigen::... (345 times, avg 376 ms)
 91442 ms: Eigen::Block<Eigen::Matrix<double, 4, 1, 0, 4, 1>, 3, 1, false>::ope... (345 times, avg 265 ms)
 91338 ms: Eigen::BlockImpl<Eigen::Matrix<double, 4, 1, 0, 4, 1>, 3, 1, false, ... (345 times, avg 264 ms)
 91282 ms: Eigen::internal::BlockImpl_dense<Eigen::Matrix<double, 4, 1, 0, 4, 1... (345 times, avg 264 ms)
 91231 ms: Eigen::MatrixBase<Eigen::Block<Eigen::Matrix<double, 4, 1, 0, 4, 1>,... (345 times, avg 264 ms)
 91185 ms: Eigen::internal::call_assignment<Eigen::Block<Eigen::Matrix<double, ... (345 times, avg 264 ms)
 91056 ms: Eigen::internal::call_assignment<Eigen::Block<Eigen::Matrix<double, ... (345 times, avg 263 ms)
 89806 ms: pcl::transformPoint<float> (345 times, avg 260 ms)
 89228 ms: Eigen::Matrix<double, 3, 1, 0, 3, 1>::Matrix<Eigen::Product<Eigen::T... (345 times, avg 258 ms)
 88810 ms: pcl::transformPoint<double> (345 times, avg 257 ms)
 88572 ms: Eigen::PlainObjectBase<Eigen::Matrix<double, 3, 1, 0, 3, 1> >::_init... (345 times, avg 256 ms)
 88554 ms: Eigen::PlainObjectBase<Eigen::Matrix<double, 3, 1, 0, 3, 1> >::_set_... (345 times, avg 256 ms)
 88472 ms: Eigen::internal::call_assignment_no_alias<Eigen::Matrix<double, 3, 1... (345 times, avg 256 ms)
 88199 ms: Eigen::internal::Assignment<Eigen::Matrix<double, 3, 1, 0, 3, 1>, Ei... (345 times, avg 255 ms)
 87966 ms: Eigen::internal::generic_product_impl<Eigen::Transpose<const Eigen::... (345 times, avg 254 ms)
 83762 ms: Eigen::Block<Eigen::Matrix<float, 4, 1, 0, 4, 1>, 3, 1, false>::oper... (345 times, avg 242 ms)
 83679 ms: Eigen::BlockImpl<Eigen::Matrix<float, 4, 1, 0, 4, 1>, 3, 1, false, E... (345 times, avg 242 ms)
 83634 ms: Eigen::internal::BlockImpl_dense<Eigen::Matrix<float, 4, 1, 0, 4, 1>... (345 times, avg 242 ms)
 83542 ms: Eigen::MatrixBase<Eigen::Block<Eigen::Matrix<float, 4, 1, 0, 4, 1>, ... (345 times, avg 242 ms)
 83479 ms: Eigen::internal::call_assignment<Eigen::Block<Eigen::Matrix<float, 4... (345 times, avg 241 ms)
 83395 ms: Eigen::internal::call_assignment_no_alias<Eigen::Matrix<double, 3, 1... (345 times, avg 241 ms)
 83346 ms: Eigen::internal::call_assignment<Eigen::Block<Eigen::Matrix<float, 4... (345 times, avg 241 ms)
 83045 ms: Eigen::internal::Assignment<Eigen::Matrix<double, 3, 1, 0, 3, 1>, Ei... (345 times, avg 240 ms)

**** Functions that took longest to compile:
  2005 ms: void pcl::keypoints::agast::OastDetector9_16_detect<unsigned char, i... (../keypoints/src/agast_2d.cpp)
  1850 ms: void pcl::keypoints::agast::OastDetector9_16_detect<float, float>(fl... (../keypoints/src/agast_2d.cpp)
  1794 ms: pcl::GreedyProjectionTriangulation<pcl::PointNormal>::reconstructPol... (../surface/src/gp3.cpp)
  1631 ms: pcl::GreedyProjectionTriangulation<pcl::PointXYZINormal>::reconstruc... (../surface/src/gp3.cpp)
  1563 ms: pcl::GreedyProjectionTriangulation<pcl::PointXYZRGBNormal>::reconstr... (../surface/src/gp3.cpp)
  1467 ms: main (../simulation/tools/sim_viewer.cpp)
  1194 ms: ply_to_ply_converter::convert(std::__cxx11::basic_string<char, std::... (../io/tools/ply/ply2ply.cpp)
  1106 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
  1069 ms: main (../tools/pcd_viewer.cpp)
  1015 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
  1010 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_normal_sphere.cpp)
  1001 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_sphere.cpp)
   972 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   963 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_sphere.cpp)
   962 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   958 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_circle.cpp)
   941 ms: pcl::PLYReader::parse(std::__cxx11::basic_string<char, std::char_tra... (../io/src/ply_io.cpp)
   932 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   929 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   928 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   925 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cone.cpp)
   925 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   920 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cone.cpp)
   919 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   916 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cone.cpp)
   916 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   915 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_sphere.cpp)
   906 ms: Eigen::LevenbergMarquardt<Eigen::NumericalDiff<pcl::SampleConsensusM... (../sample_consensus/src/sac_model_cylinder.cpp)
   905 ms: pcl::ConcaveHull<pcl::PointNormal>::performReconstruction(pcl::Point... (../surface/src/concave_hull.cpp)
   904 ms: main (../examples/segmentation/example_cpc_segmentation.cpp)

*** Expensive headers:
1241275 ms: ../common/include/pcl/point_types.h (included 492 times, avg 2522 ms), included via:
  crf_normal_segmentation.cpp.json  (4479 ms)
  usc.cpp.json usc.hpp usc.h  (4233 ms)
  grid_projection.cpp.json  (4149 ms)
  geometric_consistency.cpp.json  (4109 ms)
  region_growing_rgb.cpp.json  (4103 ms)
  hv_papazov.cpp.json  (4095 ms)
  ...

726696 ms: ../common/include/pcl/common/io.h (included 393 times, avg 1849 ms), included via:
  pcd_grabber.cpp.json pcd_grabber.h  (7432 ms)
  convert_pcd_ascii_binary.cpp.json  (6863 ms)
  obj_io.cpp.json obj_io.h file_io.h  (6172 ms)
  local_maximum.cpp.json local_maximum.hpp  (6067 ms)
  converter.cpp.json auto_io.h  (6005 ms)
  voxel_grid.cpp.json  (5932 ms)
  ...

660795 ms: ../io/include/pcl/io/pcd_io.h (included 168 times, avg 3933 ms), included via:
  project_model.cpp.json project_model.h  (8771 ms)
  vtk2pcd.cpp.json  (8142 ms)
  feature_matching.cpp.json  (8067 ms)
  moc_project_model.cpp.json project_model.h  (8052 ms)
  item_inspector.cpp.json item_inspector.h project_model.h  (7867 ms)
  example_principal_curvatures_estimation.cpp.json  (7812 ms)
  ...

524027 ms: ../io/include/pcl/io/file_io.h (included 173 times, avg 3029 ms), included via:
  ascii_io.cpp.json ascii_io.h  (7062 ms)
  obj_io.cpp.json obj_io.h  (6179 ms)
  file_io.cpp.json  (6045 ms)
  vtk2pcd.cpp.json pcd_io.h  (5620 ms)
  example_principal_curvatures_estimation.cpp.json pcd_io.h  (5458 ms)
  lum.cpp.json pcd_io.h  (5420 ms)
  ...

386686 ms: ../common/include/pcl/pcl_macros.h (included 548 times, avg 705 ms), included via:
  cJSON.cpp.json cJSON.h  (1555 ms)
  point_xy_32f.cpp.json point_xy_32f.h common.h pcl_base.h  (1481 ms)
  statistical_multiscale_interest_region_extraction.cpp.json statistical_multiscale_interest_region_extraction.hpp statistical_multiscale_interest_region_extraction.h pcl_base.h  (1453 ms)
  passthrough.cpp.json passthrough.hpp passthrough.h filter_indices.h filter.h pcl_base.h  (1452 ms)
  grid_min.cpp.json point_types.h  (1448 ms)
  vtk_mesh_smoothing_windowed_sinc.cpp.json vtk_mesh_smoothing_windowed_sinc.h processing.h pcl_base.h  (1438 ms)
  ...

251779 ms: ../apps/cloud_composer/include/pcl/apps/cloud_composer/items/cloud_item.h (included 45 times, avg 5595 ms), included via:
  normal_estimation.cpp.json normal_estimation.h abstract_tool.h commands.h  (9548 ms)
  statistical_outlier_removal.cpp.json statistical_outlier_removal.h abstract_tool.h commands.h  (9168 ms)
  cloud_item.cpp.json  (9068 ms)
  euclidean_clustering.cpp.json euclidean_clustering.h abstract_tool.h commands.h  (8965 ms)
  moc_abstract_tool.cpp.json abstract_tool.h commands.h  (8796 ms)
  sanitize_cloud.cpp.json sanitize_cloud.h abstract_tool.h commands.h  (8710 ms)
  ...

241256 ms: /usr/include/qt/QtCore/QDebug (included 49 times, avg 4923 ms), included via:
  normal_estimation.cpp.json normal_estimation.h abstract_tool.h commands.h cloud_item.h  (9419 ms)
  statistical_outlier_removal.cpp.json statistical_outlier_removal.h abstract_tool.h commands.h cloud_item.h  (9098 ms)
  cloud_item.cpp.json cloud_item.h  (8950 ms)
  euclidean_clustering.cpp.json euclidean_clustering.h abstract_tool.h commands.h cloud_item.h  (8900 ms)
  moc_abstract_tool.cpp.json abstract_tool.h commands.h cloud_item.h  (8717 ms)
  sanitize_cloud.cpp.json sanitize_cloud.h abstract_tool.h commands.h cloud_item.h  (8601 ms)
  ...

230687 ms: ../visualization/include/pcl/visualization/pcl_visualizer.h (included 96 times, avg 2402 ms), included via:
  cloud_viewer.cpp.json cloud_viewer.h  (8344 ms)
  mesh_sampling.cpp.json  (8216 ms)
  moc_cloud_viewer.cpp.json cloud_viewer.h  (7873 ms)
  mesh2pcd.cpp.json  (7302 ms)
  moc_cloud_view.cpp.json cloud_view.h  (7132 ms)
  cloud_view.cpp.json cloud_view.h  (6878 ms)
  ...

227121 ms: ../apps/cloud_composer/include/pcl/apps/cloud_composer/commands.h (included 40 times, avg 5678 ms), included via:
  normal_estimation.cpp.json normal_estimation.h abstract_tool.h  (9569 ms)
  statistical_outlier_removal.cpp.json statistical_outlier_removal.h abstract_tool.h  (9175 ms)
  euclidean_clustering.cpp.json euclidean_clustering.h abstract_tool.h  (8973 ms)
  moc_abstract_tool.cpp.json abstract_tool.h  (8804 ms)
  sanitize_cloud.cpp.json sanitize_cloud.h abstract_tool.h  (8719 ms)
  voxel_grid_downsample.cpp.json voxel_grid_downsample.h abstract_tool.h  (8562 ms)
  ...

196121 ms: ../common/include/pcl/point_cloud.h (included 497 times, avg 394 ms), included via:
  point_cloud_handlers.cpp.json point_cloud_handlers.h point_cloud_geometry_handlers.h  (2010 ms)
  obj_rec_ransac_orr_octree_zprojection.cpp.json  (1952 ms)
  main.cpp.json cloud_composer.h  (1903 ms)
  kdtree.cpp.json kdtree.hpp kdtree.h search.h  (1876 ms)
  range_image_visualizer.cpp.json range_image_visualizer.h range_image.h  (1844 ms)
  stereo_grabber.cpp.json  (1815 ms)
  ...

stale[bot] commented 4 years ago

Marking this as stale due to 30 days of inactivity. It will be closed in 7 days if no further activity occurs.

StefanBruens commented 1 year ago

The heaviest TU is clearly MovingLeastSquares, implemented in surface/impl/mls.hpp.

The cost comes from instantiating the template for the Cartesian product of point types, which is 15^2 = 225 instantiations (or at least 6^2 = 36 for the CORE_XYZ types).
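To make the arithmetic concrete, here is a minimal compile-time sketch of how the number of (input, output) pairs grows. `TypeList`, `pairCount`, `squareCount` and the `PointA`/`PointB`/`PointC` types are hypothetical helpers for illustration only; PCL builds its real type lists with preprocessor macros.

```cpp
#include <cstddef>

// Hypothetical compile-time list of point types (not a PCL facility).
template <typename... Ts>
struct TypeList {
  static constexpr std::size_t size = sizeof...(Ts);
};

// Number of (PointInT, PointOutT) pairs when a two-parameter template
// is instantiated for every combination of the two lists.
template <typename InList, typename OutList>
constexpr std::size_t pairCount() {
  return InList::size * OutList::size;
}

// Stand-in point types, only used to populate a list.
struct PointA {};
struct PointB {};
struct PointC {};
using Three = TypeList<PointA, PointB, PointC>;
static_assert(pairCount<Three, Three>() == 9, "3 x 3 combinations");

// Same count when input and output lists are identical and have n entries.
constexpr std::size_t squareCount(std::size_t n) { return n * n; }
static_assert(squareCount(15) == 225, "all 15 point types");
static_assert(squareCount(6) == 36, "the 6 CORE_XYZ types");
```

Every pair is a full copy of the class for the compiler to parse, instantiate and generate code for, so the cost grows quadratically with the number of point types.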

The largest method in it is computeMLSPointNormal. It does not actually have to be templated on the output type, since it only outputs (a cloud of) XYZ coordinates. The conversion to the output type can happen immediately afterwards, while appending the points to the output PointCloud.
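A rough sketch of that split, under simplifying assumptions: the heavy function depends only on the input type and returns plain coordinates, and a small function template converts to the output type while appending. All names here (`XYZ`, `PointIn`, `PointOutNormal`, `computeSmoothed`, `appendAs`) are hypothetical, and the loop body is a placeholder rather than the actual MLS projection.

```cpp
#include <vector>

// Plain coordinates, independent of any output point type (hypothetical).
struct XYZ { float x, y, z; };

// Hypothetical input and output point types.
struct PointIn { float x, y, z; float intensity; };
struct PointOutNormal { float x, y, z; float nx, ny, nz; };

// Heavy part: templated on the input type only, so it is instantiated
// once per input type instead of once per (input, output) pair.
template <typename PointInT>
std::vector<XYZ> computeSmoothed(const std::vector<PointInT>& in) {
  std::vector<XYZ> out;
  out.reserve(in.size());
  for (const auto& p : in) {
    // Placeholder: the real code would project p onto the fitted surface.
    out.push_back(XYZ{p.x, p.y, p.z});
  }
  return out;
}

// Cheap part: templated on the output type, but it only copies coordinates.
template <typename PointOutT>
void appendAs(const std::vector<XYZ>& pts, std::vector<PointOutT>& cloud) {
  for (const auto& p : pts) {
    PointOutT q{};  // value-initialize the remaining fields
    q.x = p.x;
    q.y = p.y;
    q.z = p.z;
    cloud.push_back(q);
  }
}
```

With this layout the expensive code is compiled once per input type, and only the trivial copy loop is repeated for each output type.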

mvieth commented 1 year ago

@StefanBruens Thanks for investigating! Would you be interested in opening a pull request with a solution? Then we can discuss this further. It would be important to verify with measurements that this modification does not make the method/function significantly slower at run time, and that the compile time is actually reduced.

Another thing we could check is whether all 225 instantiation combinations are really needed. For example, MovingLeastSquares<pcl::PointWithRange, pcl::PointDEM> is probably used so rarely that instantiating it every time the library is built is not justified.
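As an illustration of building only selected combinations, the sketch below uses explicit instantiation definitions for a hand-picked set of pairs. `Smoother` and the point structs are hypothetical stand-ins, not PCL's actual classes or instantiation macros, and the choice of pairs is arbitrary.

```cpp
// Hypothetical stand-ins for a few point types.
struct PointXYZ { float x, y, z; };
struct PointNormal { float x, y, z, nx, ny, nz; };
struct PointDEM { float x, y, z, intensity, variance, height_variance; };

// Hypothetical two-parameter class template in the style of an
// input/output point processing class.
template <typename PointInT, typename PointOutT>
class Smoother {
 public:
  PointOutT process(const PointInT& p) const {
    PointOutT q{};  // value-initialize the remaining fields
    q.x = p.x;
    q.y = p.y;
    q.z = p.z;
    return q;
  }
};

// Explicit instantiation definitions: only these pairs are compiled
// into the library's translation unit.
template class Smoother<PointXYZ, PointXYZ>;
template class Smoother<PointXYZ, PointNormal>;
template class Smoother<PointNormal, PointNormal>;

// Rare pairs such as Smoother<PointNormal, PointDEM> are not listed.
// A user who needs one can still get it instantiated implicitly in
// their own translation unit, as long as the template definition is
// visible there.
```

The library then pays only for the listed pairs on every build, while rarer pairs are compiled by the few users who actually need them.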

jcar87 commented 1 year ago

I'm one of the Conan Center maintainers. We've had requests to add and build binaries for PCL in Conan Center, and we are having issues due to compiler crashes that appear to affect the compilation of src/surface/src/mls.cpp, which I suspect is where mls.hpp gets instantiated. PCL would require more resources than the vast majority of other things we build and package.

We'll keep an eye on this issue as well :)

StefanBruens commented 1 year ago

> @StefanBruens Thanks for investigating! Are you interested to open a pull request with a solution? Then we can discuss this further. It would be important to verify with a measurement that this modification does not make the method/function significantly slower (run time), and that the compile time is actually reduced.
>
> Another thing we could check is whether really all 255 instantiation combinations are needed. For example, MovingLeastSquares<pcl::PointWithRange, pcl::PointDEM> is probably used so rarely that instantiating it every time the library is built is not justified.

I started refactoring the code, but unfortunately I haven't completed it yet, as there are quite a few cross-function dependencies. I'm not sure when I'll have time to do more work on this, sorry.

mvieth commented 1 year ago

@StefanBruens @jcar87 I created a PR to drastically reduce the number of template instantiations in MLS (by removing instantiations which are likely used by nobody). Feel free to take a look. In the future, further refactoring MLS is still possible if needed (to reduce the amount of resources needed per template instantiation).