PointCloudLibrary / pcl

Point Cloud Library (PCL)
https://pointclouds.org/
Other
10k stars 4.62k forks source link

Check which OpenMP for-loops could benefit from dynamic schedule #5785

Open mvieth opened 1 year ago

mvieth commented 1 year ago
          I guess other OMP implementations could use this setting as well, since most of those also require some search for neighbors, which can vary a lot and hence vary the computation between each iteration?

Originally posted by @larshg in https://github.com/PointCloudLibrary/pcl/issues/5775#issuecomment-1664141425

Candidates are for-loops where the loop iterations do not take the same time, e.g. when a neighbourhood search is done for each point in a cloud (especially radius search). Files that contain parallelized for-loops and should be checked:

apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/global_nn_recognizer_crh.hpp
apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/local_recognizer.hpp
apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/global_nn_recognizer_cvfh.hpp
common/src/range_image.cpp
common/src/fft/kiss_fft.c
features/include/pcl/features/impl/fpfh_omp.hpp
features/include/pcl/features/impl/shot_omp.hpp
features/include/pcl/features/impl/normal_3d_omp.hpp
features/include/pcl/features/impl/intensity_gradient.hpp
features/include/pcl/features/impl/shot_lrf_omp.hpp
features/src/range_image_border_extractor.cpp
features/src/narf.cpp
filters/include/pcl/filters/impl/convolution_3d.hpp
filters/include/pcl/filters/impl/convolution.hpp
filters/include/pcl/filters/impl/pyramid.hpp
filters/include/pcl/filters/impl/fast_bilateral_omp.hpp
filters/src/pyramid.cpp
io/include/pcl/io/impl/lzf_image_io.hpp
io/src/real_sense_2_grabber.cpp
keypoints/include/pcl/keypoints/impl/harris_6d.hpp
keypoints/include/pcl/keypoints/impl/trajkovic_3d.hpp
keypoints/include/pcl/keypoints/impl/iss_3d.hpp
keypoints/include/pcl/keypoints/impl/harris_3d.hpp
keypoints/include/pcl/keypoints/impl/harris_2d.hpp
keypoints/include/pcl/keypoints/impl/trajkovic_2d.hpp
keypoints/src/narf_keypoint.cpp
recognition/include/pcl/recognition/impl/hv/hv_go.hpp
registration/include/pcl/registration/impl/ia_fpcs.hpp
segmentation/include/pcl/segmentation/impl/approximate_progressive_morphological_filter.hpp
surface/include/pcl/surface/impl/mls.hpp
tools/fast_bilateral_filter.cpp
tools/normal_estimation.cpp
tracking/include/pcl/tracking/impl/pyramidal_klt.hpp
tracking/include/pcl/tracking/impl/kld_adaptive_particle_filter_omp.hpp
tracking/include/pcl/tracking/impl/particle_filter_omp.hpp
NemB0t commented 1 month ago

Hello @mvieth, I would like to work on this issue. Since this is my first contribution I would like some helpful pointers.

mvieth commented 1 month ago

Hello @mvieth, I would like to work on this issue. Since this is my first contribution I would like some helpful pointers.

@NemB0t Great! Please take a look at https://github.com/PointCloudLibrary/pcl/pull/5775 if you haven't already. As a start, please look through the files mentioned above and search for all parallel for loops (pragma omp for). The goal is to categorize them: One category would be "iterates over all input points/input indices and does a neighbourhood search for each one (neighbours within a radius or k-nearest)". The normal estimation from the linked pull request would fall in this category. There are probably more categories which I can't foresee right now. Based on these, we can then check where the for-loop iterations are unbalanced, that is, the iterations do not all take (roughly) the same time. For these, we would consider switching to a dynamic OpenMP schedule, like we did for the normal estimation, and then we should probably also do one or two benchmarks to verify that the dynamic schedule indeed made the algorithms faster. But we can discuss this part in detail later. I hope that helps.

NemB0t commented 1 month ago

@mvieth , I went through all the mentioned files and have classified the way parallel loops are used based on my understanding. Could you please check a few of them from each categories and let me know if corrections are needed.

Iterates over a set of indexes or group of points:

apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/global_nn_recognizer_crh.hpp apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/local_recognizer.hpp apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/global_nn_recognizer_cvfh.hpp features/include/pcl/features/impl/fpfh_omp.hpp features/include/pcl/features/impl/shot_omp.hpp features/include/pcl/features/impl/normal_3d_omp.hpp features/include/pcl/features/impl/intensity_gradient.hpp features/include/pcl/features/impl/shot_lrf_omp.hpp features/src/narf.cpp filters/include/pcl/filters/impl/convolution_3d.hpp filters/include/pcl/filters/impl/fast_bilateral_omp.hpp io/include/pcl/io/impl/lzf_image_io.hpp io/src/real_sense_2_grabber.cpp keypoints/include/pcl/keypoints/impl/harris_6d.hpp keypoints/include/pcl/keypoints/impl/trajkovic_3d.hpp keypoints/include/pcl/keypoints/impl/iss_3d.hpp keypoints/include/pcl/keypoints/impl/harris_3d.hpp keypoints/include/pcl/keypoints/impl/harris_2d.hpp keypoints/include/pcl/keypoints/impl/trajkovic_2d.hpp keypoints/src/narf_keypoint.cpp segmentation/include/pcl/segmentation/impl/approximate_progressive_morphological_filter.hppsegmentation/include/pcl/segmentation/impl/approximate_progressive_morphological_filter.hpp surface/include/pcl/surface/impl/mls.hpp tools/fast_bilateral_filter.cpp tools/normal_estimation.cpp tracking/include/pcl/tracking/impl/kld_adaptive_particle_filter_omp.hpp tracking/include/pcl/tracking/impl/particle_filter_omp.hpp

Iterates over the number of images dimentions:

common/src/range_image.cpp features/src/range_image_border_extractor.cpp filters/include/pcl/filters/impl/convolution.hpp tracking/include/pcl/tracking/impl/pyramidal_klt.hpp

Iterates p number of times (p is the value received from function call):

common/src/fft/kiss_fft.c

Iterates over the number of images dimentions and the number of kernel dimentions:

filters/include/pcl/filters/impl/pyramid.hpp filters/src/pyramid.cpp

Iterates over a fixed number of iterations:

registration/include/pcl/registration/impl/ia_fpcs.hpp

Could not find parallel for loops:

recognition/include/pcl/recognition/impl/hv/hv_go.hpp

If these are correct, could also provide some guidance on further steps?

mvieth commented 1 month ago

First of all, thank you for the list. In addition to checking what the loop iterates over, it is also imported to check what happens in each iteration. Remember, we want to find loops where the iterations do not take the same amount of time (unbalanced loops). This is for example the case when the computations in each iteration depend a lot on the data used in that iteration (e.g. on the point, when iterating over all points). I imagine that this may be difficult to judge if you don't know the classes so well. Since in the case of normal estimation we have found that the duration of a neighbourhood search can vary a lot from iteration to iteration, I would suggest to concentrate on those. This can be a search for neighbours within a radius (radiusSearch), for the k nearest neighbours (nearestKSearch), or either of these two based on settings made by the user (usually called searchForNeighbors but may have slightly different names). So it would be great if you could go through the loops in the category "Iterates over a set of indexes or group of points" again, and check whether they do a neighborhood search in each iteration, and if yes, with which method.