Open mvieth opened 1 year ago
Hello @mvieth, I would like to work on this issue. Since this is my first contribution I would like some helpful pointers.
Hello @mvieth, I would like to work on this issue. Since this is my first contribution I would like some helpful pointers.
@NemB0t Great! Please take a look at https://github.com/PointCloudLibrary/pcl/pull/5775 if you haven't already. As a start, please look through the files mentioned above and search for all parallel for loops (pragma omp for
). The goal is to categorize them: One category would be "iterates over all input points/input indices and does a neighbourhood search for each one (neighbours within a radius or k-nearest)". The normal estimation from the linked pull request would fall in this category. There are probably more categories which I can't foresee right now.
Based on these, we can then check where the for-loop iterations are unbalanced, that is, the iterations do not all take (roughly) the same time. For these, we would consider switching to a dynamic OpenMP schedule, like we did for the normal estimation, and then we should probably also do one or two benchmarks to verify that the dynamic schedule indeed made the algorithms faster. But we can discuss this part in detail later.
I hope that helps.
@mvieth , I went through all the mentioned files and have classified the way parallel loops are used based on my understanding. Could you please check a few of them from each categories and let me know if corrections are needed.
Iterates over a set of indexes or group of points:
apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/global_nn_recognizer_crh.hpp apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/local_recognizer.hpp apps/3d_rec_framework/include/pcl/apps/3d_rec_framework/pipeline/impl/global_nn_recognizer_cvfh.hpp features/include/pcl/features/impl/fpfh_omp.hpp features/include/pcl/features/impl/shot_omp.hpp features/include/pcl/features/impl/normal_3d_omp.hpp features/include/pcl/features/impl/intensity_gradient.hpp features/include/pcl/features/impl/shot_lrf_omp.hpp features/src/narf.cpp filters/include/pcl/filters/impl/convolution_3d.hpp filters/include/pcl/filters/impl/fast_bilateral_omp.hpp io/include/pcl/io/impl/lzf_image_io.hpp io/src/real_sense_2_grabber.cpp keypoints/include/pcl/keypoints/impl/harris_6d.hpp keypoints/include/pcl/keypoints/impl/trajkovic_3d.hpp keypoints/include/pcl/keypoints/impl/iss_3d.hpp keypoints/include/pcl/keypoints/impl/harris_3d.hpp keypoints/include/pcl/keypoints/impl/harris_2d.hpp keypoints/include/pcl/keypoints/impl/trajkovic_2d.hpp keypoints/src/narf_keypoint.cpp segmentation/include/pcl/segmentation/impl/approximate_progressive_morphological_filter.hppsegmentation/include/pcl/segmentation/impl/approximate_progressive_morphological_filter.hpp surface/include/pcl/surface/impl/mls.hpp tools/fast_bilateral_filter.cpp tools/normal_estimation.cpp tracking/include/pcl/tracking/impl/kld_adaptive_particle_filter_omp.hpp tracking/include/pcl/tracking/impl/particle_filter_omp.hpp
Iterates over the number of images dimentions:
common/src/range_image.cpp features/src/range_image_border_extractor.cpp filters/include/pcl/filters/impl/convolution.hpp tracking/include/pcl/tracking/impl/pyramidal_klt.hpp
Iterates p number of times (p is the value received from function call):
common/src/fft/kiss_fft.c
Iterates over the number of images dimentions and the number of kernel dimentions:
filters/include/pcl/filters/impl/pyramid.hpp filters/src/pyramid.cpp
Iterates over a fixed number of iterations:
registration/include/pcl/registration/impl/ia_fpcs.hpp
Could not find parallel for loops:
recognition/include/pcl/recognition/impl/hv/hv_go.hpp
If these are correct, could also provide some guidance on further steps?
First of all, thank you for the list. In addition to checking what the loop iterates over, it is also imported to check what happens in each iteration. Remember, we want to find loops where the iterations do not take the same amount of time (unbalanced loops). This is for example the case when the computations in each iteration depend a lot on the data used in that iteration (e.g. on the point, when iterating over all points). I imagine that this may be difficult to judge if you don't know the classes so well. Since in the case of normal estimation we have found that the duration of a neighbourhood search can vary a lot from iteration to iteration, I would suggest to concentrate on those. This can be a search for neighbours within a radius (radiusSearch
), for the k nearest neighbours (nearestKSearch
), or either of these two based on settings made by the user (usually called searchForNeighbors
but may have slightly different names). So it would be great if you could go through the loops in the category "Iterates over a set of indexes or group of points" again, and check whether they do a neighborhood search in each iteration, and if yes, with which method.
Originally posted by @larshg in https://github.com/PointCloudLibrary/pcl/issues/5775#issuecomment-1664141425
Candidates are for-loops where the loop iterations do not take the same time, e.g. when a neighbourhood search is done for each point in a cloud (especially radius search). Files that contain parallelized for-loops and should be checked: