Add vectorization to the par_vec (aka par_unseq) implementations of the parallel algorithms

brycelelbach commented 8 years ago

The par_vec (aka par_unseq) policy allows interleaving of element access functions, i.e. it is safe to vectorize the iterations of the algorithm.

Explicit engagement of compiler vectorizers through pragmas is probably the best way to ensure this occurs (e.g. #pragma simd, #pragma omp simd).
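As a rough illustration of what engaging the vectorizer through a pragma could look like, here is a minimal sketch. The helper name `for_each_simd` is hypothetical and not part of HPX, and the `_OPENMP` guard is an assumption about how the build enables OpenMP. Without OpenMP the loop simply compiles as an ordinary scalar loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an HPX API): applies f to every element in place.
// The pragma asserts to the compiler that iterations may be interleaved,
// which is the guarantee a par_unseq-style policy gives the implementation.
template <typename T, typename F>
void for_each_simd(std::vector<T>& v, F f)
{
    T* data = v.data();
    std::size_t const n = v.size();

#if defined(_OPENMP)    // assumption: only request SIMD when OpenMP is enabled
#pragma omp simd
#endif
    for (std::size_t i = 0; i < n; ++i)
    {
        data[i] = f(data[i]);
    }
}
```

A call such as `for_each_simd(v, [](double x) { return 2.0 * x; });` doubles every element. Whether the loop is actually vectorized still depends on the compiler, the flags, and whether `f` can be inlined.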

I will probably take a look into doing this myself while preparing my CppCon talk on parallel algorithms.

diehlpk commented 7 years ago

@brycelelbach @hkaiser Could you please add a project description here: https://github.com/STEllAR-GROUP/hpx/wiki/GSoC-2017-Project-Ideas

Johan511 commented 1 year ago

I am interested in working on this project. I have seen that previous PRs added OpenMP pragmas for vectorization and parallelisation of loops. Can someone guide me on how to get started on this issue?

hkaiser commented 1 year ago

> I am interested in working on this project. I have seen that previous PRs added OpenMP pragmas for vectorization and parallelisation of loops. Can someone guide me on how to get started on this issue?

Yes, we have implemented this for the first batch of algorithms. There are still algorithms left that have not been touched, though. Also, we would need a thorough performance analysis of the existing implementation, combined with improvements, if needed.

Johan511 commented 1 year ago

Status of the par_unseq implementation for each algorithm; I am checking all of them (work in progress):

[x] adjacent_difference
[ ] inner_product (does it support any execution policy? I could not find documentation. Do we implement it using transform_reduce?)
[x] adjacent_find
[x] all_of any_of none_of
[x] copy copy_if copy_n (copy uses memmove, copy_if has unseq)
[x] move (uses memmove)
[x] count count_if
[x] equal mismatch (unable to trace bp in loop.hpp), likely does not support par_unseq
[x] exclusive_scan inclusive_scan
[x] reduce transform
[x] fill fill_n
[x] find find_end find_first_of find_if find_if_not (yet to check)
[x] for_each for_each_n
[x] generate generate_n
[x] is_heap is_heap_until (falls back to seq or par)
[x] is_partitioned is_sorted is_sorted_until
[x] lexicographical_compare
[x] max_element min_element minmax_element
[ ] make_heap
[ ] partial_sort (implemented using async)
[ ] partial_sort_copy nth_element (implemented using async futures)
[x] sort (parallel async implementation)
[ ] stable_sort
[ ] partition
[ ] partition_copy
[ ] stable_partition
[ ] remove remove_if (conditional in loop body)
[ ] remove_copy remove_copy_if (conditional in loop body)
[ ] replace replace_copy replace_copy_if replace_if (conditional in loop body)
[x] reverse reverse_copy
[x] rotate rotate_copy
[ ] search search_n (conditional in loop body, cannot vectorize)
[ ] set_difference set_intersection set_symmetric_difference set_union includes
[ ] inplace_merge
[ ] merge
[ ] swap_ranges
[x] uninitialized_copy uninitialized_copy_n
[x] uninitialized_fill uninitialized_fill_n
[x] uninitialized_default_construct uninitialized_default_construct_n
[x] uninitialized_value_construct uninitialized_value_construct_n
[x] uninitialized_move uninitialized_move_n
[ ] destroy destroy_n
[ ] unique
[ ] unique_copy
[ ] transform_reduce
[ ] transform_exclusive_scan transform_inclusive_scan
[ ] shift_left shift_right
[ ] starts_with ends_with
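
On the inner_product question in the list above: the standard library offers `std::transform_reduce` (C++17, `<numeric>`), whose two-range form computes the same sum of products and also has overloads that take an execution policy as the first argument. The sketch below uses the policy-free overload so it compiles without a parallel backend. The helper name is hypothetical, and this is not a statement of how HPX implements or plans to implement it.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper (not an HPX API): inner product expressed through
// std::transform_reduce. Precondition: b has at least as many elements as a.
template <typename T>
T inner_product_via_transform_reduce(
    std::vector<T> const& a, std::vector<T> const& b, T init)
{
    return std::transform_reduce(a.begin(), a.end(), b.begin(), init,
        std::plus<>{},          // reduction: sum the partial results
        std::multiplies<>{});   // transformation: multiply element pairs
}
```

One design point to keep in mind: unlike `std::inner_product`, `std::transform_reduce` may apply the reduction in any order, so floating-point results can differ slightly between the two.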

trkk28097402 commented 8 months ago

Hello @hkaiser, I am interested in this topic for GSoC 2024 and have a question. Is this restricted to using #pragma omp simd to vectorize, or can we also use something like __m128d and __m256d? Some SIMD instructions are unreadable.

hkaiser commented 8 months ago

> Hello @hkaiser, I am interested in this topic for GSoC 2024 and have a question. Is this restricted to using #pragma omp simd to vectorize, or can we also use something like __m128d and __m256d? Some SIMD instructions are unreadable.

Everything is possible, I guess - as long as it is portable across architectures (beyond x86), at least in the long run.
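
To make the portability concern concrete, here is a minimal sketch of an intrinsics-based loop that still builds on non-x86 targets. The function name `add_arrays` is hypothetical. The AVX path is only compiled when the compiler defines `__AVX__` (for example with `-mavx`); otherwise the plain scalar loop handles everything.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>    // x86 AVX intrinsics, only available on x86 targets
#endif

// Hypothetical helper (not an HPX API): out[i] = a[i] + b[i] for i in [0, n).
void add_arrays(double const* a, double const* b, double* out, std::size_t n)
{
    std::size_t i = 0;

#if defined(__AVX__)
    // Process four doubles per iteration using unaligned loads and stores.
    for (; i + 4 <= n; i += 4)
    {
        __m256d va = _mm256_loadu_pd(a + i);
        __m256d vb = _mm256_loadu_pd(b + i);
        _mm256_storeu_pd(out + i, _mm256_add_pd(va, vb));
    }
#endif

    // Scalar loop: handles the remainder on AVX builds, and the whole
    // range on every other architecture.
    for (; i < n; ++i)
    {
        out[i] = a[i] + b[i];
    }
}
```

Every additional architecture would need its own guarded branch in this style, which is one reason pragma-based or library-based vectorization may be easier to keep portable in the long run.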

STEllAR-GROUP / hpx

Add vectorization to the par_vec (aka par_unseq) implementations of the parallel algorithms #2271