Closed: gnzlbg closed this 7 years ago.
Multiple points:
You may want to play around with `segmented_output_range` and/or `segmented_input_range`, which slice an arbitrary contiguous block of memory into a scalar prologue/epilogue and a SIMD-izable block. Have a look at `transform` to get the gist of how to use them.
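The kind of split these segmented ranges perform can be sketched in plain C++ without Boost.SIMD. This is a minimal sketch, assuming the prologue exists only to reach an address aligned to the full vector width (`lanes * sizeof(T)` bytes). `split_for_simd` and `Segments` are hypothetical names, not the library's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not Boost.SIMD's API: how a contiguous block
// [data, data + n) splits into scalar prologue, SIMD body, scalar epilogue.
struct Segments {
  std::size_t prologue;  // scalar elements before the first aligned address
  std::size_t body;      // multiple of `lanes`, starts at an aligned address
  std::size_t epilogue;  // scalar leftovers after the body
};

// Assumes `lanes` > 0 and that one SIMD register holds `lanes` elements, so the
// body must start at an address aligned to lanes * sizeof(T) bytes.
template <typename T>
Segments split_for_simd(const T* data, std::size_t n, std::size_t lanes) {
  static_assert(std::is_arithmetic<T>::value, "sketch handles scalar types only");
  const std::size_t align = lanes * sizeof(T);
  const std::size_t misalign = reinterpret_cast<std::uintptr_t>(data) % align;
  if (misalign % sizeof(T) != 0) {
    // Stepping whole elements can never reach alignment: treat all as scalar.
    return {n, 0, 0};
  }
  // Elements to process one at a time until data + prologue is aligned.
  const std::size_t to_aligned = misalign == 0 ? 0 : (align - misalign) / sizeof(T);
  const std::size_t prologue = std::min(to_aligned, n);
  const std::size_t rest = n - prologue;
  const std::size_t body = rest - rest % lanes;  // largest multiple of lanes
  return {prologue, body, rest - body};
}
```

For a 16-byte-aligned `float` buffer and 4 lanes, 10 elements split into 0 + 8 + 2; starting one element later, they split into 3 + 4 + 3. The body would then be processed `lanes` elements at a time with aligned loads, and the prologue and epilogue with ordinary scalar code.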
Now, this all points to the fact that we need to promote those range adaptors first and foremost. Also, we have a WIP to add more algorithms like `for_each` and `find_if`, which may also help.
Thanks for the suggestions, I'll get started with those adaptors right away!
Having to partition the raw loops manually all the time seems repetitive and error-prone.
I would like a `boost::simd::iterate(size_t{from}, size_t{to}, binary_fun)` function that instantiates `binary_fun` for the appropriate pack sizes. First I show how the n-body example can be rewritten to be faster and correct (currently it only works when the number of particles is a multiple of the pack size), then I show how `iterate` can be implemented.

The current n-body SIMD example looks like this:
This only works if the number of particles is a multiple of the pack size. Fixing it by hand means running the loop at the maximum pack size until fewer than a full pack of particles remain, then handling that remainder with pack-size / 2, then the remainder of that with pack-size / 4, and so on until pack-size == 1. Doing this for every loop seems very repetitive and error-prone.
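The decomposition described above can be written down as a loop plan. This is a minimal sketch, assuming the maximum pack size is a power of two; `plan_steps` and `Step` are hypothetical names, and the widths here are runtime values, whereas a real SIMD version needs each width as a compile-time constant. Because the full-width loop leaves fewer than `max_width` elements, each smaller width is used at most once, so the tail is just the binary expansion of the remainder.

```cpp
#include <cstddef>
#include <vector>

// One step of the hypothetical plan: process `width` elements starting at `begin`.
struct Step {
  std::size_t begin;
  std::size_t width;
};

// Covers [from, to) with as many max_width steps as fit, then halves the width
// for the remainder until width 1. Assumes max_width is a power of two.
std::vector<Step> plan_steps(std::size_t from, std::size_t to, std::size_t max_width) {
  std::vector<Step> steps;
  if (from >= to) return steps;  // empty range: nothing to do
  std::size_t i = from;
  for (std::size_t w = max_width; w >= 1; w /= 2) {
    // Only the first (widest) pass can run more than once.
    while (to - i >= w) {
      steps.push_back({i, w});
      i += w;
    }
  }
  return steps;
}
```

For example, `plan_steps(0, 13, 8)` yields steps of width 8, 4 and 1 starting at indices 0, 8 and 12.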
I would like this example to be rewritten to something like this:
This requires being able to mix different pack sizes in the operations. For a maximum pack size of 8, the `simd_iterate` function would instantiate the closures 4 times, for sizes 8, 4, 2 and 1, and it would handle the remainder of the size-8 loop by recursively calling itself with a starting SIMD pack size of 4:
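A minimal C++17 sketch of that recursion, under these assumptions: the maximum width is a compile-time power of two, and the closure is a generic lambda that receives the width as a `std::integral_constant` plus the starting index. In real code the width would select a pack type of that cardinal; in the usage below a scalar inner loop stands in for the SIMD body. The signature is hypothetical, not an existing Boost.SIMD function.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical simd_iterate (C++17). `f` is a generic callable taking
// (std::integral_constant<std::size_t, W>, std::size_t i); it is instantiated
// once per width W = MaxWidth, MaxWidth / 2, ..., 1.
template <std::size_t W, typename F>
void simd_iterate(std::size_t from, std::size_t to, F&& f) {
  static_assert(W > 0 && (W & (W - 1)) == 0, "W must be a power of two");
  std::size_t i = from;
  // Main loop at the current width.
  while (i < to && to - i >= W) {
    f(std::integral_constant<std::size_t, W>{}, i);
    i += W;
  }
  // Fewer than W elements remain: recurse with half the width.
  if constexpr (W > 1) {
    simd_iterate<W / 2>(i, to, f);
  }
}
```

For example, `simd_iterate<8>(0, 13, f)` with a lambda that reads `constexpr std::size_t W = decltype(width)::value;` and computes `y[i + k] += 3.0f * x[i + k]` for `k < W` calls it with widths 8, 4 and 1 in that order. The width-2 instantiation still exists but never runs, since only one element remains after index 12.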