manually provide reduction isomorphisms so that expensive operations can be factored out of the loop

I do not know the proper name for this concept; feel free to change the title.

Based on discussion here: https://discourse.julialang.org/t/how-to-do-simd-code-with-wide-register-accumulators-simd-vs-loopvectorization-jl-vs-simd-jl/63322

Consider

accumulator = 0
for i in 1:length(array)
    accumulator ⊻= array[i]
end
result = count_ones(accumulator)

Here count_ones(a⊻b) = count_ones(a)+count_ones(b) modulo 2 which permits to vectorize as

accumulator = zero(lane)
for i in 1:lanesize:length(array)
    accumulator ⊻= array[i+lane]
end
result = sum(count_ones(a) for a in accumulator)

However, all one can do automagically with @turbo is

accumulator = 0
@turbo for i in 1:length(array)
    accumulator += count_ones(array[i])
end
result = accumulator

This is slower because count_ones is called each time instead of being called only once at the end.

This feature request is asking for a way to teach @turbo that it is permitted to use an isomorphism of the type operatorA(f.(array)...) == f(operatorB(array...)).

JuliaSIMD / LoopVectorization.jl

manually provide reduction isomorphisms so that expensive operations can be factored out of the loop #291