accumulator = 0
for i in 1:length(array)
accumulator ⊻= array[i]
end
result = count_ones(accumulator)
Here count_ones(a⊻b) = count_ones(a)+count_ones(b) modulo 2 which permits to vectorize as
accumulator = zero(lane)
for i in 1:lanesize:length(array)
accumulator ⊻= array[i+lane]
end
result = sum(count_ones(a) for a in accumulator)
However, all one can do automagically with @turbo is
accumulator = 0
@turbo for i in 1:length(array)
accumulator += count_ones(array[i])
end
result = accumulator
This is slower because count_ones is called each time instead of being called only once at the end.
This feature request is asking for a way to teach @turbo that it is permitted to use an isomorphism of the type operatorA(f.(array)...) == f(operatorB(array...)).
I do not know the proper name for this concept; feel free to change the title.
Based on discussion here: https://discourse.julialang.org/t/how-to-do-simd-code-with-wide-register-accumulators-simd-vs-loopvectorization-jl-vs-simd-jl/63322
Consider
Here
count_ones(a⊻b) = count_ones(a)+count_ones(b)
modulo 2 which permits to vectorize asHowever, all one can do automagically with
@turbo
isThis is slower because
count_ones
is called each time instead of being called only once at the end.This feature request is asking for a way to teach
@turbo
that it is permitted to use an isomorphism of the typeoperatorA(f.(array)...) == f(operatorB(array...))
.