Open SkyWorld117 opened 3 years ago
Yeah, I should add ifelse
to the known reductions.
Reductions have to be known to correctly initialize 0
s for separate accumulators.
To SIMD a reduction, you need to do multiple reductions in parallel.
Sometimes, that's possible because there are multiple in parallel you can do, e.g. you can do multiple j
iterations in parallel:
for j in axes(m,2)
s = -Inf
for i in axes(m,1)
s = ifelse(m[i]>=s, m[i,j], s)
end
x[j] = s
end
This also won't work at the moment, but there is no reason it shouldn't work even for unknown reductions: just mark the i
loop as non-reorderable, and SIMD the j
loop.
I'll do that in the LV rewrite.
However, given just one loop:
for i in eachindex(m)
s = ifelse(m[i]>=s, m[i], s)
end
doing multiple in parallel requires knowing what 0
is with respect to the reduction.
With ifelse
, 0 is whatever s
was initialized as.
With +
, it is 0.0
.
Someone could write
s = 3.4
for i in eachindex(m)
s += x[i]
end
and LoopVectorization will create a bunch of accumulators initialized to 0.0
, sum up x
in parallel, combine these accumulators into a single scalar, and then finally add 3.4
.
My point here is that it has to know 0.0
is "zero", because if it initialized each accumulator to 3.4, once it combines the accumulators, the answer will be wrong (i.e., it'll be too large by 3.4*(number of accumulators - 1)
).
This is why there is a "Reduction not found" error/why LoopVectorization needs to know what the reduction is.
I'll add ifelse
soonᵀᴹ.