Open · nalimilan opened this issue 3 years ago
Thank you for posting this. Actually, I wanted to open something similar. I would start by understanding the correct design for optimal handling of the `skipmissing` wrapper, and then work out the corner cases. In the worst case, I think it would be acceptable to say that summation of floats does not guarantee a `-0.0` result when only `-0.0` values are summed (I do not think anyone would rely on this).
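To make the corner case concrete, here is a minimal sketch of the signed-zero arithmetic involved. It uses only plain `Float64` literals and Base functions; the variable names `neg` and `pos` are illustrative.

```julia
# Signed-zero arithmetic under the default rounding mode (IEEE 754).
neg = -0.0 + -0.0   # both operands are negative zero: the sum keeps the sign
pos = -0.0 + 0.0    # zeros of opposite sign: the sum is positive zero

# `==` treats the two zeros as equal, so the sign has to be inspected explicitly.
@show signbit(neg) signbit(pos) neg == pos isequal(neg, pos)
```

Since the two results compare equal with `==`, the difference is only visible through `signbit`, `isequal`, or operations such as division by zero.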
In the example below, computing the sum of an `Array{Union{Float64, Missing}}` using a simple loop is faster when inserting a branch which adds `zero(T)` to the accumulator when an entry is `missing`, rather than doing nothing. This is because the loop that adds `zero(T)` uses SIMD instructions, while the loop that does nothing does not (even with `@simd`). Adding `-zero(T)` instead of `zero(T)` disables SIMD. Is this expected, or could the compiler improve?

This means that
`sum(skipmissing(::Array{Union{Float64, Missing}}))` is slower than what a specialized implementation which adds `zero(T)` could achieve (note that a pairwise sum obtained by passing `ismissing(x) ? zero(T) : x` has similar performance). But of course adding `zero(T)` doesn't give the same result in corner cases (when the vector contains only `-0.0`, AFAICT; are there other cases?). If that's the only solution, we could check whether there's at least one entry which is not `-0.0`, and if so switch to adding `zero(T)`?
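The two loop variants discussed above, together with the proposed fallback check, could be sketched roughly as follows. This is not the code from the original example: the function names `sum_skip`, `sum_addzero`, and `sum_checked` are hypothetical, and starting the accumulator at `-zero(T)` is an assumption made so that signed zeros are preserved. Whether either loop is actually vectorized depends on the Julia and LLVM versions.

```julia
# Hypothetical helpers for illustration; none of these names exist in Base.

# Variant 1: do nothing for missing entries (the semantics of `skipmissing`).
function sum_skip(x::AbstractVector)
    T = nonmissingtype(eltype(x))
    acc = -zero(T)              # assumption: -0.0 start keeps the sign of an all -0.0 input
    @inbounds @simd for i in eachindex(x)
        v = x[i]
        if v !== missing
            acc += v
        end
    end
    return acc
end

# Variant 2: add zero(T) in place of each missing entry.
function sum_addzero(x::AbstractVector)
    T = nonmissingtype(eltype(x))
    acc = -zero(T)
    @inbounds @simd for i in eachindex(x)
        v = x[i]
        acc += ismissing(v) ? zero(T) : v
    end
    return acc
end

# Proposed fallback: use the zero-adding loop unless every non-missing
# entry is -0.0, which is the only corner case identified in the thread.
function sum_checked(x::AbstractVector)
    T = nonmissingtype(eltype(x))
    only_negzero = all(v -> ismissing(v) || isequal(v, -zero(T)), x)
    return only_negzero ? sum_skip(x) : sum_addzero(x)
end

x = Union{Float64, Missing}[1.0, missing, 2.5, missing]
z = Union{Float64, Missing}[-0.0, missing, -0.0]
@show sum_skip(x) sum_addzero(x) sum_checked(x)
@show signbit(sum_skip(z)) signbit(sum_addzero(z)) signbit(sum_checked(z))
```

Note that the check in `sum_checked` costs an extra pass over the data, so whether it pays off compared with the non-SIMD loop would need to be benchmarked.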