ZeppLu opened 5 years ago
Oh that's interesting. Nice find, and thanks for the report. I believe the difference here is that Numpy pre-computes the derived strides, whereas we dynamically compute them — this general structure is what allows us to support all classes of indexing within `view`. That said, it might be possible for us to eke out a bit more performance in specific cases like this one.
(Edit: thanks, prof :) )
(s/eek/eke/)
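To make the dynamic-computation point concrete, here is a small illustration (mine, not from the thread): a two-range view like this one has no fast linear indexing, so each linear access has to be converted to a Cartesian index first, roughly a `divrem` per element, whereas NumPy bakes the strides in once.

```julia
X = rand(1000, 1000)
V = @view X[1:999, 1:999]

# The view does not support fast linear indexing:
IndexStyle(V)                # IndexCartesian()

# so a linear index must be converted to a Cartesian one on every access:
i = 500
ci = CartesianIndices(V)[i]  # the (row, col) pair the view has to compute
V[i] == V[ci]                # true — same element, but the conversion costs a divrem
```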
One more interesting result:
```julia
julia> function mysum(X)
           s = zero(eltype(X))
           @simd for i = eachindex(X)
               @inbounds s += X[i]
           end
           s
       end
mysum (generic function with 1 method)

julia> mysum(@view X[1:999,1:999]) ≈ sum(@view X[1:999,1:999])
true

julia> @benchmark mysum(@view X[1:999,1:999])
BenchmarkTools.Trial:
  memory estimate:  64 bytes
  allocs estimate:  1
  --------------
  minimum time:     358.658 μs (0.00% GC)
  median time:      397.377 μs (0.00% GC)
  mean time:        426.762 μs (0.00% GC)
  maximum time:     1.681 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
```
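A quick check (not in the original post) of why `@simd` matters here: `eachindex` on this view yields Cartesian indices, so the loop is effectively a nested two-dimensional traversal, which the `@simd` annotation plausibly helps the compiler vectorize.

```julia
X = rand(1000, 1000)
V = @view X[1:999, 1:999]
eachindex(V) isa CartesianIndices  # true: the iteration is Cartesian, not linear
```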
```julia
julia> function mysum_nosimd(X)
           s = zero(eltype(X))
           for i = eachindex(X)
               @inbounds s += X[i]
           end
           s
       end
mysum_nosimd (generic function with 1 method)

julia> @benchmark mysum_nosimd(@view X[1:999, 1:999])
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.531 ms (0.00% GC)
  median time:      1.543 ms (0.00% GC)
  mean time:        1.577 ms (0.00% GC)
  maximum time:     3.486 ms (0.00% GC)
  --------------
  samples:          3163
  evals/sample:     1
```
I dug into how `sum()` works and found that `sum()` calls `mapreduce()`, which then becomes something like `Base._mapreduce_dim(identity, +, NamedTuple(), view(X, 1:999, 1:999), :)`. If I had a debugger I'd be happy to continue digging, though.
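Even without a debugger, `@which` can trace the dispatch chain from the REPL (a quick sketch; the exact method locations depend on the Julia version):

```julia
X = rand(1000, 1000)
V = @view X[1:999, 1:999]

@which sum(V)                     # the AbstractArray method that forwards to mapreduce
@which mapreduce(identity, +, V)  # the reduction machinery in Base
```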
I was just doing the same — but with the new Rebugger.jl. I highly recommend it.
We basically have two different implementations of `mapreduce` — a highly optimized one for arrays that support fast linear indexing, and a fallback one for all iterables. We could add one or two intermediate optimizations between those two bookends:

- an `AbstractArray` implementation that assumes indexability, and
- one specialized for `StridedArray`s.

I don't think there'd be much extra to gain here, but it could be worth an examination.

I don't know if this is related to this issue, but I also found an accuracy issue which I think is more troublesome, because it also affects DataFrames.jl.
Here is a minimal reproducible example:
```julia
using Distributions
using Random

# It doesn't always happen; here is a seed where it does.
rng = MersenneTwister(630);
v = rand(rng, Normal(zero(Float32), one(Float32)), 1000)
sa = @view v[collect(1:end)]

# View (as SubArray) vs Vector
sum(sa) ≈ sum(v)  # false

# They are different! And the worse part is that the view version is less
# accurate (according to Kahan compensated summation).
IndexStyle(v) isa IndexLinear      # true
IndexStyle(sa) isa IndexCartesian  # true

# They are dispatched to different implementations in base/reduce.jl > _mapreduce
```
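The accuracy gap is consistent with the two implementations summing in different orders: the fast linear-indexing path uses a pairwise (blocked) reduction, while the generic iterator fallback folds left to right. A standalone illustration of that ordering effect (my sketch, independent of views):

```julia
# Naive left-to-right accumulation, like a sequential fold:
naive_sum(xs) = foldl(+, xs)

v32 = rand(Float32, 10^6)
pairwise = sum(v32)                    # Base's pairwise reduction (fast path)
naive    = naive_sum(v32)
ref      = Float32(sum(Float64, v32))  # higher-precision reference

abs(pairwise - ref) ≤ abs(naive - ref)  # pairwise is typically at least as accurate
```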
Perhaps I should open a separate issue.
Compared to Python:
This issue holds for `prod()` as well.
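For completeness, the same style of reproducer for `prod` (a sketch under the same seed-dependence caveat; any particular run may or may not show a mismatch):

```julia
using Random

rng = MersenneTwister(630)
v = 1f0 .+ 0.01f0 .* rand(rng, Float32, 1000)  # factors near 1 keep the product in range
sa = @view v[collect(1:end)]                   # IndexCartesian view, as above

prod(sa) == prod(v)  # may be false: the two dispatch to different reductions
```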