Open oxinabox opened 5 years ago
Should be quite trivial to do indeed by dispatching on ::typeof(identity)
.
That would not be useful.
Identity was just an example.
It needs to be done by dispatching on ::typeof{vcat}
and ::typeof{hcat}
etc.
julia> const x = [rand(100) for _ in 1:1000];
julia> @btime mapreduce(z->2z, vcat, x);
169.896 ms (2979 allocations: 382.80 MiB)
julia> @btime reduce(vcat, map(z->2z, x));
124.665 μs (1004 allocations: 1.63 MiB)
julia> @btime mapreduce(z->2z, hcat, x);
185.014 ms (3980 allocations: 382.83 MiB)
julia> @btime reduce(hcat, map(z->2z, x));
125.138 μs (1004 allocations: 1.63 MiB)
Or even say
julia> const y = [rand(100, 100) for _ in 1:1000];
julia> @btime mapreduce(z->sum(z;dims=2), hcat, y);
197.977 ms (9978 allocations: 383.01 MiB)
julia> @btime reduce(hcat, map(z->sum(z; dims=2), y));
4.590 ms (7004 allocations: 1.81 MiB)
Unfortunately, I'm afraid that allowing any f
while spcializing on ::typeof(vcat)
will introduce ambiguities. I guess one just needs to try it.
Seems fine to me. is this a valid way to check it?
The following matches the signature used by reduce(::typeof{vcat}...
julia> Base.mapreduce(f, op::Union{typeof(vcat),typeof(hcat)}, A::AbstractVector{<:Union{AbstractVector, AbstractMatrix}}) = reduce(op, map(f, A))
julia> using Test
julia> Test.detect_ambiguities(Base)
0-element Array{Tuple{Method,Method},1}
julia> const y = [rand(100, 100) for _ in 1:1000];
julia> expected = reduce(hcat, map(z->sum(z; dims=2), y));
julia> @test expected == mapreduce(z->sum(z;dims=2), hcat, y);
julia> methods(mapreduce)
# 5 methods for generic function "mapreduce":
[1] mapreduce(f, op, a::Number) in Base at reduce.jl:324
[2] mapreduce(f, op, itr::Base.SkipMissing{#s623} where #s623<:AbstractArray) in Base at missing.jl:202
[3] mapreduce(f, op::Union{typeof(hcat), typeof(vcat)}, A::AbstractArray{#s3,1} where #s3<:Union{AbstractArray{T,1} where T, AbstractArray{T,2} where T}) in Main at REPL[6]:1
[4] mapreduce(f, op, A::AbstractArray; dims, kw...) in Base at reducedim.jl:304
[5] mapreduce(f, op, itr; kw...) in Base at reduce.jl:205
Looks fine. For reduce
there's a special SharedArray
method but it doesn't seem to exist for mapreduce
.
I just bumped into this. It would definitely be nice to have mapreduce(f, [hv]cat, xs)
be optimized.
It's not precisely parallel, but many uses of this would be served by #43334. If x
and f(x)
are vectors then mapreduce(f, hcat, xs) == stack(f, xs)
.
Note that we also didn't optimize reduce
when there is an init
keyword:
julia> v = [rand(1:10, rand(0:10)) for _ in 1:100];
julia> @btime reduce(vcat, $v, init=Int[]);
23.151 μs (101 allocations: 210.23 KiB)
julia> @btime reduce(vcat, $v);
962.444 ns (2 allocations: 5.12 KiB)
so clearly we didn't override a low-level enough method.
The keyword init
could presumably be handled as part of https://github.com/JuliaLang/julia/issues/27188 (i.e. the methods for ::AbstractVector{<:AbstractVecOrMat}}
) without needing this PR's extension to mapreduce
.
In this case it looks like it's being used to handle the empty case. Possibly #27188 should just allow that, when the type is known:
julia> empty(v)
Vector{Int64}[]
julia> reduce(vcat, empty(v), init=Int[])
Int64[]
julia> reduce(vcat, empty(v))
ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
Stacktrace:
[1] reduce_empty(op::Base.MappingRF{typeof(eltype), typeof(promote_type)}, #unused#::Type{Vector{Int64}})
given we had https://github.com/JuliaLang/julia/issues/21672 which makes
reduce(vcat
andreduce(hcat
fast.I had assumed it also applied to
mapreduce
of these operations. But it clearly does not.Right now, if the 2nd arg is
vcat
orhcat
it is faster to callreduce(op, map(f, xs))
thanmapreduce(f, op, xs)
. And maybe that would be a solid first move to fix this.