JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.43k stars 5.45k forks source link

Clarify map/mapreduce's documentation with respect to memoization #55238

Open mbauman opened 1 month ago

mbauman commented 1 month ago

This is split out from https://github.com/JuliaLang/julia/pull/53945#discussion_r1571584741:

Both map and mapreduce sometimes re-use return values — most notably with sparse data structures. map's docs don't mention this, whereas mapreduce explicitly allows for it — but neither documentation is ideal. It'd be nice to more finely scope this behavior along the lines of @MasonProtter's comment:

I think it's a sensible corner case to make it clear to people that we're not normally in the buisness of memoizing reductions, but if a datastructure has its own memoization scheme then we'll reserve the right to also take advantage of that.

There are probably more functions like these — including broadcast — that'd be good to evaluate at the same time.

julia> using SparseArrays

julia> map(x->x+rand(), spzeros(3))
3-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  0.538611
  [2]  =  0.538611
  [3]  =  0.538611

julia> broadcast(x->x+rand(), spzeros(3))
3-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  0.983621
  [2]  =  0.983621
  [3]  =  0.983621
mbauman commented 1 month ago

It may also be worth reconsidering this behavior in the first place — especially in cases where we're blowing up the storage of a sparse array in any case. It's more defensible in situations where f preserves or otherwise returns the "special" value, but it looks like we only memoize on zeros-as-arguments right now:

julia> map(x->x+rand(Bool), spzeros(10000))
10000-element SparseVector{Float64, Int64} with 0 stored entries

julia> map(x->x*rand(Bool), sparse(ones(10000)))
10000-element SparseVector{Float64, Int64} with 5030 stored entries:
  [4    ]  =  1.0
  [5    ]  =  1.0
           ⋮
  [9998 ]  =  1.0
  [9999 ]  =  1.0
  [10000]  =  1.0