JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.77k stars 5.49k forks source link

Make `cumsum(A)` compute a summed-area table #26412

Open yurivish opened 6 years ago

yurivish commented 6 years ago

The idea has been discussed in a few places (e.g. https://github.com/JuliaLang/julia/issues/20041#issuecomment-275938676) to make cumsum(A) return a summed-area table for multidimensional arrays.

The 1D case is already a summed-area table, since that's just the cumulative sum along the one dimension.

Happily, the appropriate deprecation for cumsum — deprecating it for AbstractArrays with no dims argument — is already in place, so this is just a tracking issue for the idea.

cc: @simonbyrne

mbauman commented 6 years ago

More than just cumsumcumprod, accumulate and maybe diff should be considered at the same time, too. Any others?

simonbyrne commented 6 years ago

We should at least support it when dims is passed a tuple.

bhvieira commented 6 years ago

@yurivish Something like this?

A = [31 2;12 26;13 17]

import Base.cumsum

function cumsum(A::AbstractArray, dims::Tuple{Int})
    out = copy(A)
    for i in dims
        cumsum!(out, out, i)
    end
    return out
end

cumsum(A, (1,2))
#=
3×2 Array{Int64,2}:
 31   33
 43   71
 56  101
=#

#Expected behavior is consistent
cumsum(A, (1,2)) == cumsum(cumsum(A, 1), 2)
#true

#Compatibility with the basic definition is guaranteed
cumsum(A, 1) == cumsum(A, (1,))
#true
cumsum(A, 2) == cumsum(A, (2,))
#true

It works even for subsets of 1:ndims(A) just as well

A = cat(3, A, A, A)
cumsum(A, (1,2))
#=3×2×3 Array{Int64,3}:
[:, :, 1] =
 31   33
 43   71
 56  101

[:, :, 2] =
 31   33
 43   71
 56  101

[:, :, 3] =
 31   33
 43   71
 56  101=#

I can work this into a PR. Just need some time to get acquainted to JuliaLang PR policies, so devs can assign me this if you want.

StefanKarpinski commented 6 years ago

GitHub only allows assignment to members of an organization, but you can certainly work on this if you like. Keep in mind that one will want to compute the summed area table in an order which is cache-friendly (i.e. compute along columns first for dense arrays).

martinholters commented 6 years ago

I think that definition of cumsum is nice because

StefanKarpinski commented 6 years ago

Yes, it does seem like a nice definition. If I'm understanding correctly, it's compatible with computing a summed area table as well, assuming that the default is cumsum(A, dims=1:ndims(A)). With perfectly commutative arithmetic, the order of summation doesn't matter so we can optimize it to choose the best order for summing the dimensions, but now that I'm thinking about it based on this observation you're doing the same set of summation operations no matter what so maybe that doesn't actually matter.

ChamanAgrawal commented 6 years ago

Hi, If the order of summation doesn't matter then I think simply calculating the cumulative sum will be good, right? If nobody is working on this I would like to contribute. P.S.: I am new to Julia(motivated by the concept of this lang) so any beginners advice?

StefanKarpinski commented 6 years ago

No, I would say just try it and ask for help on the #my-first-pr channel on julialang.slack.com if you need help with anything.