JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.44k stars 5.46k forks source link

Optimize sum(::BitArray, dim) #11873

Open mtrbean opened 9 years ago

mtrbean commented 9 years ago

Do we have a function to count number of set bit in a BitArray, possibly along a dimension?

mbauman commented 9 years ago

There's sum and countnz, both of which do what you want. It looks like the generated machine code is perhaps a little more complicated than it needs to be, but I do spot a popcntq instruction that it ends up calling.

In general, these sorts of open-ended questions may be better suited to ask in the julia-users mailing list.

pao commented 9 years ago

I'm not sure if this is an issue specifically for these methods--if the "little more complicated" bits are just a general compiler improvement, @mbauman, we can just point this at the compiler optimization metaissue and close it.

pao commented 9 years ago

(I think I'm using those tags correctly-ish. We haven't actually established a performance issue, yet, however.)

wildart commented 9 years ago

It uses llvm ctpop instruction on BitArray chunks under the hood. I doubt we could do better.

carlobaldassi commented 9 years ago

For the record, when I wrote that I also tried the tricks described here: http://www.hackersdelight.org/hdcodetxt/popArrayHS.c.txt, which in principle could produce some gains on large inputs, but I couldn't measure any impact. It was some time ago though.

With respect to counting along a dimension, there surely are optimization opportunities there for some cases (e.g. counting along columns). I just did some very quick tests on random matrices, and it seems some quite large speedups (even 10x on large instances) can be easily obtained like this:

Int[sum(a[:,i]) for i = 1:size(a,2)]'  # instead of sum(a,1)
(b=a';Int[sum(b[:,i]) for i =1:size(b,2)]) # instead of sum(a,2)

Both versions allocate memory though (the second one more).

KristofferC commented 7 years ago

The built in sums are now faster than the alternatives given by @carlobaldassi in the last post for a 10^4 x 10^4 matrix.

carlobaldassi commented 7 years ago

@KristofferC: I see a considerable speed-up between julia 0.4 and 0.5, and then the same speed with the latest 0.6. However, for me, the alternatives are still faster on a 10^4x10^4 matrx; the alternative to sum(a,1) is more than 3 times faster, the alternative to sum(a,2) only about 20% faster.

(unrelated comment: you quoted my username, I did not get pinged...)

KristofferC commented 7 years ago

You are right, I must have messed up in the benchmarking:

julia> A = BitArray([rand(Bool) for i in 1:10^4, j in 1:10^4]);

julia> @btime sum(A; dims=1)
  53.077 ms (2 allocations: 78.20 KiB)

julia> carlo1(A) = Int[sum(A[:,i]) for i = 1:size(A,2)]'
carlo1 (generic function with 1 method)

julia> @btime carlo1(A)
  9.877 ms (59492 allocations: 13.95 MiB)
carlobaldassi commented 7 years ago

(another aside comment: you can use bitrand(10^4, 10^4) instead of A = BitArray([rand(Bool) for i in 1:10^4, j in 1:10^4]), it saves typing, time, temporary allocations and random numbers.)

KristofferC commented 4 years ago

The reason for this is that for sum(A[:,i]) we enter the very efficient bitcount method while for sum(A; dims=1) we enter the generic _mapreducedim!(f, op, R::AbstractArray, A::AbstractArray).

bensprung commented 4 years ago

I just ended up here b/c I needed to do fast column sums on large BitArrays...any chance Carlo's trick will be making it into a release? (Or has it? I'm on 1.3.0 at the moment.)