Open mtrbean opened 9 years ago
There's sum
and countnz
, both of which do what you want. It looks like the generated machine code is perhaps a little more complicated than it needs to be, but I do spot a popcntq
instruction that it ends up calling.
In general, these sorts of open-ended questions may be better suited to ask in the julia-users mailing list.
I'm not sure if this is an issue specifically for these methods--if the "little more complicated" bits are just a general compiler improvement, @mbauman, we can just point this at the compiler optimization metaissue and close it.
(I think I'm using those tags correctly-ish. We haven't actually established a performance issue, yet, however.)
It uses llvm ctpop
instruction on BitArray chunks under the hood. I doubt we could do better.
For the record, when I wrote that I also tried the tricks described here: http://www.hackersdelight.org/hdcodetxt/popArrayHS.c.txt, which in principle could produce some gains on large inputs, but I couldn't measure any impact. It was some time ago though.
With respect to counting along a dimension, there surely are optimization opportunities there for some cases (e.g. counting along columns). I just did some very quick tests on random matrices, and it seems some quite large speedups (even 10x on large instances) can be easily obtained like this:
Int[sum(a[:,i]) for i = 1:size(a,2)]' # instead of sum(a,1)
(b=a';Int[sum(b[:,i]) for i =1:size(b,2)]) # instead of sum(a,2)
Both versions allocate memory though (the second one more).
The built in sums are now faster than the alternatives given by @carlobaldassi
in the last post for a 10^4 x 10^4 matrix.
@KristofferC: I see a considerable speed-up between julia 0.4 and 0.5, and then the same speed with the latest 0.6. However, for me, the alternatives are still faster on a 10^4x10^4 matrx; the alternative to sum(a,1)
is more than 3 times faster, the alternative to sum(a,2)
only about 20% faster.
(unrelated comment: you quoted my username, I did not get pinged...)
You are right, I must have messed up in the benchmarking:
julia> A = BitArray([rand(Bool) for i in 1:10^4, j in 1:10^4]);
julia> @btime sum(A; dims=1)
53.077 ms (2 allocations: 78.20 KiB)
julia> carlo1(A) = Int[sum(A[:,i]) for i = 1:size(A,2)]'
carlo1 (generic function with 1 method)
julia> @btime carlo1(A)
9.877 ms (59492 allocations: 13.95 MiB)
(another aside comment: you can use bitrand(10^4, 10^4)
instead of A = BitArray([rand(Bool) for i in 1:10^4, j in 1:10^4])
, it saves typing, time, temporary allocations and random numbers.)
The reason for this is that for sum(A[:,i])
we enter the very efficient bitcount
method while for sum(A; dims=1)
we enter the generic _mapreducedim!(f, op, R::AbstractArray, A::AbstractArray)
.
I just ended up here b/c I needed to do fast column sums on large BitArrays...any chance Carlo's trick will be making it into a release? (Or has it? I'm on 1.3.0 at the moment.)
Do we have a function to count number of set bit in a BitArray, possibly along a dimension?