`sum` of a Bool sparse array is slow

JuliaSparse / SparseArrays.jl

SparseArrays.jl is a Julia stdlib

https://sparsearrays.juliasparse.org/

Other

90 stars 50 forks source link

`sum` of a Bool sparse array is slow #237

Closed eschnett closed 2 years ago

eschnett commented 2 years ago

Evaluating the sum of a sparse array is slow. It appears to use an O(n^2) algorithm:

julia> @time sum(sparse(I(10000)))
  0.401920 seconds (6 allocations: 176.281 KiB)
10000

julia> @time sum(sparse(I(20000)))
  1.616018 seconds (8 allocations: 351.875 KiB)
20000

julia> @time sum(sparse(I(40000)))
  6.819114 seconds (8 allocations: 703.500 KiB)
40000

eschnett commented 2 years ago

That is probably a one-liner to fix: sum(x::AbstractSparseArray) = sum(nonzeros(x)).

A similar optimization would work for any: any(x::AbstractSparseArray{Bool}) = any(nonzeros(x)).

Oh, and adjoints and transposes should be handled as well.

eschnett commented 2 years ago

It seems that iszero is missing the respective implementation for adjoints and transposes.

SobhanMP commented 2 years ago

nz(x::SparseMatrixCSC) = size(x, 1) * size(x, 2) - nnz(x)
nz(x::Union{Adjoint{<:Any,<:SparseMatrixCSC},Transposie{<:Any,<:SparseMatrixCSC}}) = nz(parent(x))
sum(x::SparseArray) = sum( nonzeros(x)) + nz(x) * zero(eltype(x))
sum(x::Adjoint{SparseArray}) = sum(adjoint, nonzeros(x)) + nz(x) * zero(adjoint(eltype(x)))
sum(x::Transpose{SparseArray}) = sum(transpose, nonzeros(x)) + nz(x) * zero(transpose(eltype(x)))

from the doc for zero:

Get the additive identity element for the type of x (x can also specify the type itself).

i guess that nz(x) * zero(eltype(x)) is not needed in the first case but in the next two is iszero(transpose(zero(T))) true?

dkarrasch commented 2 years ago

is iszero(transpose(zero(T))) true?

Yes, it's true.

dkarrasch commented 2 years ago

The issue here is not a missing sum (or generically reduce) specialization, it's about the fact that the sparse matrix has Bool eltype, so sum falls back to count. So what we need is

count(A::AbstractSparseArray{Bool}) = count(nonzeros(A))

or appropriately restricted sparse array types.

dkarrasch commented 2 years ago

Ha, we even have count specializations, however, including a predicate. These just don't get called when calling count without a predicate. count along dimensions works superfast BTW.