JuliaSparse / SuiteSparseGraphBLAS.jl

Sparse, General Linear Algebra for Graphs!
MIT License

should reductions return dense arrays? #99

Open CarloLucibello opened 1 year ago

CarloLucibello commented 1 year ago

Reductions on SparseMatrixCSC return dense arrays. Should the same happen for GBMatrix? Current behavior is

julia> using SparseArrays, SuiteSparseGraphBLAS

# dense matrices from sparse arrays' reductions for base types
julia> sum(sprand(5,5,0.1); dims=1)
1×5 Matrix{Float64}:
 0.0  0.0  0.623837  0.0  0.751621

julia> sum(sprand(5,5,0.1); dims=2)
5×1 Matrix{Float64}:
 0.0
 1.303100775231686
 0.09521454262437534
 0.34834841964204744
 1.4844961238290506

# sparse matrices from sparse arrays' reductions for this repo

julia> sum(GBMatrix(sprand(5,5,0.1)); dims=2)
5x1 GraphBLAS double matrix, bitmap by col
  1 entry, memory: 256 bytes

    (4,1)    0.9055

julia> sum(GBMatrix(sprand(5,5,0.1)); dims=1)
5x1 GraphBLAS double matrix, bitmap by col
  3 entries, memory: 256 bytes

    (2,1)    0.377595
    (3,1)    0.114248
    (5,1)    0.18736
rayegun commented 1 year ago

It's not possible in general. If there's a missing row or missing column (which is entirely possible per the spec) the result is required to be sparse. If a (sub)graph has no edges for a node, then we have to return missing or some other "no-value" sentinel for it.

If there are no missing entries, then it will return a "dense" GBMatrix. That can then be unpacked, or converted by a memcpy, to a Matrix.
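As a sketch of that last step (assuming GBMatrix behaves as an AbstractMatrix, so collect applies; the exact conversion path isn't shown in this thread):

```julia
using SparseArrays, SuiteSparseGraphBLAS

A = GBMatrix(sprand(4, 4, 0.5))
B = sum(A; dims = 2)   # a GBMatrix; "dense" only if every row has an entry
M = collect(B)         # materialize to an ordinary Matrix{Float64}
```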

I'm working on a new library that should support something like this, but it's a ways off. And we still want to support the graph method.

More generally, if you're asking about Matrix outputs specifically, that's a limitation right now that might be overcome in the future, but I can't make any guarantees.

CarloLucibello commented 1 year ago

The semantics of GBMatrix do not seem to be entirely consistent with those of an AbstractArray. GBMatrix chooses to ignore the fill value in reductions:

julia> x = GBMatrix([1,2], [2, 3], [1,2], fill=100000)
2x3 GraphBLAS int64_t matrix, bitmap by row
  2 entries, memory: 264 bytes

    (1,2)   1
    (2,3)   2

julia> x[1,1]
100000

julia> sum(x)
3

and so we end up in the odd situation where manually computing the sum with a for loop gives a different value than sum(x). This doesn't feel very safe.
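For concreteness, here is what such an elementwise loop computes for the 2×3 matrix above (pure-Julia sketch; the helper stands in for x[i, j], which returns the fill value 100000 for non-stored entries):

```julia
# Mimic x[i, j] for the example: stored entries (1,2) => 1 and (2,3) => 2;
# every other index returns the fill value 100000.
entry(i, j) = (i, j) == (1, 2) ? 1 :
              (i, j) == (2, 3) ? 2 : 100000

manual = sum(entry(i, j) for i in 1:2, j in 1:3)
# manual == 400003, while sum(x) over stored entries gives 3
```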

rayegun commented 1 year ago

Yes, there's a significant issue here that I've yet to resolve. The main problem is that fill isn't a construct in GraphBLAS; it's my own addition to make things as AbstractArray-like as possible.

This is solvable for reduce pretty trivially, and I can do that: counting the number of non-fill entries and doing a simple multiply is fine.
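That fix amounts to the following (sketch; the function name and argument list are illustrative, not library API):

```julia
# Fill-aware sum: reduce the stored values, then add the fill value once
# per implicit entry (total length minus the number of stored entries).
fill_aware_sum(stored_sum, len, nstored, fill_value) =
    stored_sum + (len - nstored) * fill_value

# For the example above: stored sum 3, a 2x3 matrix, 2 stored entries, fill 100000
fill_aware_sum(3, 6, 2, 100000)  # == 400003, matching an elementwise loop
```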

But then you get to multiplication, and I don't have a great answer. Do you think A * B should account for an arbitrary fill value? This gets into the weeds once you use various semirings. If I do take the fill value into account, then I will be arbitrarily densifying a significant fraction of the time, pretty much whenever the fill value isn't 0, NoValue/missing, or [-]Inf.

The performance impact of densifying would be immense.

rayegun commented 1 year ago

Keep in mind also that the use of this library as a SparseArray is mostly secondary to its use for graph algorithms. The fill was intended to bridge the gap between a SparseMatrixCSC and a GBMatrix as much as possible, but only in a way that doesn't compromise its use for graphs.

fill was never meant to be anything other than the identity of some monoid. So it should really be, at minimum, missing/novalue, 1, 0, Inf, or -Inf. I suppose I could @warn if reduce is used on a collection with the wrong fill value for that monoid?
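Such a check could look roughly like this (hypothetical helper, not the library's API; the monoid-identity lookup is assumed to happen elsewhere):

```julia
# Hypothetical guard: warn when the fill value is not the identity of the
# reduction monoid, then reduce over the stored values as before.
function checked_reduce(op, stored; fill_value, identity_value)
    fill_value == identity_value ||
        @warn "fill value $fill_value is not the identity of $op; " *
              "the result ignores implicit (fill) entries"
    return reduce(op, stored; init = identity_value)
end

checked_reduce(+, [1, 2]; fill_value = 100000, identity_value = 0)  # warns, returns 3
```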

Note: novalue will replace missing in an upcoming version. Essentially the opposite of missing.