Closed amitmurthy closed 7 years ago
What version of Julia are you using? on 0.5.0 this works
master. map
was recently implemented for sparse arrays.
The new map[!]
/broadcast[!]
methods for SparseMatrixCSC
s check whether the map
/broadcast
operation yields zero when all arguments are zero. Specifically, for e.g. map(f, A, B)
, the check is
fofzeros = f(zero(eltype(A)), zero(eltype(B)))
fpreszeros = fofzeros == zero(fofzeros)
This check requires that (1) the input array eltypes provide zero
and (2) the return type of f
(evaluated for arguments of the input array eltypes) provides zero
.
The first requirement is fundamental: A sparse matrix C
is only well defined if zero(eltype(C))
is well defined, as otherwise C
's unstored entries are not well defined (in which case map
ing over C
makes little sense).
The second requirement is relaxable: We need only the result of the hypothetical iszero(fofzeros)
. Not having iszero
, we must use either fofzeros == zero(fofzeros)
or fofzeros == 0
(alternatives?). The former handles units gracefully, but fails where zero(fofzeros)
is not defined. The latter does not handle units gracefully, but works in some other cases where zero(fofzeros)
is not defined.
Both examples above violate the second requirement and would work given iszero
(without tradeoff) or with fofzeros == 0
in place of fofzeros == zero(fofzeros)
(trading off handling units).
Thoughts? Best!
alternatively wouldn't defining zero
for type any
work? I'm not sure what that would be though.
If you only ever do operations on the stored entries (or their locations), then it doesn't always matter whether the unstored entries have a realizable value in the same element type as the stored entries. I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.
We should just add iszero
.
If you only ever do operations on the stored entries (or their locations), then it doesn't always matter whether the unstored entries have a realizable value in the same element type as the stored entries. I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.
Absolutely, SparseMatrixCSC
s with only stored entries well defined have utility. Applying whole-array operations (e.g. map
) to such objects makes little sense though; the hypothetical mapstored
seems the appropriate operation in that case.
We should just add iszero.
+1 :). Best!
But how would iszero
help here? We wouldn't define it on Symbol
, String
nor Function
, right?
But how would iszero help here? We wouldn't define it on Symbol, String nor Function, right?
In both cases above, iszero(fofzeros)
could check fofzeros == 0
rather than fofzeros == zero(fofzeros)
, skirting lack of definition of zero
for Any
?
Looked closer and I'm mistaken. The zero preservation check discussed above works as written in these cases. Rather, checking whether output entry Cx = f(xs...)
is zero via Cx != zero(eltype(C))
causes the reported failures. Replacing that check with iszero(Cx)
, or shy of that Cx != zero(Cx)
, would fix the second reported failure (the output sparse matrix having mixed numeric type). But with a bit more thought, the present semantics of map[!]
/broadcast[!]
over sparse matrices (not storing zeros) preclude making the first reported failure work without the pun of defining zero
for String
s (or at least comparison with 0
). Best!
But with a bit more thought, the present semantics of
map[!]
/broadcast[!]
over sparse matrices (not storing zeros) preclude making the first reported failure work without the pun of definingzero
forString
s (or at least comparison with0
).
That thought was silly. iszero
would save us in that case as well, there being no problem with comparison against 0
: Replacing Cx != zero(eltype(C))
with iszero(Cx)
and having the fallback iszero(x) = x == 0
would fix the first reported failure as well. Thoughts? Best!
Using sparse arrays/vectors for non-algebraic data (types that do not define zero
) seems like a bad pun to me. These are linear-algebra objects.
For example, what does getindex
return for the non-stored entries if zero
is not defined?
If you just want a general "sparse" associative "array" from Int
to a random type T, you should use Dict
or similar.
And if you want to map the nonzero elements of a sparse array to non-numeric values, you should use map(f, nonzeros(A))
.
Dict isn't CSC. It's useful on occasion to have a sparse array that has the same index iteration structure as a numeric SparseMatrixCSC, with effectively undefined non-stored entries. Linear algebra operations won't work on them, but many array manipulation operations will.
Lots of things are occasionally useful but lead to poor library design. We normally try to avoid bad type puns in Base.
Could you give a concrete example where nonzeros(A) would not suffice?
We allow non numeric or heterogeneously typed dense arrays. Allowing for the same with sparse matrices is a feature that it would be a regression to lose. Not being able to do linear algebra on some array types doesn't always make it a pun or bad library design to support.
For this particular example of map I agree nonzeros is mostly good enough. Will have to check whether the assumption that zero(eltype(A))
exists impacts other use cases.
The definition of sparsity here is "mostly zero", which is what makes it seem like a pun, and rather different from an Associative type (a partial function), which is what you seem to want to use it for.
The generalization would make more sense if we allowed an arbitrary default value, but that could cause trouble with code expecting "sparse" to mean "default zero"
Maybe we should have a DefaultArray type with an arbitrary default value, and have SparseMatrixCSC be a subtype?
@stevenj this is a really good idea. It will also provide a path where we don't return dense arrays when we broadcast nonzero preserving functions, which could be a huge win for memory usage.
If we want to use it for avoiding dense broadcast output, then the sparse case can't be a subtype (wouldn't be type stable). I dunno, maybe all sparse arrays should be DefaultArrays after all. @StefanKarpinski was arguing for this in another issue, and I was skeptical, but maybe it's worth the trouble.
The really big advantage would be that all broadcasts would be efficient and produce the same type no matter what. Also, if I understood his proposal correctly, we could do even better, as his was a way to cheat a default from a sparse, but if it were going on base, we could further simplify the code.
For a potential stopgap solution, please see https://github.com/JuliaLang/julia/pull/17623#discussion_r92890116. Best!
If some feel strongly about the stricter iszero definition while others feel strongly about {map/broadcast over sparse matrices where the output element types don't provide zero}, we can accommodate both desires with a strict iszero definition and a _sparsebc_iszero which wraps iszero but provides the permissive fallback.
Type inference seems to have improved (:tada:) to the point that half of the tests for this issue are ineffective:
julia> intoneorfloatzero(x) = x != 0.0 ? Int(1) : Float64(x)
intoneorfloatzero (generic function with 1 method)
julia> foo = map(intoneorfloatzero, speye(4))
4×4 SparseMatrixCSC{Union{Float64, Int64},Int64} with 4 stored entries:
[1, 1] = 1
[2, 2] = 1
[3, 3] = 1
[4, 4] = 1
julia> eltype(foo)
Union{Float64, Int64}
julia> zero(eltype(foo))
0
(Previously zero(eltype(foo))
would fail.) Additionally, I found four code paths not fixed by #19589 and missed by the existing tests. Stronger tests and fix inbound. Best!
Closed by #20862 I believe.
As mentioned in https://github.com/JuliaLang/julia/pull/22945#issuecomment-317923344, the consequence of the fix here might cause some trouble elsewhere. @amitmurthy would you be able to share some real world examples where map
on a SparseMatrix{String,Int}
would be useful?
The specific requirement was for a SparseMatrix{Ref,Int}
in light of how asyncmap
(and hence pmap
) is implemented. While still a TODO - https://github.com/JuliaLang/julia/blob/0a37b3d42a0b159aad0eebefbcf3280dda2966b4/base/asyncmap.jl#L263-L268 - it can be optimized if SparseMatrix{Ref,Int}
continues to be supported.
Other than the above I don't have any other real-world examples for non-numeric sparse arrays.
Above @tkelman
mentions use cases for SparseMatrixCSC
s of symbols and/or functions:
I've found sparse matrices of symbols or functions to be useful on occasion as a data structure, where it was useful to have them in CSC format, do concatenations or similar operations.
Also a problem when mixing ints and floats.
If sparse arrays are not meant to be used with non-numeric data, we should at least throw a better error message.