JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.91k stars 5.49k forks source link

Two sparse triu/tril in Base with different performance. #12649

Closed KristofferC closed 9 years ago

KristofferC commented 9 years ago

There are currently two triu and tril methods for sparse matrices. One in csparse.jl here and one in linalg.jl here

The latter one supports setting a diagonal offset k but for k=0 the methods should be the same. Now timing these:

julia> A = sprand(10^5, 10^5, 25/10^5);

julia> @which triu(A)
triu(A::Base.SparseMatrix.SparseMatrixCSC{Tv,Ti<:Integer}) at sparse/csparse.jl:382

julia> @which triu(A, 0)
triu{Tv,Ti}(S::Base.SparseMatrix.SparseMatrixCSC{Tv,Ti}, k::Integer) at sparse/linalg.jl:268

julia> @time triu(A);
  0.294281 seconds (11 allocations: 38.957 MB, 37.89% gc time)

julia> @time triu(A, 0);
  0.067271 seconds (11 allocations: 19.843 MB, 12.05% gc time)

julia> triu(A) == triu(A,0)
true

means that the triu in csparse.jl (which is the default one and the one everyone is likely using) is slower (at least for this random matrix I tried, not sure how it varies with different sparse structure).

Doesn't it make sense to delete triu in csparse.jl and setting a default k=0 on the one in linalg.jl? Same for tril.

ViralBShah commented 9 years ago

Yes, we can get rid of the ones in csparse.jl. Could you submit a PR?

KristofferC commented 9 years ago

Yeah, one problem is that there is no corresponding triu! in linalg.jl which means that after removing triu! from csparse.jl Julia falls back to the dense implementation. I could of course only remove the non mutating version but it feels better to see if the linalg.jl one could easily be modified to be mutating and then add non mutating methods in the usual way.

tkelman commented 9 years ago

Odd that they differ by so much. fkeep! is fairly generic and it's using a functor at least for the k=0 case so I'm a little surprised to see so much performance difference. fkeep! could potentially be made competitive with some manual hoisting of the field accesses out of the loops.

KristofferC commented 9 years ago

Yeah, I will experiment a bit.

KristofferC commented 9 years ago

The reason for fkeep! being slower is that it copies the whole matrix in its non mutating function. The one in linalg.jl doesn't construct the arrays until it knows how many non zero values will exist in the final matrix.

tkelman commented 9 years ago

Maybe just drop the non-mutating copy version from csparse.jl then. Hoisting may help with the mutating version.

KristofferC commented 9 years ago

Yeah, hoisting + @inbounds is about a 30-40% speedup. I noticed that tril!(A, k) where k is an integer currently falls back to the dense version which of course never finishes. What should be done about that?

tkelman commented 9 years ago

Could try extending TriuFun and TrilFun to include a k offset field? Though I suspect the same conclusions will hold where tril!(copy(A), k) would end up slower than a tril(A, k) that only allocates what it needs.