This uses a two-step sort: first we use tbl.ptr to bucket the indices and values, then we sort each bucket individually. Initial testing showed a 2x improvement on matrix transpose:
using Finch
using BenchmarkTools
n = 100000
nnz = 1000000
A = Tensor(Dense(SparseList(Element(0.0))), fsprand(n, n, nnz))
C = Tensor(Dense(Sparse(Element(0.0))))
eval(@finch_kernel mode=:fast function tp(A, C)
    C .= 0
    for i=_
        for j=_
            C[i,j] = A[j,i]
        end
    end
end)
hash_times = []
@benchmark tp($A, $C)
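
For readers who want the idea without reading the diff, here is a rough standalone sketch of a bucket-then-sort pass. It is not the code in this PR and does not touch Finch internals: the function name bucket_then_sort and the arguments pos, idx, val, and nbuckets are hypothetical, and the ptr array is built here by counting, whereas the PR reuses the existing tbl.ptr.

```julia
# Hypothetical illustration only; names and layout are assumptions, not Finch's API.
# Orders (pos, idx, val) triples by pos, then by idx within each pos.
function bucket_then_sort(pos::Vector{Int}, idx::Vector{Int}, val::Vector{T}, nbuckets::Int) where {T}
    n = length(pos)

    # Build ptr so that bucket p occupies slots ptr[p]:(ptr[p + 1] - 1).
    ptr = zeros(Int, nbuckets + 1)
    ptr[1] = 1
    for p in pos
        ptr[p + 1] += 1          # count entries per bucket
    end
    for p in 1:nbuckets
        ptr[p + 1] += ptr[p]     # prefix sum turns counts into offsets
    end

    # Step 1: scatter every entry into its bucket (order inside a bucket is arbitrary).
    next = ptr[1:nbuckets]       # copy: next free slot of each bucket
    out_idx = similar(idx)
    out_val = similar(val)
    for k in 1:n
        q = next[pos[k]]
        out_idx[q] = idx[k]
        out_val[q] = val[k]
        next[pos[k]] = q + 1
    end

    # Step 2: sort each bucket on its own by idx, carrying the values along.
    for p in 1:nbuckets
        r = ptr[p]:(ptr[p + 1] - 1)
        perm = sortperm(view(out_idx, r))
        out_idx[r] = out_idx[r][perm]
        out_val[r] = out_val[r][perm]
    end

    return ptr, out_idx, out_val
end

# Small usage example with three buckets.
ptr, sidx, sval = bucket_then_sort([2, 1, 2, 1, 3], [5, 3, 1, 2, 4],
                                   [0.5, 0.3, 0.1, 0.2, 0.4], 3)
```

The scatter step is linear in the number of entries, and each later sort only touches one bucket, so the comparison sort works on many short ranges instead of one long array.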