DrTimothyAldenDavis / GraphBLAS

SuiteSparse:GraphBLAS: graph algorithms in the language of linear algebra. For production: (default) STABLE branch. Code development: ask me for the right branch before submitting a PR. video intro: https://youtu.be/Tj5y6d7FegI .
http://faculty.cse.tamu.edu/davis/GraphBLAS.html
Other
346 stars 61 forks source link

[Question] Right way to reduce memory consumption of Kronecker product. #121

Open gsvgit opened 2 years ago

gsvgit commented 2 years ago

Suppose I want to compute automata intersection using Kronecker product. I should to define monoid over set of sets of symbols where intersection is an elementwise operation, and emty set is an identity. Such a monoid works fine, but there is a technical problem. Kronecker product for sparse matrices assumes that each pairs of stored cells in input matrices produces a result which must be stored. So, if input matrices have N and M non-zero elements respectively, N*M cells will be allocated for result. But for this monoid and for real-world automata the most of cells will contains empy sets, so should be filtered out. It is possible to filter them out using additional filter operation. But nonfilterred result of Kronecker product requires too mach memory to be stored. Is it possible do not allocate memory for cells with empty sets or filter them out on the fly?

Looks like it is a case where kronecker and filter kernels fusion is requred.

DrTimothyAldenDavis commented 2 years ago

I don't know for sure since I don't know what an automata intersection is. By "filter" do you mean GrB_select? I also don't know what you mean by a monoid whose identity value is an empty set. The GraphBLAS monoid doesn't work that way.

The Kronecker product C=kron(A,B) will have nvals(A)nvals(B) entries. Even with C=kron(A,B), using a mask M, my implementation of the Kronecker product does not yet exploit the mask during the computation. It first computes T=kron(A,B) without the mask and then does C=T. Ideally, if M were very sparse, then C=kron(A,B) would take no more space than nvals(M), which could be much much less than nvals(A)nvals(B).

I haven't had enough time to optimize my implementation of the Kronecker product. It doesn't exploit the mask, if present, to reduce work during the computation. It also doesn't exploit built-in operators; it calls them through their function pointers instead. In the future, GrB_kronecker needs to be optimized so it can exploit the mask, and it needs to have multiple kernels to handle built-in operators faster.

If you had a fast implementation of C=kron(A,B), with a mask matrix M, would that help? That would fit within the GraphBLAS API. I just haven't written it yet.

There hasn't been much need for a very fast GrB_kron so far, so I have been spending my efforts on optimizing other parts of GraphBLAS.

DrTimothyAldenDavis commented 2 years ago

Great question, though. I would like to see more use cases of the Kronecker product. So far, I've only seen it used for generating test matrices, and in that case it doesn't have to be fast.