DrTimothyAldenDavis / SuiteSparse

The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.
https://people.engr.tamu.edu/davis/suitesparse.html
Other
1.17k stars 261 forks source link

support for CHOLMOD_INTLONG on CHOLMOD_PATTERN matrices #350

Open schloegl opened 1 year ago

schloegl commented 1 year ago

Is your feature request related to a problem? Please describe.

I'm working with large sparse boolean matrices. These matrices are in the order of 10^5 to 10^7 rows and columns, and a density of 1-10%. This results in more than 2^32 nnz values, so CHOLMOD_INT is not sufficient. CHOLMOD_LONG uses 8 instead of 4 bytes for column and row indices, which requires twice as much memory than necessary.

CHOLMOD_INTLONG seem to be the proper matrix mode for my purpose, but cholmod_core.h contains this comment:

/* The itype of all parameters for all CHOLMOD routines must match.
 * FUTURE WORK: CHOLMOD_INTLONG is not yet supported.
 */

I checked this in suitesparse v5.8 and als the latest (v.7.1.0) release.

Describe the solution you'd like

Support for functions in cholmod_core and cholmod_matrix operations on boolean (i.e. CHOLMOD_PATTERN) matrices. If this is already partially supported, I'd like to see what is supported and what is not supported.

Describe alternatives you've considered

Currently, I've my own single-threaded implementation for boolean sparse matrices, I'd need to make it multi-threaded. Using CHOLMOD_LONG would take twice as much memory which has a significant effect on performance and the maximum problem size.

DrTimothyAldenDavis commented 1 year ago

I'm not planning on creating the CHOLMOD_INTLONG.

Adding matrix operations for boolean matrices is also not feasible.

I do plan on adding support for single precision floating point, but I'm not able to add boolean matrices.

What operations do you need? GraphBLAS has boolean matrices, and can also exploit "iso-valued" matrices (my terminology) in which all entries in the pattern of the matrix have the same value.

On Tue, Jul 4, 2023 at 5:59 AM Alois Schlögl @.***> wrote:

Is your feature request related to a problem? Please describe.

I'm working with large sparse boolean matrices. These matrices are in the order of 10^5 to 10^7 rows and columns, and a density of 1-10%. This results in more than 2^32 nnz values, so CHOLMOD_INT is not sufficient. CHOLMOD_LONG uses 8 instead of 4 bytes for column and row indices, which requires twice as much memory than necessary.

CHOLMOD_INTLONG seem to be the proper matrix mode for my purpose, but cholmod_core.h contains this comment:

/* The itype of all parameters for all CHOLMOD routines must match.

  • FUTURE WORK: CHOLMOD_INTLONG is not yet supported. */

I checked this in suitesparse v5.8 and als the latest (v.7.1.0) release.

Describe the solution you'd like

Support for functions in cholmod_core and cholmod_matrix operations on boolean (i.e. CHOLMOD_PATTERN) matrices. If this is already partially supported, I'd like to see what is supported and what is not supported.

Describe alternatives you've considered

Currently, I've my own single-threaded implementation for boolean sparse matrices, I'd need to make it multi-threaded. Using CHOLMOD_LONG would take twice as much memory which has a significant effect on performance and the maximum problem size.

— Reply to this email directly, view it on GitHub https://github.com/DrTimothyAldenDavis/SuiteSparse/issues/350, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEYIIOIGV5HEKSQOJCVPQVLXOPZQXANCNFSM6AAAAAAZ5SFLDQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

schloegl commented 1 year ago

Thanks for your reply. Concerning your questions, which operations we need I'll use Matlab-like pseudo code to describe the problem. I hope that is fine with you.

WJ = boolean(W .* (Z'*Z))                  (equ 1)
X{1} = Z;    % slightly modified
for t=1:10, % or more 
    TH =  g0 + g1 * sum(X{t},2)/N;             (equ 2)
    X{t+1} = X{t} * WJ > TH(:);                (equ 3)
    for k=1:100, % or 1000
        cc(t,k) = corr(Z(k,:),X{t+1}(k,:));       (equ 4)
    end
end

W is a sparse, booolan, non-symetric, square martrix (size N-times-N), density 0-20% Z is a sparse boolean matrix of size M-times-N, density e.g. 0.0002 X{t} is a boolean matrix of size 100(0)-times-N (sparse or full does not matter) TH is a threshold vector of length 100(0),
g0, g1 scalars

W is either a random matrix, or precomputed. Z can be also random or precomputed.

N up to 1e7, M varies but is always smaller than N, usually M < N/2 is sufficient. X{t+1},X{t}, up to 10 and more itertations of, but memory for two instances is sufficient

In order to save memory we use already:

  1. CSC format is used for W, WJ, Z
  2. 32 bit row indices are used in W, WJ, Z,
  3. no value fields are used.
  4. WJ is used in place of W
  5. W is usually computed on the fly (except when precomputed)

In order to save compute time, we do already :

  1. (equ 1) is evaluated only for the non-zero W(i,j)
  2. element (i,j) of Z'*Z is evaluated as any(Z(i,:) | Z(j,:)), and is stopped as soon as a matching pair of 1's is found
  3. for (equ 4) have an efficient method for boolean matrices.

The memory requirement of X{t}, and X{t+1} is considered not crucial, currently we use full matrices.

At this moment, I do not see how these equations can be efficiently implemented in graphblas. Moreover, (equ 3) the product of X*WJ is not "iso-valued". I expect the biggest performance gains from a multithreaded implementation of equatuions (1) and (3). Do you have an suggestion how to achieve this with the help of suitesparse/graphblas ?

DrTimothyAldenDavis commented 1 year ago

I handle nearly all of this in GraphBLAS. In particular, using the right semiring, many of my monoids are what I call "terminal". They halt early, like a short-circuit AND or OR. The first line is a masked dot product,

WJ = Z'*Z

using the GxB_ANY_PAIR_BOOL semiring.

Eqn 3 is unclear (it needs parentheses for me to understand the order of operation).

GraphBLAS uses 64 bit integers however, not 32-bit, since I have to handle matrices of very large dimension.

The corr function is the Pearson's correlation? It might be computable with a user-defined semiring but I haven't worked out the details.

schloegl commented 1 year ago

Equ 3 is following the operator precedence for Matlab, i.e. X{t+1} = ( X{t} * WJ ) > TH(:); (equ 3) concerning "corr()": yes, it is pearsons correlation cofficient. For binary vectors, its also Matthews correlation, or Phi coefficient and can be efficiently computed through the 2x2 confusion matrix (see https://en.wikipedia.org/wiki/Phi_coefficient )

DrTimothyAldenDavis commented 1 year ago

Adding support for 32-bit integers for matrix row/col indices would be a major change. I have to support 64-bit integers and can't replace that with 32-bit indices. I would need to support both, and also support arbitary mixing of 32 and 64 bits.

rayegun commented 1 year ago

@schloegl if you need a special build for these small matrices, and don't care about using int64_t values you could try sed to find and replace int64_t with int32_t. Of course do this at your own risk and it would be unsupported.

But you limit the size of your matrices quite a bit by doing so. Is this for some sort of embedded device?

schloegl commented 1 year ago

If I understand your suggestions correctly, this is the equivalent of using CHOLMOD_INT. That would work for the row- and column indices, but in my case the number-of-nonzeros can be larger then 2^32 (~4e9). So int32_t would not be sufficient for that.

DrTimothyAldenDavis commented 1 year ago

This would require the "pointer" arrays in my sparse matrix data structures to be held in int64, with the row/col indices held in int32. I agree that would be a nice feature, but I currently don't have plans for a mixed int64/int32 version of either CHOLMOD or GraphBLAS.

schloegl commented 1 year ago

@DrTimothyAldenDavis: Yes, I understood that from our earlier discussion. I just wanted to clarify why the suggestion of @Wimmerer is not suitable for my purpose. Based on this discussion, I believe you should close this issue with "Won't fix".

@Wimmerer: To answer your question, no its not about some embedded device, but about some large scale simulation, where memory and memory bandwidth is the limiting factor.

DrTimothyAldenDavis commented 1 year ago

Got it. "Won't fix" is a bit strong, since it's an issue I'm aware of and ideally could do in the future. It's something I've thought about for my recent GraphBLAS package, but it's lower on the priority list as compared to other issues. So perhaps "Won't fix in the short or medium term, but may do this in the long term".

rayegun commented 1 year ago

@schloegl for my own curiosity is this a common use case in your field? In my current work I'm making the assumption that all indices are the same size integer. But that's not strictly necessary, and it's possible in the future that (in the Julia language I work on) the sparse libraries could support mixing index types.

schloegl commented 1 year ago

@Wimmerer : each time you have a sparse, square matrix with more than 65536 rows/columns, you might want to be able of storing more the 4G non-zeros.

DrTimothyAldenDavis commented 1 year ago

Yes, that's a useful case to handle (matrices of dimension at or less than, say 2 billion, but with more than 4 billion entries). I just can't handle that case for now without a lot of work. I'll need to leave it for future work.