Open kalmarek opened 4 days ago
Please add me to this issue.
@kalmarek Thanks for raising this issue. I believe this is caused because hipSPARSE calls rocSPARSE (on AMD systems) and in rocSPARSE we currently do not support symmetric matrices. In fact in rocSPARSE, we currently only support the general matrix type (no symmetric, triangular, hermitian). This restriction does not really make sense and is something we need to fix.
I am looking into extending the support in rocsparse to more matrix types.
I think I have a solution for your problem. Ill use a concrete example. If I understand what you are trying to do, you have a CSC lower triangular matrix, lets say:
// CSC // 2 0 0 0 // 0 1 0 0 // 3 0 3 0 // 4 1 0 2 hcsc_col_ptr = {0, 3, 5, 6, 7}; hcsc_row_ind = {0, 2, 3, 1, 3, 2, 3}; hcsc_val = {2.0, 3.0, 4.0, 1.0, 1.0, 3.0, 2.0};
You are then interpreting this as a CSR matrix by passing the column pointer array as if it was the row pointer array in CSR, passing the row indices array as if it was the column indices array in CSR, and finally passing the values array. This interpretation results in:
// CSR // 2 0 3 4 // 0 1 0 1 // 0 0 3 0 // 0 0 0 2 hcsr_row_ptr = {0, 3, 5, 6, 7}; // same as hcsc_col_ptr hcsr_col_ind = {0, 2, 3, 1, 3, 2, 3}; // same as hcsc_row_ind hcsr_val = {2.0, 3.0, 4.0, 1.0, 1.0, 3.0, 2.0f}; // same as hcsc_val
This is an upper triangular matrix in CSR format. Now if I understand things correctly, you want to solve the original lower triangular system. I think what you want to do is:
1) Interpret your CSC matrix as a CSR matrix like you are doing 2) As a CSR matrix, this is an upper triangular matrix, so you want to continue setting the fill mode as HIPSPARSE_FILL_MODE_UPPER. 3) Because the HIPSPARSE_MATRIX_TYPE_TRIANGULAR matrix type is not supported, you will need to use HIPSPARSE_MATRIX_TYPE_GENERAL. 4) Pass HIPSPARSE_OPERATION_TRANSPOSE instead of HIPSPARSE_OPERATION_NON_TRANSPOSE so that the system you are solving (stored as a CSR upper triangular matrix) is transposed and therefore results in your original lower triangular system.
```
rocsparse_set_mat_type(descr, rocsparse_matrix_type_general);
rocsparse_set_mat_fill_mode(descr, rocsparse_fill_mode_upper);
rocsparse_set_mat_diag_type(descr, rocsparse_diag_type_non_unit);
rocsparse_status status;
size_t buffer_size;
status = rocsparse_scsrsv_buffer_size(handle,
rocsparse_operation_transpose,
m,
nnz,
descr,
dcsr_val,
dcsr_row_ptr,
dcsr_col_ind,
info,
&buffer_size);
void* dbuffer = nullptr;
hipMalloc((void**)&dbuffer, buffer_size);
status = rocsparse_scsrsv_analysis(handle,
rocsparse_operation_transpose,
m,
nnz,
descr,
dcsr_val,
dcsr_row_ptr,
dcsr_col_ind,
info,
analysis,
solve,
dbuffer);
status = rocsparse_scsrsv_solve(handle,
rocsparse_operation_transpose,
m,
nnz,
dalpha,
descr,
dcsr_val,
dcsr_row_ptr,
dcsr_col_ind,
info,
dx,
dy,
solve,
dbuffer);
```
Please let me know if I am misunderstanding your issue.
@jsandham thanks for replies!
What I'm trying to do is to solve Ax=b
with changing b
s and constant sparsity pattern in A
. So I want to analyze & factorize A
first, and then
b
s andA
occur, re-factorize it re-using as much as possible in the process.While re-solves happen very often (every iteration), the diagonal rescaling of A
happens rather rarely. Still A
can be rather large (in the range of 1e6-1e7).
A
comes to me as sparse CSC matrix, symmetric, but only the lower triangle is stored.
Here's a mwe with example data A
from the first non-trivial test of ours baked in. It corresponds to the following matrix:
10×10 SparseMatrixCSC{Float64, Int64} with 18 stored entries:
1.0e-6 ⋅ ⋅ 0.736161 1.18921 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ 1.0e-6 ⋅ -0.642353 ⋅ ⋅ ⋅ -0.875448 ⋅ ⋅
⋅ ⋅ -10.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
0.736161 -0.642353 ⋅ -10.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
1.18921 ⋅ ⋅ ⋅ -10.0 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ -10.0 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ -10.0 ⋅ ⋅ ⋅
⋅ -0.875448 ⋅ ⋅ ⋅ ⋅ ⋅ -10.0 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ -10.0 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ -10.0
commented is also the expanded version (where both triangles are stored).
// hipcc -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include -L/opt/rocm/lib -lhipsparse mwe.c -o mwe
#include <stdio.h>
#include <hip/hip_runtime.h>
#include <hipsparse/hipsparse.h>
typedef struct
{
/** Matrix values, size: number of non-zeros. */
double *x;
/** Matrix row indices, size: number of non-zeros. */
int *i;
/** Matrix column pointers, size: `n+1`. */
int *p;
/** Number of rows. */
int m;
/** Number of columns. */
int n;
} CSCMatrix;
CSCMatrix *init_data()
{
CSCMatrix *A = calloc(1, sizeof(CSCMatrix));
// symmetric expanded
// int p[] = {0, 3, 6, 7, 10, 12, 13, 14, 16, 17, 18};
// int i[] = {0, 3, 4, 1, 3, 7, 2, 0, 1, 3, 0, 4, 5, 6, 1, 7, 8, 9};
// double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, 0.736161, -0.642353, -10.0, 1.189207, -10.0, -10.0, -10.0, -0.875448, -10.0, -10.0, -10.0};
// lower triangle only
int p[] = {0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14};
int i[] = {0, 3, 4, 1, 3, 7, 2, 3, 4, 5, 6, 7, 8, 9};
double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0};
A->m = A->n = 10;
A->p = p;
A->i = i;
A->x = x;
return A;
}
int main()
{
hipsparseStatus_t status;
hipsparseHandle_t handle;
status = hipsparseCreate(&(handle));
if (status != HIPSPARSE_STATUS_SUCCESS)
{
printf("Error in init -- handle: %d.\n", (int)status);
}
hipsparseMatDescr_t descr;
status = hipsparseCreateMatDescr(&(descr));
if (status != HIPSPARSE_STATUS_SUCCESS)
{
printf("Error in init -- descriptor: %d.\n", (int)status);
}
status = hipsparseSetMatIndexBase(descr, HIPSPARSE_INDEX_BASE_ZERO); // Zero-based indexing
status = hipsparseSetMatType(descr, HIPSPARSE_MATRIX_TYPE_SYMMETRIC); // Symmetric matrix
status = hipsparseSetMatDiagType(descr, HIPSPARSE_DIAG_TYPE_NON_UNIT); // with non-unit diagonal
status = hipsparseSetMatFillMode(descr, HIPSPARSE_FILL_MODE_UPPER); // data matrix is symmetric and expanded anyway
// CSR matrix data on the device
double *d_vals; // Non-zero values of the matrix (on device)
int *d_row_ptrs; // Row pointers (on device)
int *d_col_inds; // Column indices (on device)
CSCMatrix *A = init_data(); // A is symmetric in our example!!!
int nnz = A->p[A->n]; // The last element of A->p gives the number of non-zeros
hipMalloc((void **)&(d_vals), nnz * sizeof(double)); // Matrix values (number of non-zeros)
hipMalloc((void **)&(d_row_ptrs), (A->n + 1) * sizeof(int)); // Column pointers (size n + 1)
hipMalloc((void **)&(d_col_inds), nnz * sizeof(int)); // Row indices (number of non-zeros)
hipMemcpy(d_vals, A->x, nnz * sizeof(double), hipMemcpyHostToDevice);
hipMemcpy(d_row_ptrs, A->p, (A->n + 1) * sizeof(int), hipMemcpyHostToDevice);
hipMemcpy(d_col_inds, A->i, nnz * sizeof(int), hipMemcpyHostToDevice);
csrsv2Info_t info_LU;
status = hipsparseCreateCsrsv2Info(&info_LU);
if (status != HIPSPARSE_STATUS_SUCCESS)
{
printf("Error in init -- create info: %d.\n", (int)status);
}
void *buffer;
int buffer_size;
hipsparseOperation_t NOOP = HIPSPARSE_OPERATION_NON_TRANSPOSE;
hipsparseSolvePolicy_t NOPOLICY = HIPSPARSE_SOLVE_POLICY_NO_LEVEL;
status = hipsparseDcsrsv2_bufferSize(handle, NOOP, A->m, nnz, descr,
d_vals, d_row_ptrs, d_col_inds,
info_LU, &buffer_size);
if (status != HIPSPARSE_STATUS_SUCCESS)
{
printf("Error in init -- bufferSize: %d.\n", (int)status);
}
printf("Allocating %d bytes for csrsv2.\n", buffer_size);
hipMalloc((void **)&buffer, buffer_size);
// Perform symbolic factorization for LU
status = hipsparseDcsrsv2_analysis(handle, NOOP, A->m, nnz, descr,
d_vals, d_row_ptrs, d_col_inds,
info_LU, NOPOLICY, buffer);
if (status != HIPSPARSE_STATUS_SUCCESS)
{
printf("Error in init -- analysis: %d.\n", (int)status);
}
// Perform numeric factorization (LU decomposition)
double alpha = 1.0;
status = hipsparseDcsrsv2_solve(handle, NOOP, A->m, nnz, &alpha, descr,
d_vals, d_row_ptrs, d_col_inds,
info_LU, NULL, NULL, NOPOLICY, buffer);
if (status != HIPSPARSE_STATUS_SUCCESS)
{
printf("Error in init -- numeric fact: %d.\n", (int)status);
}
return status;
}
@kalmarek Thanks for raising this issue. I believe this is caused because hipSPARSE calls rocSPARSE (on AMD systems) and in rocSPARSE we currently do not support symmetric matrices. In fact in rocSPARSE, we currently only support the general matrix type (no symmetric, triangular, hermitian). This restriction does not really make sense and is something we need to fix.
I am looking into extending the support in rocsparse to more matrix types.
@jsandham extending rocsparse to support symmetric matrices would be lovely! keep me posted!
@jsandham just the last comment about mwe above:
even with A
expanded symmetric and
status = hipsparseSetMatIndexBase(descr, HIPSPARSE_INDEX_BASE_ZERO); // Zero-based indexing
// status = hipsparseSetMatType(descr, HIPSPARSE_MATRIX_TYPE_SYMMETRIC); // Symmetric matrix
status = hipsparseSetMatDiagType(descr, HIPSPARSE_DIAG_TYPE_NON_UNIT); // with non-unit diagonal
// status = hipsparseSetMatFillMode(descr, HIPSPARSE_FILL_MODE_UPPER); // data matrix is symmetric and expanded anyway
I receive HIPSPARSE_STATUS_INVALID_VALUE
and 0
bytes are allocated for the buffer.
I think your problem lies in how you are creating your CSCMatrix and setting the data:
CSCMatrix *init_data()
{
CSCMatrix *A = calloc(1, sizeof(CSCMatrix));
// symmetric expanded
// int p[] = {0, 3, 6, 7, 10, 12, 13, 14, 16, 17, 18};
// int i[] = {0, 3, 4, 1, 3, 7, 2, 0, 1, 3, 0, 4, 5, 6, 1, 7, 8, 9};
// double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, 0.736161, -0.642353, -10.0, 1.189207, -10.0, -10.0, -10.0, -0.875448, -10.0, -10.0, -10.0};
// lower triangle only
int p[] = {0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14};
int i[] = {0, 3, 4, 1, 3, 7, 2, 3, 4, 5, 6, 7, 8, 9};
double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0};
A->m = A->n = 10;
A->p = p;
A->i = i;
A->x = x;
return A;
}
Specifically I don't think you can set the arrays using local variables that go out of scope once init_data is finished. Hint: print the value of nnz. I would recommend the following:
// CSR matrix data on the device
double *d_vals; // Non-zero values of the matrix (on device)
int *d_row_ptrs; // Row pointers (on device)
int *d_col_inds; // Column indices (on device)
int p[] = {0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14};
int i[] = {0, 3, 4, 1, 3, 7, 2, 3, 4, 5, 6, 7, 8, 9};
double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0};
CSCMatrix* A = malloc(sizeof(CSCMatrix)); // I would not use malloc/calloc here at all...
A->m = A->n = 10;
int nnz = p[A->m]; // The last element of A->p gives the number of non-zeros
A->p = malloc(sizeof(int) * (A->m + 1));
A->i = malloc(sizeof(int) * nnz);
A->x = malloc(sizeof(double) * nnz);
memcpy(A->p, p, sizeof(int) * (A->m + 1));
memcpy(A->i, i, sizeof(int) * nnz);
memcpy(A->x, x, sizeof(double) * nnz);
hipMalloc((void **)&(d_vals), nnz * sizeof(double)); // Matrix values (number of non-zeros)
hipMalloc((void **)&(d_row_ptrs), (A->n + 1) * sizeof(int)); // Column pointers (size n + 1)
hipMalloc((void **)&(d_col_inds), nnz * sizeof(int)); // Row indices (number of non-zeros)
// At the end
free(A->p);
free(A->i);
free(A->x);
free(A);
Doing this allows the buffer size and analysis to pass.
The call to hipsparseDcsrsv2_solve will fail as no right hand size or solution vector has been provided.
Just another note. I am a little worried by:
// Perform numeric factorization (LU decomposition)
double alpha = 1.0;
status = hipsparseDcsrsv2_solve(handle, NOOP, A->m, nnz, &alpha, descr,
d_vals, d_row_ptrs, d_col_inds,
info_LU, NULL, NULL, NOPOLICY, buffer);
Specifically the code comment suggesting that this routine will perform an LU decomposition. Just to be clear, that is not what this routine does. It is the final step in solving the system Mx = b where M is either upper or lower triangular. It sounds like what you may be trying to achieve is to first compute the LU factorization, i.e. A=LU, and then using this factorization to solve:
Ax=b Compute A=LU resulting in system LUx=b Let y = Ux Solve Ly=b first Then solve Ux = y to get x.
If this is the case, you may want to look at https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/reference/precond.html#rocsparse-csrilu0 example. Note that this is an incomplete LU factorization with no fill-in so will not solve Ax=b exactly. Incomplete LU factorization is designed for use in preconditioners for iterative solvers.
As this example shows, you will need to make multiple calls to hipsparseDcsrsv2_solve but you will also need to first call rocsparse_dcsrilu0
Problem Description
I'm trying to solve a sparse symmetric system; Originally my matrix is CSC, square, only lower triangle is stored. This is how I create sparse hip matrix:
then I set matrix descriptor like this
and finally I want to start analyzing for csrsv2:
here I get
HIPSPARSE_STATUS_NOT_SUPPORTED
.The only difference I see from the example here https://rocmdocs.amd.com/projects/hipSPARSE/en/latest/reference/level2.html#hipsparsexcsrsv2-solve is that I set my matrix to be symmetric;
btw. in the example the matrix is clearly non-symmetric, yet the FILL_MODE is set. do I misunderstand what does this parameter mean?
Operating System
ubuntu 24.04.01
CPU
AMD Ryzen 7 7840U
GPU
AMD Radeon VII
ROCm Version
ROCm 6.2.0
ROCm Component
hipSPARSE
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
$ rocminfo --support ROCk module version 6.8.5 is loaded =====================
HSA System Attributes
=====================
Runtime Version: 1.14 Runtime Ext Version: 1.6 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED DMAbuf Support: YES
==========
HSA Agents
==========
Agent 1
Name: AMD Ryzen 7 7840U w/ Radeon 780M Graphics Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 7840U w/ Radeon 780M Graphics Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5289
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 28504680(0x1b2f268) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 28504680(0x1b2f268) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 28504680(0x1b2f268) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1103
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 5567(0x15bf)
ASIC Revision: 9(0x9)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2700
BDFID: 49920
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU Features: KERNEL_DISPATCH Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 39
SDMA engine uCode:: 18
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 14252340(0xd97934) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 14252340(0xd97934) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done
Additional Information
No response