ROCm / hipSPARSE

ROCm SPARSE marshalling library
https://rocm.docs.amd.com/projects/hipSPARSE/en/latest/
MIT License
66 stars 41 forks source link

[Issue]: hipsparseDcsrsv2 and HIPSPARSE_MATRIX_TYPE_SYMMETRIC #541

Open kalmarek opened 4 days ago

kalmarek commented 4 days ago

Problem Description

I'm trying to solve a sparse symmetric system; Originally my matrix is CSC, square, only lower triangle is stored. This is how I create sparse hip matrix:

  // A is of size n × m, with
  // A->n, A->m : the sizes
  // A->p : column pointers
  // A->i : row indices
  // A->x : data as doble*
  int nnz = A->p[A->n]; // The last element of A->p gives the number of non-zeros

  // Allocate memory on device
  hipMalloc(&(work->d_vals), nnz * sizeof(double));          // Matrix values (number of non-zeros)
  hipMalloc(&(work->d_row_ptrs), (A->n + 1) * sizeof(int)); // Column pointers (size n + 1)
  hipMalloc(&(work->d_col_inds), nnz * sizeof(int));        // Row indices (number of non-zeros)

  // Copy matrix to device
  // here we switch to CSR, only upper triangle is stored
  hipMemcpy(work->d_vals, A->x, nnz * sizeof(double), hipMemcpyHostToDevice);
  hipMemcpy(work->d_row_ptrs, A->p, (A->n + 1) * sizeof(int), hipMemcpyHostToDevice);
  hipMemcpy(work->d_col_inds, A->i, nnz * sizeof(int), hipMemcpyHostToDevice);

then I set matrix descriptor like this

  // Initialize matrix descriptor
  status = hipsparseCreateMatDescr(&(work->descr));
  if (status != HIPSPARSE_STATUS_SUCCESS)
  {
    printf("Error in init -- descriptor: %d.\n", (int)status);
  }
  hipsparseSetMatIndexBase(work->descr, HIPSPARSE_INDEX_BASE_ZERO); // Zero-based indexing
  if (status != HIPSPARSE_STATUS_SUCCESS)
  {
    printf("Error in init -- index 0: %d.\n", (int)status);
  }
  status = hipsparseSetMatType(work->descr, HIPSPARSE_MATRIX_TYPE_SYMMETRIC); // Symmetric matrix
  if (status != HIPSPARSE_STATUS_SUCCESS)
  {
    printf("Error in init -- type symmetric: %d.\n", (int)status);
  }
  status = hipsparseSetMatFillMode(work->descr, HIPSPARSE_FILL_MODE_UPPER); // stored in upper-diagonal part
  if (status != HIPSPARSE_STATUS_SUCCESS)
  {
    printf("Error in init -- fill upper: %d.\n", (int)status);
  }
  status = hipsparseSetMatDiagType(work->descr, HIPSPARSE_DIAG_TYPE_NON_UNIT); // with non-unit diagonal
  if (status != HIPSPARSE_STATUS_SUCCESS)
  {
    printf("Error in init -- diagonal non-unit: %d.\n", (int)status);
  }

and finally I want to start analyzing for csrsv2:

  status = hipsparseCreate(&(work->handle)); // HIPSPARSE_STATUS_SUCCESS
  // Create info object for LU decomposition
  status = hipsparseCreateCsrsv2Info(&(work->info_LU)); // HIPSPARSE_STATUS_SUCCESS
  // Analyze step to get buffer size
  status = hipsparseDcsrsv2_bufferSize(work->handle, HIPSPARSE_OPERATION_NON_TRANSPOSE,
                                       A->m, nnz, work->descr, work->d_vals, work->d_row_ptrs,
                                       work->d_col_inds, work->info_LU, &(work->bufferSize));

here I get HIPSPARSE_STATUS_NOT_SUPPORTED.

The only difference I see from the example here https://rocmdocs.amd.com/projects/hipSPARSE/en/latest/reference/level2.html#hipsparsexcsrsv2-solve is that I set my matrix to be symmetric;

btw. in the example the matrix is clearly non-symmetric, yet the FILL_MODE is set. do I misunderstand what does this parameter mean?

Operating System

ubuntu 24.04.01

CPU

AMD Ryzen 7 7840U

GPU

AMD Radeon VII

ROCm Version

ROCm 6.2.0

ROCm Component

hipSPARSE

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

$ rocminfo --support ROCk module version 6.8.5 is loaded =====================
HSA System Attributes
=====================
Runtime Version: 1.14 Runtime Ext Version: 1.6 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED DMAbuf Support: YES

==========
HSA Agents
==========


Agent 1


Name: AMD Ryzen 7 7840U w/ Radeon 780M Graphics Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 7840U w/ Radeon 780M Graphics Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5289
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 28504680(0x1b2f268) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 28504680(0x1b2f268) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 28504680(0x1b2f268) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1103
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 2048(0x800) KB
Chip ID: 5567(0x15bf)
ASIC Revision: 9(0x9)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2700
BDFID: 49920
Internal Node ID: 1
Compute Unit: 12
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties: APU Features: KERNEL_DISPATCH Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 39
SDMA engine uCode:: 18
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 14252340(0xd97934) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 14252340(0xd97934) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1103
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done

Additional Information

No response

kswirydo commented 1 day ago

Please add me to this issue.

jsandham commented 10 hours ago

@kalmarek Thanks for raising this issue. I believe this is caused because hipSPARSE calls rocSPARSE (on AMD systems) and in rocSPARSE we currently do not support symmetric matrices. In fact in rocSPARSE, we currently only support the general matrix type (no symmetric, triangular, hermitian). This restriction does not really make sense and is something we need to fix.

I am looking into extending the support in rocsparse to more matrix types.

jsandham commented 8 hours ago

I think I have a solution for your problem. Ill use a concrete example. If I understand what you are trying to do, you have a CSC lower triangular matrix, lets say:

// CSC // 2 0 0 0 // 0 1 0 0 // 3 0 3 0 // 4 1 0 2 hcsc_col_ptr = {0, 3, 5, 6, 7}; hcsc_row_ind = {0, 2, 3, 1, 3, 2, 3}; hcsc_val = {2.0, 3.0, 4.0, 1.0, 1.0, 3.0, 2.0};

You are then interpreting this as a CSR matrix by passing the column pointer array as if it was the row pointer array in CSR, passing the row indices array as if it was the column indices array in CSR, and finally passing the values array. This interpretation results in:

// CSR // 2 0 3 4 // 0 1 0 1 // 0 0 3 0 // 0 0 0 2 hcsr_row_ptr = {0, 3, 5, 6, 7}; // same as hcsc_col_ptr hcsr_col_ind = {0, 2, 3, 1, 3, 2, 3}; // same as hcsc_row_ind hcsr_val = {2.0, 3.0, 4.0, 1.0, 1.0, 3.0, 2.0f}; // same as hcsc_val

This is an upper triangular matrix in CSR format. Now if I understand things correctly, you want to solve the original lower triangular system. I think what you want to do is:

1) Interpret your CSC matrix as a CSR matrix like you are doing 2) As a CSR matrix, this is an upper triangular matrix, so you want to continue setting the fill mode as HIPSPARSE_FILL_MODE_UPPER. 3) Because the HIPSPARSE_MATRIX_TYPE_TRIANGULAR matrix type is not supported, you will need to use HIPSPARSE_MATRIX_TYPE_GENERAL. 4) Pass HIPSPARSE_OPERATION_TRANSPOSE instead of HIPSPARSE_OPERATION_NON_TRANSPOSE so that the system you are solving (stored as a CSR upper triangular matrix) is transposed and therefore results in your original lower triangular system.

```
rocsparse_set_mat_type(descr, rocsparse_matrix_type_general);
rocsparse_set_mat_fill_mode(descr, rocsparse_fill_mode_upper);
rocsparse_set_mat_diag_type(descr, rocsparse_diag_type_non_unit);

rocsparse_status status;

size_t buffer_size;
status = rocsparse_scsrsv_buffer_size(handle,
                            rocsparse_operation_transpose,
                            m,
                            nnz,
                            descr,
                            dcsr_val,
                            dcsr_row_ptr,
                            dcsr_col_ind,
                            info,
                            &buffer_size);

void* dbuffer = nullptr;
hipMalloc((void**)&dbuffer, buffer_size);

status = rocsparse_scsrsv_analysis(handle,
                        rocsparse_operation_transpose,
                        m,
                        nnz,
                        descr,
                        dcsr_val,
                        dcsr_row_ptr,
                        dcsr_col_ind,
                        info,
                        analysis,
                        solve,
                        dbuffer);
status = rocsparse_scsrsv_solve(handle,
                        rocsparse_operation_transpose,
                        m,
                        nnz,
                        dalpha,
                        descr,
                        dcsr_val,
                        dcsr_row_ptr,
                        dcsr_col_ind,
                        info,
                        dx,
                        dy,
                        solve,
                        dbuffer);
```

Please let me know if I am misunderstanding your issue.

kalmarek commented 7 hours ago

@jsandham thanks for replies! What I'm trying to do is to solve Ax=b with changing bs and constant sparsity pattern in A. So I want to analyze & factorize A first, and then

While re-solves happen very often (every iteration), the diagonal rescaling of A happens rather rarely. Still A can be rather large (in the range of 1e6-1e7).

A comes to me as sparse CSC matrix, symmetric, but only the lower triangle is stored.

Here's a mwe with example data A from the first non-trivial test of ours baked in. It corresponds to the following matrix:

10×10 SparseMatrixCSC{Float64, Int64} with 18 stored entries:
 1.0e-6      ⋅           ⋅     0.736161    1.18921     ⋅      ⋅      ⋅           ⋅      ⋅ 
  ⋅         1.0e-6       ⋅    -0.642353     ⋅          ⋅      ⋅    -0.875448     ⋅      ⋅ 
  ⋅          ⋅        -10.0     ⋅           ⋅          ⋅      ⋅      ⋅           ⋅      ⋅ 
 0.736161  -0.642353     ⋅   -10.0          ⋅          ⋅      ⋅      ⋅           ⋅      ⋅ 
 1.18921     ⋅           ⋅      ⋅        -10.0         ⋅      ⋅      ⋅           ⋅      ⋅ 
  ⋅          ⋅           ⋅      ⋅           ⋅       -10.0     ⋅      ⋅           ⋅      ⋅ 
  ⋅          ⋅           ⋅      ⋅           ⋅          ⋅   -10.0     ⋅           ⋅      ⋅ 
  ⋅        -0.875448     ⋅      ⋅           ⋅          ⋅      ⋅   -10.0          ⋅      ⋅ 
  ⋅          ⋅           ⋅      ⋅           ⋅          ⋅      ⋅      ⋅        -10.0     ⋅ 
  ⋅          ⋅           ⋅      ⋅           ⋅          ⋅      ⋅      ⋅           ⋅   -10.0

commented is also the expanded version (where both triangles are stored).

// hipcc -D__HIP_PLATFORM_AMD__ -I/opt/rocm/include -L/opt/rocm/lib -lhipsparse mwe.c -o mwe
#include <stdio.h>

#include <hip/hip_runtime.h>
#include <hipsparse/hipsparse.h>

typedef struct
{
    /** Matrix values, size: number of non-zeros. */
    double *x;
    /** Matrix row indices, size: number of non-zeros. */
    int *i;
    /** Matrix column pointers, size: `n+1`. */
    int *p;
    /** Number of rows. */
    int m;
    /** Number of columns. */
    int n;
} CSCMatrix;

CSCMatrix *init_data()
{
    CSCMatrix *A = calloc(1, sizeof(CSCMatrix));

    // symmetric expanded
    // int p[] = {0, 3, 6, 7, 10, 12, 13, 14, 16, 17, 18};
    // int i[] = {0, 3, 4, 1, 3, 7, 2, 0, 1, 3, 0, 4, 5, 6, 1, 7, 8, 9};
    // double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, 0.736161, -0.642353, -10.0, 1.189207, -10.0, -10.0, -10.0, -0.875448, -10.0, -10.0, -10.0};

    // lower triangle only

    int p[] = {0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    int i[] = {0, 3, 4, 1, 3, 7, 2, 3, 4, 5, 6, 7, 8, 9};
    double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0};

    A->m = A->n = 10;
    A->p = p;
    A->i = i;
    A->x = x;
    return A;
}

int main()
{
    hipsparseStatus_t status;
    hipsparseHandle_t handle;
    status = hipsparseCreate(&(handle));
    if (status != HIPSPARSE_STATUS_SUCCESS)
    {
        printf("Error in init -- handle: %d.\n", (int)status);
    }

    hipsparseMatDescr_t descr;
    status = hipsparseCreateMatDescr(&(descr));
    if (status != HIPSPARSE_STATUS_SUCCESS)
    {
        printf("Error in init -- descriptor: %d.\n", (int)status);
    }
    status = hipsparseSetMatIndexBase(descr, HIPSPARSE_INDEX_BASE_ZERO);   // Zero-based indexing
    status = hipsparseSetMatType(descr, HIPSPARSE_MATRIX_TYPE_SYMMETRIC);  // Symmetric matrix
    status = hipsparseSetMatDiagType(descr, HIPSPARSE_DIAG_TYPE_NON_UNIT); // with non-unit diagonal
    status = hipsparseSetMatFillMode(descr, HIPSPARSE_FILL_MODE_UPPER);    // data matrix is symmetric and expanded anyway

    // CSR matrix data on the device
    double *d_vals;  // Non-zero values of the matrix (on device)
    int *d_row_ptrs; // Row pointers (on device)
    int *d_col_inds; // Column indices (on device)

    CSCMatrix *A = init_data(); // A is symmetric in our example!!!
    int nnz = A->p[A->n];       // The last element of A->p gives the number of non-zeros

    hipMalloc((void **)&(d_vals), nnz * sizeof(double));         // Matrix values (number of non-zeros)
    hipMalloc((void **)&(d_row_ptrs), (A->n + 1) * sizeof(int)); // Column pointers (size n + 1)
    hipMalloc((void **)&(d_col_inds), nnz * sizeof(int));        // Row indices (number of non-zeros)

    hipMemcpy(d_vals, A->x, nnz * sizeof(double), hipMemcpyHostToDevice);
    hipMemcpy(d_row_ptrs, A->p, (A->n + 1) * sizeof(int), hipMemcpyHostToDevice);
    hipMemcpy(d_col_inds, A->i, nnz * sizeof(int), hipMemcpyHostToDevice);

    csrsv2Info_t info_LU;
    status = hipsparseCreateCsrsv2Info(&info_LU);
    if (status != HIPSPARSE_STATUS_SUCCESS)
    {
        printf("Error in init -- create info: %d.\n", (int)status);
    }

    void *buffer;
    int buffer_size;
    hipsparseOperation_t NOOP = HIPSPARSE_OPERATION_NON_TRANSPOSE;
    hipsparseSolvePolicy_t NOPOLICY = HIPSPARSE_SOLVE_POLICY_NO_LEVEL;
    status = hipsparseDcsrsv2_bufferSize(handle, NOOP, A->m, nnz, descr,
                                         d_vals, d_row_ptrs, d_col_inds,
                                         info_LU, &buffer_size);
    if (status != HIPSPARSE_STATUS_SUCCESS)
    {
        printf("Error in init -- bufferSize: %d.\n", (int)status);
    }
    printf("Allocating %d bytes for csrsv2.\n", buffer_size);
    hipMalloc((void **)&buffer, buffer_size);

    // Perform symbolic factorization for LU
    status = hipsparseDcsrsv2_analysis(handle, NOOP, A->m, nnz, descr,
                                       d_vals, d_row_ptrs, d_col_inds,
                                       info_LU, NOPOLICY, buffer);
    if (status != HIPSPARSE_STATUS_SUCCESS)
    {
        printf("Error in init -- analysis: %d.\n", (int)status);
    }
    // Perform numeric factorization (LU decomposition)
    double alpha = 1.0;
    status = hipsparseDcsrsv2_solve(handle, NOOP, A->m, nnz, &alpha, descr,
                                    d_vals, d_row_ptrs, d_col_inds,
                                    info_LU, NULL, NULL, NOPOLICY, buffer);
    if (status != HIPSPARSE_STATUS_SUCCESS)
    {
        printf("Error in init -- numeric fact: %d.\n", (int)status);
    }
    return status;
}
kalmarek commented 7 hours ago

@kalmarek Thanks for raising this issue. I believe this is caused because hipSPARSE calls rocSPARSE (on AMD systems) and in rocSPARSE we currently do not support symmetric matrices. In fact in rocSPARSE, we currently only support the general matrix type (no symmetric, triangular, hermitian). This restriction does not really make sense and is something we need to fix.

I am looking into extending the support in rocsparse to more matrix types.

@jsandham extending rocsparse to support symmetric matrices would be lovely! keep me posted!

kalmarek commented 7 hours ago

@jsandham just the last comment about mwe above:

even with A expanded symmetric and

    status = hipsparseSetMatIndexBase(descr, HIPSPARSE_INDEX_BASE_ZERO); // Zero-based indexing
    // status = hipsparseSetMatType(descr, HIPSPARSE_MATRIX_TYPE_SYMMETRIC);  // Symmetric matrix
    status = hipsparseSetMatDiagType(descr, HIPSPARSE_DIAG_TYPE_NON_UNIT); // with non-unit diagonal
    // status = hipsparseSetMatFillMode(descr, HIPSPARSE_FILL_MODE_UPPER);    // data matrix is symmetric and expanded anyway

I receive HIPSPARSE_STATUS_INVALID_VALUE and 0 bytes are allocated for the buffer.

jsandham commented 6 hours ago

I think your problem lies in how you are creating your CSCMatrix and setting the data:

CSCMatrix *init_data()
{
    CSCMatrix *A = calloc(1, sizeof(CSCMatrix));

    // symmetric expanded
    // int p[] = {0, 3, 6, 7, 10, 12, 13, 14, 16, 17, 18};
    // int i[] = {0, 3, 4, 1, 3, 7, 2, 0, 1, 3, 0, 4, 5, 6, 1, 7, 8, 9};
    // double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, 0.736161, -0.642353, -10.0, 1.189207, -10.0, -10.0, -10.0, -0.875448, -10.0, -10.0, -10.0};

    // lower triangle only

    int p[] = {0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14};
    int i[] = {0, 3, 4, 1, 3, 7, 2, 3, 4, 5, 6, 7, 8, 9};
    double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0};

    A->m = A->n = 10;
    A->p = p;
    A->i = i;
    A->x = x;
    return A;
}

Specifically I don't think you can set the arrays using local variables that go out of scope once init_data is finished. Hint: print the value of nnz. I would recommend the following:

// CSR matrix data on the device
double *d_vals;  // Non-zero values of the matrix (on device)
int *d_row_ptrs; // Row pointers (on device)
int *d_col_inds; // Column indices (on device)

int p[] = {0, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14};
int i[] = {0, 3, 4, 1, 3, 7, 2, 3, 4, 5, 6, 7, 8, 9};
double x[] = {1.0e-6, 0.736161, 1.189207, 1.0e-6, -0.642353, -0.875448, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0};

CSCMatrix* A = malloc(sizeof(CSCMatrix)); // I would not use malloc/calloc here at all...
A->m = A->n = 10;
int nnz = p[A->m];   // The last element of A->p gives the number of non-zeros
A->p = malloc(sizeof(int) * (A->m + 1));
A->i = malloc(sizeof(int) * nnz);
A->x = malloc(sizeof(double) * nnz);

memcpy(A->p, p, sizeof(int) * (A->m + 1));
memcpy(A->i, i, sizeof(int) * nnz);
memcpy(A->x, x, sizeof(double) * nnz);

hipMalloc((void **)&(d_vals), nnz * sizeof(double));         // Matrix values (number of non-zeros)
hipMalloc((void **)&(d_row_ptrs), (A->n + 1) * sizeof(int)); // Column pointers (size n + 1)
hipMalloc((void **)&(d_col_inds), nnz * sizeof(int));        // Row indices (number of non-zeros)

// At the end
free(A->p);
free(A->i);
free(A->x);
free(A);

Doing this allows the buffer size and analysis to pass.

The call to hipsparseDcsrsv2_solve will fail as no right hand size or solution vector has been provided.

jsandham commented 6 hours ago

Just another note. I am a little worried by:

// Perform numeric factorization (LU decomposition)
double alpha = 1.0;
status = hipsparseDcsrsv2_solve(handle, NOOP, A->m, nnz, &alpha, descr,
                                    d_vals, d_row_ptrs, d_col_inds,
                                    info_LU, NULL, NULL, NOPOLICY, buffer);

Specifically the code comment suggesting that this routine will perform an LU decomposition. Just to be clear, that is not what this routine does. It is the final step in solving the system Mx = b where M is either upper or lower triangular. It sounds like what you may be trying to achieve is to first compute the LU factorization, i.e. A=LU, and then using this factorization to solve:

Ax=b Compute A=LU resulting in system LUx=b Let y = Ux Solve Ly=b first Then solve Ux = y to get x.

If this is the case, you may want to look at https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/reference/precond.html#rocsparse-csrilu0 example. Note that this is an incomplete LU factorization with no fill-in so will not solve Ax=b exactly. Incomplete LU factorization is designed for use in preconditioners for iterative solvers.

As this example shows, you will need to make multiple calls to hipsparseDcsrsv2_solve but you will also need to first call rocsparse_dcsrilu0