jpola commented 9 years ago

There is a problem with reading symmetric matrices with clsparseCsrMatrixformFile function. There are two distinct places where issue appears.

Calculating rowBlocks for adaptive spmv
running adaptive spmv itself.

Group of matrices revealing this issue:

Reuters911: rowBlocks calculations issue
hydri1c: pattern symmetrix matrix hydr1c_Zeros_73.mtx adaptive spmv hangs Linux system on Windows giving wrong results.

jpola commented 9 years ago

hydr1c_Zeros_73.mtx is actually coordinate pattern matrix. Problem might be ralated to calculating rowBlocks. If the system survive the call it is showing following error:

Matrix: ... Small/hydr1c/hydr1c_Zeros_73.mtx [nRow: 5308] [nCol: 5308] [nNZ: 382]
... /clSPARSE/src/library/internal/computeRowBlocks.hpp:98: 
void ComputeRowBlocks(rowBlockType*, size_t&, const int*, int, int) [with rowBlockType = long unsigned int; size_t = long unsigned int]: 
Assertion `dist <= rowBlockSize' failed.
Aborted

jlgreathouse commented 9 years ago

The issue here is a series of problems in reading matrices with many empty rows inside and matrices with empty rows at the end. A secondary problem in rowBlock generation comes up after fixing this -- it failed for tiny matrices that fit in a single workgroup.

The first problem is that the matrix market reader fails when it runs across multiple consecutive rows with no entries. See lines 587 and 655 in mm-reader.cpp. The if statement should be changed to a while statement so that we successfully skip past all empty rows.

In addition, if there are many empty rows at the end of the matrix, we need to make sure to fill in the last iCsrRowOffsets values to the same NNZ value in order to make sure we don't think there are billions of non-zero values in the last rows.

Patch for this problem:

diff --git a/src/library/io/mm-reader.cpp b/src/library/io/mm-reader.cpp
index 14750e0..c6ddeff 100644
--- a/src/library/io/mm-reader.cpp
+++ b/src/library/io/mm-reader.cpp
@@ -584,10 +584,15 @@ clsparseSCsrMatrixfromFile(clsparseCsrMatrix* csrMatx, const char* filePath, cls
         iCsrColIndices[ i ] = coords[ i ].y;
         fCsrValues[ i ] = coords[ i ].val;

-        if( coords[ i ].x >= current_row )
+        while( coords[ i ].x >= current_row )
             iCsrRowOffsets[ current_row++ ] = i;
     }
     iCsrRowOffsets[ current_row ] = pCsrMatx->num_nonzeros;
+    while(current_row <= pCsrMatx->num_rows)
+    {
+        iCsrRowOffsets[ current_row++ ] = pCsrMatx->num_nonzeros;
+    }
+

     // Compute the csr matrix meta data and fill in buffers
     if( pCsrMatx->rowBlockSize )
@@ -652,10 +657,14 @@ clsparseDCsrMatrixfromFile(clsparseCsrMatrix* csrMatx, const char* filePath, cls
         iCsrColIndices[ i ] = coords[ i ].y;
         fCsrValues[ i ] = coords[ i ].val;

-        if( coords[ i ].x >= current_row )
+        while( coords[ i ].x >= current_row )
             iCsrRowOffsets[ current_row++ ] = i;
     }
     iCsrRowOffsets[ current_row ] = pCsrMatx->num_nonzeros;
+    while(current_row <= pCsrMatx->num_rows)
+    {
+        iCsrRowOffsets[ current_row++ ] = pCsrMatx->num_nonzeros;
+    }

     // Compute the csr matrix meta data and fill in buffers
     if( pCsrMatx->rowBlockSize )

After that, the hydr1c matrix runs into an issue that it has very few NNZ. So few, in fact, that we only need one workgroup to compute the result. The rowBlockSize was therefore inaccurately calculated to be 1, when in must be 2 (because we need a start_rowBlock and end_rowBlock entry).

Patch for that issue:

diff --git a/src/library/transform/clsparse-coo2csr.cpp b/src/library/transform/clsparse-coo2csr.cpp
index f6d2613..48f2d5b 100644
--- a/src/library/transform/clsparse-coo2csr.cpp
+++ b/src/library/transform/clsparse-coo2csr.cpp
@@ -61,7 +61,7 @@ clsparseCsrMetaSize( clsparseCsrMatrix* csrMatx, clsparseControl control )

     // This allocates up front the maximum size of rowBlocks at start; likely not all the memory is used but
     // this is the fastest
-    pCsrMatx->rowBlockSize = 3 * ( pCsrMatx->num_nonzeros / BLKSIZE ) + 1;
+    pCsrMatx->rowBlockSize = 3 * ( pCsrMatx->num_nonzeros / BLKSIZE ) + 2;

     return clsparseSuccess;
 }

jlgreathouse commented 9 years ago

Note that I haven't committed any patches, just posted them here. After fixing this issue, the GPU no longer locks up for me, but the results won't validate using the test-blas2 program. This appears to be because of the way that uBLAS is being used in csr_matrix_environment.h. uBLAS complains on an internal error for hydr1c_Zeros_73.mtx:

Check failed in file /home/jgreatho/clsparse/bin/clSPARSE/makeDebug/Externals/Boost/package/include/boost/numeric/ublas/matrix_sparse.hpp at line 3942:
layout_type::index_m (itv_ - (*this) ().index1_data_.begin (), (*this) ().zero_based (*it_)) < (*this) ().size2 ()
unknown file: Failure
C++ exception with description "bad index" thrown in the test body.

Half an hour to find and fix the GPU problem, 6 hours of beating my head against uBLAS. I gave up. :)

I'll also note that the uBLAS::blas_2::gmv() call in the test-blas2 application is about 10,000x slower than running on the GPU (so a few thousand times slower than CPU-based CSR SpMV written by hand). I don't know what's going wrong with it on my system, but it's unbearably slow to test decent-sized matrices.

Thoughts on just hauling taking the raw pointers from the compressed_matrix and doing the SpMV by hand?

jpola commented 9 years ago

Hi, unfortunately according to my tests your patches does not fix the issue. I took the hydr1c_Zeros_73.mtx matrix and executed the blas2 test. On Linux platform execution of spmv test is still crushing my system and I have to reset the PC by pressing reset button. The last function I'm able to reach is clsparseSCsrMatrixfromFile. Once every ten tries system does not hang but then the assertion is thrown from ComputeRowBlocks function

 size_t dist = std::distance( rowBlocksBase, rowBlocks );
 assert( dist <= rowBlockSize );

On Windows platform when running test-blas2 on this matrix I'm obtaining error (-30 INVALID_VALUE) at this point

// copy the values in double precision to double precision matrix container
        copy_status = clEnqueueWriteBuffer( queue, csrDMatrix.values, CL_TRUE, 0,
                                            ublasDCsr.value_data().size( ) * sizeof( cl_double ),
                                            ublasDCsr.value_data().begin( ),
                                            0, NULL, NULL);

        if( copy_status )
        {
            TearDown( );
            exit( -5 );
        }

jlgreathouse commented 9 years ago

I believe the first problem (dist <= rowBlockSize) is solved by the second patch I posted. That assert is hit because rowBlockSize is incorrectly set to 1, while you actually need two row blocks.

The second problem is why I didn't commit a patch. I strongly believe this problem is in csr_matrix_environment.h:CSREnvironment(). Reading directly into the GPU's memory, copying that back out into a uBLAS container, then copying that into a double precision uBLAS container, then copying that into another GPU buffer -- I think there are problems in this flow.

I've tested the MM reading function and SpMV kernel on a separately built user-side application (the one we use in Research), and I see no lockups. I'll try making a more straightforward test app as a sanity check.

jpola commented 9 years ago

clMathLibraries / clSPARSE

clsparseCsrMatrixfromFile - symmetric matrix #84

105 fixes the issue