clMathLibraries / clSPARSE

a software library containing Sparse functions written in OpenCL
Apache License 2.0
173 stars 61 forks source link

Feedback Needed - Refactored SpMxSpM #158

Closed kvaragan closed 9 years ago

kvaragan commented 9 years ago

Single API is now split into multiple API's in order to allow the application to allocate memory rather than the library. I have written the sample code on how to use this API. Is this the better-way ? See the code @ sample file : https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/samples/sample-spmspm.cpp

refactor code: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/library/blas3/clsparse-spm-spm.cpp

API's : https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/include/clSPARSE.h

jpola commented 9 years ago

I'm checking it right now.

jpola commented 9 years ago

I think the way to multiply two sparse matrices here is way too complicated. I understand that the issue is related to the concept that user is responsible for buffer allocation. Here are my general comments

  1. Users should not see the rowPtrCtSizeInBytes and buffer_d_SizeInBytes I think they should be hidden in spgemmInfo
  2. I would rename clsparseSpGEMM to clsparseSpGEMMInfo or clsparseSpGEMMInfoHelper not to confuse that with function name.
  3. I can't say that I like the names of the explicit functions like here

The general explicit call for spGemm should be done at most in two separated functions. The first one calculates the necessary memory requirements for result matrix, second is doing actual spmm.

clsparseSpGEMM spgemmInfo;

clsparseStatus status = clsparse<S/D>SpmmAnalyse(&A, &B, &spgemmInfo, control);

if (status == clsparseSuccess)
{
//check the spgemmInfo additional information, i.e. if there is enough memory to
//complete spmm operation.
//allocate the C matrix according to spgemminfo content;

//do the spmm
status = clsparse<S/D>Spmm(&A, &B, &C, &spgemmInfo, control);
}

clsparseReleaseSpGEMMInfo(&spgemmInfo);

I see that it might be difficult at the moment, but I would like to see that every iteration of this algorithm bring us to this kind of concept.

Here some more details inside of clsparse-spm-spm.cpp:

  1. This defines as previously should be defined in the places where they are used.
  2. I would rename those and those variables to Anum_rows, Anum_cols, etc. for the sake of clarity, it will be easier to maintain the code later.
  3. This kind of interface is unacceptable in my opinion. We will be flooded by issues from the users in cases when they will put something wrong there.
  4. In several places like here you were using explicit scan. I would create an internal function for that.
  5. I would use `clsparse::vector here and in any other places where you allocating cl_mem buffer instead. Here as well
  6. Why dont pass ref to std::vector instead of raw pointers here. In addition templetize the statistics funcion for various int types.
  7. Another scan. Maybe int nnzC = computeCsrNNZ(...) whould be more convinient

I did not go trough the internal spmm functions deeply for now, but I saw that they need to be templetized for various types, the kernels should be feeded with additional definitions group size, tuple_queue etc. In the final optimisation stages several kernels like compute_nnz_Ct looks like spmv traversal so they might be optimised for vectorized versions.

kvaragan commented 9 years ago

Thanks for the feedback @jpola . I agree with all comments and your suggestions can be incorporated, but before I do that should we go ahead with this design. Basic principle of this design is, to separate memory allocations from computation and user should have control for buffer allocation.

kknox commented 9 years ago

@kvaragan I approve of your goal here, but I have to agree with Jakub that this interface is not finished yet. If I have to ask users to change their SpM-SpM interface 1 time before v1.0, I would rather ask them to change from using this API:

 CLSPARSE_EXPORT clsparseStatus
        clsparseScsrSpGemm(
        const clsparseCsrMatrix* sparseMatA,
        const clsparseCsrMatrix* sparseMatB,
              clsparseCsrMatrix* sparseMatC,
        const clsparseControl control );

than to change from these API's

 clstatus = clsparseSpMSpM_CreateInit(&A, &B, control, 
                                         &spgemmInfo,            // output 
                                         &rowPtrCtSizeInBytes,   // output 
                                         &buffer_d_SizeInBytes); // output 

clstatus = clsparseSpMSpM_ComputennzCtExt(csrRowPtrCt_d, 
                                               rowPtrCtSizeInBytes, 
                                               sizeof(cl_int), 
                                               buffer_d, 
                                               buffer_d_SizeInBytes, 
                                               &nnzCt, 
                                               spgemmInfo, 
                                               control); 

clstatus = clsparseSpMSpM_ScsrSpmSpmnnzC(C.rowOffsets,          // ( m+1) * sizeof(int) - memory allocated 
                                              csrRowPtrCt_d,         // ( m+1) * sizeof(int) - memory allocated 
                                              &csrColIndCt_d,        // nnzCt * sizeof(int) 
                                              &csrValCt_d,           // nnzCt * sizeof(float) - Single Precision 
                                              &nnzC,                    // nnz of Output matrix 
                                              spgemmInfo, 
                                              control); 

clstatus = clsparseSpMSpM_FillScsrOutput(C.rowOffsets, 
                                              C.colIndices, 
                                              C.values, 
                                              csrRowPtrCt_d, 
                                              csrColIndCt_d, 
                                              csrValCt_d, 
                                              nnzC,  
                                              spgemmInfo, 
                                              control); 

I agree that currently the SpM-SpM is allocating a lot of internal memory, but the user experience of the original API is more in line with what we want present to our users. I have not combed through the implementation files, but I trust that Jakub has given sufficient feedback for this iteration.

Here's your challenge: can you design a working implementation with an API like what Jakub mocked above? 1 helper clsparse<S/D>SpmmAnalyse(), 1 worker clsparse<S/D>Spmm()

kknox commented 9 years ago

@jpola Will it be possible to just typedef clsparse::vector to the boost.compute vector? Same with array?

jpola commented 9 years ago

I wish but I think it would not be possible. I'll take a closer look on that tomorrow.

2015-09-29 23:37 GMT+02:00 Kent Knox notifications@github.com:

@jpola https://github.com/jpola Will it be possible to just typedef clsparse::vector to the boost.compute vector? Same with array?

— Reply to this email directly or view it on GitHub https://github.com/clMathLibraries/clSPARSE/issues/158#issuecomment-144198544 .

kvaragan commented 9 years ago

@jpola using clsparse::vector is same as using clCreateBuffer(),because internally that's what clsparse::vector is doing. If I use clsparse::vector then it will defeat the purpose of this design.

kvaragan commented 9 years ago

@kknox Because of the nature of the algorithm, 2 APIs are not enough but if we use clsparse::vector may be it can be done, but that means memory allocations inside the library and that will be as good as our first implementation.

So can I conclude we that will be going with the 1 API design which is already present in the develop branch?

jpola commented 9 years ago

@kvaragan of course the clsparse::vector is built over the opencl buffer. It was just my suggestion that I would use it because it have more benefits among many I think those are the most important:

  1. Code is more readible which means that it is easier to understand and maintain.
  2. clsparse::vector manages the memory for you. You can avoid potential memory leaks. Even if i saw that the code seems not to have any we can't be sure that in the future we do not make any mistake. The use of clsparse::vector or array will minimise the issues related to the code by enforcing future developers to adopt to the present style.
  3. In the near future it seems that we are moving to the boost::compute. It will indicate where the old clsparse::vector might be substituted for more advanced / user friendly data structure.
  4. vector know its size, We will be able to simplify the code greatly by using proper constructor of std::vector, compute::vector and compute::copy functions to eliminate clEnqueueWrite/Read/Copy functions.
kvaragan commented 9 years ago

@jpola If this is the case, then why don't we expose this header to application, so that application can start using clsparse::vector. I don't know whether this is currently possible or not. We will have advantages of what you mentioned and advantages of user having control over memory allocations.

If we can't expose this header, then we don't need this new design we can stick to the old one and just replace explicit device memory allocations with clsparse::vector.

jpola commented 9 years ago

Sorry Kiran, I might misunderstood you. The concept I mentioned before is applicable only for internal code of our library. My answer was referring to the point 5. from my list of feedbacks. Instead of cl_mem I wish to use clsparse::vector.

When it comes to interface functions from clSPARSE.h header we are passing cl_mem due to fact that we want that clSPARSE is having C style interface. Library then can be easily adopted to other programming languages like python or fortran.

However when you are implementing interface functions you can put a clsparse::vector interface temporary for input and output parameters with this constructor:

//create vector from preinitialized buffer. vector (clsparseControl control, const cl::Buffer& buffer, size_t size) : BASE::_buff(buffer), BASE::_size(size), queue(control->queue)

You may see the https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/solvers/conjugate-gradients.hpp to check how it's done.

Concluding, I wanted to suggest that when you are in the definition of the interface functions or in any other internal clSPARSE function use clsparse::vector or some more use-friendly datatype than cl_mem or put the clsparse::vector interface on incomming function parameters of type cl_mem.

Kuba.

2015-09-30 10:42 GMT+02:00 Kiran notifications@github.com:

@jpola https://github.com/jpola If this is the case, then why don't we expose this header to application, so that application can start using clsparse::vector. I don't know whether this is currently possible or not. We will have advantages of what you mentioned and advantages of user having control over memory allocations.

If we can't expose this header, then we don't need this new design we can stick to the old one and just replace explicit device memory allocations with clsparse::vector.

— Reply to this email directly or view it on GitHub https://github.com/clMathLibraries/clSPARSE/issues/158#issuecomment-144329072 .

kvaragan commented 9 years ago

Understood, I missed the point C-style interface. So we will stick to the 1 API design + your suggestions about memory allocations (clsparse::vector). Thanks for clarifying.

kknox commented 9 years ago

@kvaragan I looks like @jpola answered the majority of your questions, but to close this out I think we need to ship Beta2 with the 1 API version. It's not optimal, but its easy to document and use. I think a good approach to use is to try to simplify the guts of the SpM-SpM routine first (starting with the recommendations above), and once the current codepaths are simpler (possibly leveraging boost.compute) it may be easier to refactor the memory allocations in a cleaner way.

In the ideal world (I like Jakub's proposed names), the Analyze function would 'simulate' what would happen end to end in the SpM-SpM call, and can record/store all required temp buffers needed for the algorithm. The user would then be free to allocate those buffers as they see fit and put the buffer handles into the spmminfo structure to be passed into the spmm compute routine.

Everything included in or included by clsparse.h should be extern C. We can use C++ inside the library and user can use c++ in the application. Users are able to use boost.compute on their own if they wish, but they need to pass handles into the library. As a future enhancement request, a c++ wrapper could be provided for the library to provide convenience objects and functions.

kvaragan commented 9 years ago

@kknox I am closing this issue, with conclusion for Beta 2 release we will go with 1 API version. I will add rest of the suggestions to future enhancements #152. Thank You for the feedback