Closed: kvaragan closed this issue 9 years ago.
I'm checking it right now.
I think the way to multiply two sparse matrices here is way too complicated. I understand that the issue is related to the concept that the user is responsible for buffer allocation. Here are my general comments:

- `rowPtrCtSizeInBytes` and `buffer_d_SizeInBytes` should be hidden inside `spgemmInfo`.
- Rename the `clsparseSpGEMM` type to `clsparseSpGEMMInfo` or `clsparseSpGEMMInfoHelper`, so it is not confused with the function name.

The general explicit call for SpGEMM should be done in at most two separate functions: the first calculates the necessary memory requirements for the result matrix, the second does the actual SpM-SpM.
```cpp
clsparseSpGEMM spgemmInfo;
clsparseStatus status = clsparse<S/D>SpmmAnalyse(&A, &B, &spgemmInfo, control);
if (status == clsparseSuccess)
{
    // check the additional information in spgemmInfo, i.e. whether there is
    // enough memory to complete the spmm operation;
    // allocate the C matrix according to the spgemmInfo content;
    // do the spmm
    status = clsparse<S/D>Spmm(&A, &B, &C, &spgemmInfo, control);
}
clsparseReleaseSpGEMMInfo(&spgemmInfo);
```
I see that it might be difficult at the moment, but I would like to see that every iteration of this algorithm brings us closer to this kind of concept.
Here are some more details from inside clsparse-spm-spm.cpp:

- Use `clsparse::vector` here and in any other place where you are allocating a `cl_mem` buffer instead. The same goes for the statistics function for various int types; `int nnzC = computeCsrNNZ(...)` would be more convenient.

I did not go through the internal spmm functions deeply for now, but I saw that they need to be templatized for various types, and the kernels should be fed with additional definitions (group size, tuple_queue, etc.). In the final optimisation stages, several kernels like compute_nnz_Ct look like an spmv traversal, so they might be optimised into vectorized versions.
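For readers unfamiliar with the `computeCsrNNZ(...)` helper mentioned above, the idea is the symbolic phase of CSR SpGEMM: count the nonzeros of C = A * B before allocating it. The sketch below is a hypothetical CPU reference (not clSPARSE code, and without OpenCL) that illustrates the counting pattern the GPU kernels parallelize.

```cpp
#include <vector>

// Hypothetical CPU reference for the symbolic phase of C = A * B in CSR form:
// count nnz of C without computing any values. Row i of C is the union of the
// column patterns of B's rows selected by the column indices in A's row i.
// m = rows of A (and C), n = columns of B (and C).
int computeCsrNNZ(int m, int n,
                  const std::vector<int>& rowPtrA, const std::vector<int>& colIndA,
                  const std::vector<int>& rowPtrB, const std::vector<int>& colIndB)
{
    std::vector<int> marker(n, -1); // last C-row that touched each column
    int nnzC = 0;
    for (int i = 0; i < m; ++i)
    {
        for (int jA = rowPtrA[i]; jA < rowPtrA[i + 1]; ++jA)
        {
            int k = colIndA[jA]; // column in A == row in B
            for (int jB = rowPtrB[k]; jB < rowPtrB[k + 1]; ++jB)
            {
                int col = colIndB[jB];
                if (marker[col] != i) { marker[col] = i; ++nnzC; } // count each column once per row
            }
        }
    }
    return nnzC;
}
```

With this number in hand, the caller can allocate `C.colIndices` and `C.values` exactly once, which is the separation of analysis from computation being discussed in this thread.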
Thanks for the feedback @jpola. I agree with all the comments, and your suggestions can be incorporated, but before I do that: should we go ahead with this design? The basic principle of this design is to separate memory allocations from computation, so that the user has control over buffer allocation.
@kvaragan I approve of your goal here, but I have to agree with Jakub that this interface is not finished yet. If I have to ask users to change their SpM-SpM interface once before v1.0, I would rather ask them to change from using this API:
```cpp
CLSPARSE_EXPORT clsparseStatus
clsparseScsrSpGemm( const clsparseCsrMatrix* sparseMatA,
                    const clsparseCsrMatrix* sparseMatB,
                    clsparseCsrMatrix* sparseMatC,
                    const clsparseControl control );
```
than to change from these APIs:
```cpp
clstatus = clsparseSpMSpM_CreateInit( &A, &B, control,
                                      &spgemmInfo,            // output
                                      &rowPtrCtSizeInBytes,   // output
                                      &buffer_d_SizeInBytes ); // output

clstatus = clsparseSpMSpM_ComputennzCtExt( csrRowPtrCt_d,
                                           rowPtrCtSizeInBytes,
                                           sizeof(cl_int),
                                           buffer_d,
                                           buffer_d_SizeInBytes,
                                           &nnzCt,
                                           spgemmInfo,
                                           control );

clstatus = clsparseSpMSpM_ScsrSpmSpmnnzC( C.rowOffsets,   // (m+1) * sizeof(int) - memory allocated
                                          csrRowPtrCt_d,  // (m+1) * sizeof(int) - memory allocated
                                          &csrColIndCt_d, // nnzCt * sizeof(int)
                                          &csrValCt_d,    // nnzCt * sizeof(float) - single precision
                                          &nnzC,          // nnz of output matrix
                                          spgemmInfo,
                                          control );

clstatus = clsparseSpMSpM_FillScsrOutput( C.rowOffsets,
                                          C.colIndices,
                                          C.values,
                                          csrRowPtrCt_d,
                                          csrColIndCt_d,
                                          csrValCt_d,
                                          nnzC,
                                          spgemmInfo,
                                          control );
```
I agree that currently the SpM-SpM is allocating a lot of internal memory, but the user experience of the original API is more in line with what we want to present to our users. I have not combed through the implementation files, but I trust that Jakub has given sufficient feedback for this iteration.
Here's your challenge: can you design a working implementation with an API like what Jakub mocked above? One helper, `clsparse<S/D>SpmmAnalyse()`, and one worker, `clsparse<S/D>Spmm()`.
@jpola Will it be possible to just typedef `clsparse::vector` to the boost.compute vector? Same with array?
I wish, but I think it would not be possible. I'll take a closer look at that tomorrow.
@jpola Using `clsparse::vector` is the same as using `clCreateBuffer()`, because internally that's what `clsparse::vector` does. If I use `clsparse::vector`, it will defeat the purpose of this design.
@kknox Because of the nature of the algorithm, two APIs are not enough. If we use `clsparse::vector`, maybe it can be done, but that means memory allocations inside the library, and that would be no better than our first implementation.
So can I conclude that we will be going with the 1-API design which is already present in the develop branch?
@kvaragan Of course `clsparse::vector` is built over the OpenCL buffer. It was just my suggestion that I would use it because it has more benefits; among the many, I think these are the most important:
@jpola If this is the case, then why don't we expose this header to the application, so that the application can start using `clsparse::vector`? I don't know whether this is currently possible or not. We would get the advantages you mentioned plus the advantage of the user having control over memory allocations.
If we can't expose this header, then we don't need this new design; we can stick to the old one and just replace the explicit device memory allocations with `clsparse::vector`.
Sorry Kiran, I might have misunderstood you. The concept I mentioned before is applicable only to the internal code of our library. My answer was referring to point 5 from my list of feedback: instead of `cl_mem`, I wish to use `clsparse::vector`.
When it comes to the interface functions in the clSPARSE.h header, we are passing `cl_mem` due to the fact that we want clSPARSE to have a C-style interface. The library can then be easily adopted by other programming languages like Python or Fortran.
However, when you are implementing the interface functions, you can temporarily put a `clsparse::vector` interface over the input and output parameters with this constructor:

```cpp
// create vector from a preinitialized buffer
vector( clsparseControl control, const cl::Buffer& buffer, size_t size )
    : BASE::_buff(buffer), BASE::_size(size), queue(control->queue)
{ }
```
You may look at https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/solvers/conjugate-gradients.hpp to check how it's done.
Concluding, I wanted to suggest that when you are in the definition of the interface functions, or in any other internal clSPARSE function, you use `clsparse::vector` or some more user-friendly datatype than `cl_mem`, or put the `clsparse::vector` interface on the incoming function parameters of type `cl_mem`.
Kuba.
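The non-owning wrapping idea described above can be mocked without OpenCL. The sketch below is purely illustrative (not clSPARSE code): the "handle" is a plain pointer instead of a `cl_mem`, but the point is the same, i.e. the library-internal code gets a convenient vector-like interface while the caller keeps full control over allocation and lifetime.

```cpp
#include <cstddef>
#include <numeric>

// Illustrative non-owning view over a caller-allocated buffer, mimicking how
// an internal vector type can be placed over an incoming handle without
// taking ownership or allocating anything itself.
template <typename T>
class buffer_view
{
    T*          data_;
    std::size_t size_;
public:
    buffer_view( T* preallocated, std::size_t n ) : data_(preallocated), size_(n) { }
    std::size_t size() const { return size_; }
    T&       operator[]( std::size_t i )       { return data_[i]; }
    const T& operator[]( std::size_t i ) const { return data_[i]; }
    T* begin() { return data_; }
    T* end()   { return data_ + size_; }
};

// Internal routines can now use the friendly interface; all reads and writes
// go straight to the caller's memory.
template <typename T>
T sum( buffer_view<T> v ) { return std::accumulate( v.begin(), v.end(), T{0} ); }
```

The same shape is what the `vector(control, buffer, size)` constructor quoted earlier achieves for a `cl::Buffer`: the view is temporary, and no `clCreateBuffer` happens inside the library.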
Understood, I missed the C-style interface point. So we will stick to the 1-API design plus your suggestions about memory allocations (`clsparse::vector`). Thanks for clarifying.
@kvaragan It looks like @jpola answered the majority of your questions, but to close this out, I think we need to ship Beta 2 with the 1-API version. It's not optimal, but it's easy to document and use. I think a good approach is to first simplify the guts of the SpM-SpM routine (starting with the recommendations above); once the current codepaths are simpler (possibly leveraging boost.compute), it may be easier to refactor the memory allocations in a cleaner way.
In the ideal world (I like Jakub's proposed names), the Analyse function would "simulate" what would happen end to end in the SpM-SpM call, and could record/store all required temp buffers needed by the algorithm. The user would then be free to allocate those buffers as they see fit and put the buffer handles into the spgemmInfo structure to be passed into the spmm compute routine.
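The "analyse records, user allocates, compute consumes" flow just described can be sketched in a few lines. Everything below is hypothetical (the names `spgemm_info`, `analyse`, and `compute` are illustrative, not the real clSPARSE API), and plain pointers stand in for `cl_mem` handles so the idea is self-contained.

```cpp
#include <cstddef>

// Hypothetical info structure: analyse fills in the required sizes; the user
// fills in the handles before calling compute.
struct spgemm_info
{
    std::size_t rowPtrCtBytes; // temp buffer sizes recorded by analyse
    std::size_t scratchBytes;
    void*       rowPtrCt;      // handles supplied by the *user* afterwards
    void*       scratch;
};

// Phase 1: simulate the algorithm far enough to know every temp allocation.
// (Here the sizes are simple formulas; a real analyse would walk the inputs.)
void analyse( int m, int nnzEstimate, spgemm_info* info )
{
    info->rowPtrCtBytes = (m + 1) * sizeof(int);
    info->scratchBytes  = nnzEstimate * sizeof(int);
    info->rowPtrCt = nullptr;
    info->scratch  = nullptr;
}

// Phase 2: refuses to run until the user has provided the recorded buffers.
bool compute( const spgemm_info* info )
{
    return info->rowPtrCt != nullptr && info->scratch != nullptr;
}
```

This keeps allocation entirely on the user's side while still exposing only two entry points, which is the shape Jakub's mock at the top of the thread asks for.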
Everything included in or included by clSPARSE.h should be `extern "C"`. We can use C++ inside the library, and the user can use C++ in the application. Users are able to use boost.compute on their own if they wish, but they need to pass handles into the library. As a future enhancement request, a C++ wrapper could be provided for the library to offer convenience objects and functions.
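The conventional way to satisfy that requirement is the `extern "C"` guard pattern: the header stays consumable by C (and via C FFI from Python, Fortran, etc.) while the implementation behind it is C++. The snippet below is a sketch of the convention, not the actual clSPARSE.h contents; `demo_entry` and `demoStatus` are made-up names.

```cpp
// The guard is a no-op when compiled as C, and suppresses C++ name mangling
// when compiled as C++, so the symbol is resolvable by its plain C name.
#ifdef __cplusplus
extern "C" {
#endif

typedef enum { demoSuccess = 0, demoInvalidValue = 1 } demoStatus;

// A C-linkage entry point: C callers, Fortran bindings, or Python's ctypes
// can all locate and call this symbol.
demoStatus demo_entry( int* out );

#ifdef __cplusplus
} // extern "C"
#endif

// The definition may freely use C++ behind the C facade; it inherits the
// C linkage from the declaration above.
demoStatus demo_entry( int* out )
{
    if (!out) return demoInvalidValue;
    *out = 42;
    return demoSuccess;
}
```

In practice the declaration lives in the public header and the definition in a .cpp file; they are combined here only to keep the sketch self-contained.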
@kknox I am closing this issue, with the conclusion that for the Beta 2 release we will go with the 1-API version. I will add the rest of the suggestions to future enhancements #152. Thank you for the feedback.
The single API is now split into multiple APIs in order to let the application allocate memory rather than the library. I have written sample code showing how to use this API. Is this the better way? See the code in the sample file: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/samples/sample-spmspm.cpp
Refactored code: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/library/blas3/clsparse-spm-spm.cpp
APIs: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/include/clSPARSE.h