aboria / Aboria

Enables computations over a set of particles in N-dimensional space
https://aboria.github.io/Aboria

CRS Arrays of Sparse Matrices #44

Open JabirSS opened 5 years ago

JabirSS commented 5 years ago

Dear @martinjrobins, Thank you for your continuous development and support of Aboria.

I have been solving large systems of equations with Aboria, and I recently found myself needing a linear solver other than Eigen, since Eigen lacks the algebraic multigrid method, which is much faster for my problem. To do this, I created the sparse operator, used Eigen to assemble the matrix, and obtained the compressed sparse row (CRS) arrays I needed to pass to the external linear solver library (amgcl). However, I noticed that the assemble command is quite slow (2x slower than the solution of the linear system itself) and runs in serial.

I would really appreciate any suggestions you may have for improving this. Is there currently any other way to get the CRS arrays directly from Aboria without going through Eigen?

martinjrobins commented 5 years ago

which assemble function are you using? There are two:

 template <typename MatrixType> void assemble(const MatrixType &matrix) const {

and

 template <typename Triplet>
  void assemble(std::vector<Triplet> &triplets, const size_t startI = 0,
                const size_t startJ = 0) const {

The 2nd fills a triplet vector, as needed to construct a sparse matrix in Eigen. The 1st assumes you are passing in a dense Eigen matrix. You could probably adapt the 2nd to instead output the CRS format required by amgcl?

JabirSS commented 5 years ago

I am using the first assemble function. Eigen actually does a pretty good job of generating the CRS arrays, so I can live with that, but the assemble function runs in serial and is currently the slowest part of the entire routine. Adding #pragma omp parallel for didn't seem to help with either the 1st or 2nd function. Is it possible to get them to work in parallel?

martinjrobins commented 5 years ago

The 1st function is only for dense matrices, so I'm not surprised it is slow for sparse matrices. Use the 2nd assemble function, then Eigen's setFromTriplets function (see https://eigen.tuxfamily.org/dox/group__TutorialSparse.html) to create an Eigen sparse matrix, and get the CRS arrays from that.

JabirSS commented 5 years ago

Sorry but I am unable to get the second function to compile. Here's what I did:

auto myOperator = create_sparse_operator(nodes, nodes, radius, function);
Eigen::SparseMatrix<double> Matrix(N, N);

typedef Eigen::Triplet<double> T;
std::vector<T> tripletList;
tripletList.reserve(estimate);

myOperator.assemble(tripletList);
Matrix.setFromTriplets(tripletList.begin(), tripletList.end());

The compiler throws the following error for the assemble line:

template argument deduction/substitution failed:
‘std::vector<Eigen::Triplet<double> >’ is not derived from ‘Eigen::SparseMatrix<double, _Options, _StorageIndex>’

Seems like it's still calling the wrong function?

ghost commented 5 years ago

Sorry, I was looking at the code for the Kernel class rather than the MatrixReplacement class. You might actually have been using setFromTriplets previously, since MatrixReplacement uses setFromTriplets when given a SparseMatrix, but this is still worth trying to see where the time cost comes from.

try this:

myOperator.get_first_kernel().assemble(tripletList);

martinjrobins commented 5 years ago

^^^^^^^^^^^ that was me (not @pints-bot), wrong log-in!

JabirSS commented 5 years ago

Unfortunately myOperator.get_first_kernel().assemble(tripletList); gives the same result in terms of performance as it also seems to run in serial.

martinjrobins commented 5 years ago

Yes, this runs in serial. It's tricky: unless you know how many non-zero values there will be in each row of the sparse matrix, it is difficult to construct it in parallel. Do you absolutely need to construct the matrix? Is there any way you can use your multigrid solver in a matrix-free fashion?

martinjrobins commented 5 years ago

What you could do is construct each row of the sparse matrix (i.e. a sequence of triplets) in parallel, and then have a serial portion at the end that concatenates them. If you could rewrite the 2nd function above to do this, I'd (very) happily merge it into Aboria!

JabirSS commented 5 years ago

Alright, I'll give it a shot and let you know!

JabirSS commented 5 years ago

Unfortunately the library I'm using doesn't support matrix-free operations yet. Honestly, I couldn't figure out exactly how the function in question works, so I wrote one specific to my needs. It takes a reference to the vector of triplets and fills it in as follows:


void createMatrix(std::vector<Eigen::Triplet<double>> &tripletList)
{
    typedef Eigen::Triplet<double> T;
    std::vector<std::vector<T>> rowVals(nodes.size());
    int nnz = 0; // number of non-zero elements
#pragma omp parallel for reduction(+ : nnz)
    for (int element_i = 0; element_i < nodes.size(); ++element_i) {
        auto i = container[element_i];
        for (auto j = euclidean_search(container.get_query(), get<position>(i), radius);
             j != false; ++j) {
            int element_j = &get<position>(*j) - get<position>(nodes).data();
            double result = someFunction(i, j);
            rowVals[element_i].push_back(T(element_i, element_j, result));
            ++nnz;
        }
    }
    tripletList.reserve(nnz);
    for (int row = 0; row < nodes.size(); ++row)
        for (int col = 0; col < rowVals[row].size(); ++col)
            tripletList.push_back(rowVals[row][col]);
}

I'm not sure whether this is useful to you, since it is far more specific than the function in Kernels.h, but I thought you might be interested. This function runs about 3x faster than the original. I'm sure better performance could be achieved if the final part, where the triplets are concatenated, were also parallelized, but I found its cost negligible compared to the rest of the routine.

martinjrobins commented 5 years ago

Great that you were able to speed it up. This is useful, and along the lines of what I was thinking. Before it goes into Aboria, it would have to handle matrix-valued kernels (i.e. a kernel function returning a matrix rather than a scalar). If you wish to do this you are more than welcome, of course (I can explain more about how this works), but I'd understand if you'd rather move on with your own simulation!

JabirSS commented 5 years ago

Thank you, it would be great if you could explain the matrix-valued kernels or point me to some reading material, and I'll gladly write it up for addition to Aboria. Also, how should one give credit to Aboria in a publication? Would citing the SoftwareX paper do?

martinjrobins commented 5 years ago

Unfortunately this isn't really documented anywhere. I was originally using the matrix-valued kernels for boundary element simulations of Stokes flow. This is a kernel-based method that uses Green's functions for the kernel, in this case the Green's function for Stokes flow, which returns a matrix of size 2x2, where 2 is the number of spatial dimensions.
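For reference (this formula is not quoted from the thread, but is the standard form), the 2D Green's function for Stokes flow mentioned above is the Stokeslet, or Oseen tensor:

```latex
G_{ij}(\mathbf{r}) \;=\; \frac{1}{4\pi\mu}\left(-\,\delta_{ij}\ln\lvert\mathbf{r}\rvert \;+\; \frac{r_i\, r_j}{\lvert\mathbf{r}\rvert^{2}}\right),
\qquad i,j \in \{1,2\},
```

where $\mu$ is the viscosity and $\mathbf{r}$ the separation between source and evaluation points. Since $i$ and $j$ each range over the two spatial dimensions, evaluating the kernel at one particle pair yields a 2x2 matrix, which is why the Block type below is matrix-valued rather than a scalar.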

Below is the current code for the assemble function that returns a vector of Triplets. It also takes two arguments, startI and startJ, which are the starting indices of the matrix block that this kernel refers to (see https://martinjrobins.github.io/Aboria/aboria/evaluating_and_solving_kernel_op.html#aboria.evaluating_and_solving_kernel_op.block_operators)

The KernelSparse class defines a type Block (unrelated to the block operators mentioned in the previous paragraph), which is the type returned by the kernel function (i.e. the type returned by m_dx_function(dx, ai, bj)). For my Stokes simulations Block is set to Eigen::Matrix<double, 2, 2>; for your simulation Block would simply be double. The class also has two constants, BlockRows and BlockCols, which are 2 and 2 for my Stokes simulation, and 1 and 1 for yours.

To assemble the matrix, the code does an outer loop down the rows:

for (size_t i = 0; i < na; ++i) {

an inner loop down each non-zero block in that row

 for (auto pairj =
               euclidean_search(b.get_query(), get<position>(ai), radius);
           pairj != false; ++pairj) {

and then an inner-inner double loop for each element of that block

 for (size_t ii = 0; ii < BlockRows; ++ii) {
          for (size_t jj = 0; jj < BlockCols; ++jj) {

Does this make sense? You just have to use i, j, ii, jj, startI, and startJ to calculate the particular (i, j) index of each triplet added, which is (i * BlockRows + ii + startI, j * BlockCols + jj + startJ)

template <typename Triplet>
void assemble(std::vector<Triplet> &triplets, const size_t startI = 0, const size_t startJ = 0) const {

    const RowElements &a = this->m_row_elements;
    const ColElements &b = this->m_col_elements;

    const size_t na = a.size();

    // sparse a x b block
    // std::cout << "sparse a x b block" << std::endl;
    for (size_t i = 0; i < na; ++i) {
      const_row_reference ai = a[i];
      const double radius = m_radius_function(ai);
      for (auto pairj =
               euclidean_search(b.get_query(), get<position>(ai), radius);
           pairj != false; ++pairj) {
        const_col_reference bj = *pairj;
        const_position_reference dx = pairj.dx();
        const size_t j = &get<position>(bj) - get<position>(b).data();
        const Block element = static_cast<Block>(m_dx_function(dx, ai, bj));
        for (size_t ii = 0; ii < BlockRows; ++ii) {
          for (size_t jj = 0; jj < BlockCols; ++jj) {
            triplets.push_back(Triplet(i * BlockRows + ii + startI,
                                       j * BlockCols + jj + startJ,
                                       element(ii, jj)));
          }
        }
      }
    }
  }

martinjrobins commented 5 years ago

Also, how should one give credit to Aboria in a publication? Would citing the SoftwareX paper do?

Yup, citing the SoftwareX paper would do it, thanks! I should put this in a CITATION.md file or something... If you want to send me the details of the publication (when it's published), I could start a list of publications using Aboria as well.

JabirSS commented 5 years ago

Thank you for the explanation. Yes, that makes perfect sense! I'll let you know how it goes. The papers are either still under review or being written at the moment, but I will forward the details to you once they're approved and available online.