Closed GoogleCodeExporter closed 8 years ago
Thanks for the report.
Yes, this is a known problem with the current implementation and we intend to
fix it for Cusp v0.2.
In the context of smoothed aggregation it is necessary to compute the sparse
matrix-matrix products explicitly since we must have a sparse matrix
representation of the coarse level operator A_coarse = R * A * P. It's
possible that some reformulation of the operation is cheaper though.
Original comment by wnbell
on 11 Oct 2010 at 10:16
Thanks for the reply!
Right, one cannot avoid matrix-matrix multiplication in a smoothed aggregation
algorithm. Though, for its non-smoothed counterpart, there could be a way to
compute P^T * A * P in a more efficient manner than a general matrix-matrix
product. I mean, one can take advantage of a specific sparsity pattern of P.
When P is a tentative prolongator, P^T * A * P can be easily computed through
aggregation of corresponding entries of the matrix A.
But, I know, a non-smoothed approach may degrade the convergence rate of MG.
Anyway, looking forward to v0.2! And, of course, a multi-GPU version could solve
the memory issue, but that is a very long way off, I think.
Original comment by agnonc...@gmail.com
on 13 Oct 2010 at 2:02
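The aggregation shortcut described in the comment above can be sketched in a few lines of plain C++. This is only an illustration, not Cusp code: the `Entry` struct and the `galerkin_unsmoothed` function are hypothetical names, the matrix is held as a host-side list of COO triplets, and the tentative prolongator is assumed to be unscaled and piecewise constant (P(i, agg[i]) = 1, all other entries zero). Under those assumptions, entry (I, J) of P^T * A * P is just the sum of A(i, j) over all i in aggregate I and j in aggregate J, so no general matrix-matrix product is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One nonzero of a sparse matrix in coordinate (COO) form.
// Hypothetical helper type, not part of Cusp.
struct Entry {
    int row;
    int col;
    double val;
};

// Coarse operator P^T * A * P for an unsmoothed tentative prolongator,
// assuming P(i, agg[i]) = 1 and zero elsewhere (no column scaling).
// agg[i] is the index of the aggregate that fine node i belongs to.
std::vector<Entry> galerkin_unsmoothed(const std::vector<Entry>& A,
                                       const std::vector<int>& agg)
{
    // Accumulate every fine entry A(i, j) into coarse entry (agg[i], agg[j]).
    std::map<std::pair<int, int>, double> coarse;
    for (const Entry& e : A) {
        coarse[{agg[e.row], agg[e.col]}] += e.val;
    }

    // Emit the coarse matrix as COO triplets, sorted by (row, col).
    std::vector<Entry> result;
    result.reserve(coarse.size());
    for (const auto& kv : coarse) {
        result.push_back({kv.first.first, kv.first.second, kv.second});
    }
    return result;
}
```

For a 4x4 tridiagonal matrix with stencil (-1, 2, -1) and aggregates {0, 0, 1, 1}, this yields a 2x2 coarse matrix with the same stencil. A GPU version would presumably replace the `std::map` with a sort-and-reduce over the relabelled indices, but that is beyond this sketch.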
Revision 7244fa8d6c mitigates this problem. If you have a GPU with less than 1
GB of memory you can reduce this constant [1] from 16M to something like 2M or
4M to conserve memory.
As you say, you can cheat a little when P is unsmoothed. This routine uses a
more efficient matrix-matrix multiplication scheme for the (unsmoothed)
tentative prolongator [2].
[1] http://code.google.com/p/cusp-library/source/browse/cusp/detail/device/spmm/coo.h#208
[2] http://code.google.com/p/cusp-library/source/browse/cusp/precond/detail/smoothed_aggregation.inl#133
Original comment by wnbell
on 9 Nov 2010 at 6:14
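To illustrate why a smaller workspace constant saves memory at the cost of some speed, here is a rough host-side sketch of how a sparse product A * B could be split into row slabs whose intermediate products fit in a fixed workspace. This is an assumption about the general technique, not a copy of the routine linked in [1]: the function name `plan_slabs` and the CSR inputs are hypothetical. Each nonzero A(i, j) generates one intermediate product per nonzero in row j of B, so the cost of row i is the sum of those row lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits the rows of A into consecutive slabs so that the number of
// intermediate products of A * B generated by each slab stays within
// `workspace`. A slab holding a single oversized row may still exceed it.
// a_row_ptr / a_col: CSR structure of A; b_row_ptr: CSR row pointer of B.
// Returns slab boundaries: slab s covers rows [bounds[s], bounds[s + 1]).
// Hypothetical helper, not part of Cusp.
std::vector<std::size_t> plan_slabs(const std::vector<std::size_t>& a_row_ptr,
                                    const std::vector<std::size_t>& a_col,
                                    const std::vector<std::size_t>& b_row_ptr,
                                    std::size_t workspace)
{
    const std::size_t num_rows = a_row_ptr.size() - 1;
    std::vector<std::size_t> bounds{0};
    std::size_t used = 0;  // intermediate products in the current slab

    for (std::size_t i = 0; i < num_rows; ++i) {
        // Count the products generated by row i of A.
        std::size_t row_products = 0;
        for (std::size_t k = a_row_ptr[i]; k < a_row_ptr[i + 1]; ++k) {
            const std::size_t j = a_col[k];
            row_products += b_row_ptr[j + 1] - b_row_ptr[j];
        }
        // Start a new slab when this row would overflow the current one.
        if (used > 0 && used + row_products > workspace) {
            bounds.push_back(i);
            used = 0;
        }
        used += row_products;
    }
    bounds.push_back(num_rows);
    return bounds;
}
```

With a large workspace the whole product is handled in one slab; shrinking the workspace produces more slabs, and each extra slab means another pass of expanding, sorting and reducing intermediate products, which is one plausible reason for a modest slowdown.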
Yep! Thanks for the improvement. I've tested the latest version of cusp on my
matrix which has 8M nonzeros and it worked just fine even with 16M of
workspace. A bigger (12M nonzeros) matrix required the workspace to be reduced
to 8M. Decreasing the amount of workspace slightly degrades the spmm
performance - not a big deal from my point of view.
Just for your info, I couldn't compile the fresh ritz routines with VS8 (see
the attached log-file). Switching to the old estimate_spectral_radius resolved
the issue.
Original comment by agnonc...@gmail.com
on 11 Nov 2010 at 4:35
This issue was closed by revision 8f5fdb8195.
Original comment by wnbell
on 14 Feb 2011 at 8:31
Original issue reported on code.google.com by
agnonc...@gmail.com
on 11 Oct 2010 at 9:22