Closed: jdumas closed this issue 8 years ago.
It should be possible, but the best documentation right now is looking at an existing implementation of the component you want to introduce.
To create a relaxation scheme, you can look at the damped_jacobi implementation. A relaxation should be a class/struct with a Backend template parameter. Its constructor should receive the matrix on the current level (in the builtin::crs<> format), its own parameters, and the backend parameters. The constructor should create any structures it needs to work inside the selected Backend. Then, it should provide two methods, apply_pre() and apply_post(), for doing pre- and post-smoothing, respectively.
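As a self-contained illustration of that interface, here is a skeleton of a damped Jacobi relaxation. All type and method signatures below are simplified stand-ins, not amgcl's actual ones (in particular, the crs struct is just a minimal placeholder for builtin::crs<>):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal stand-in for a CRS matrix; in amgcl this role is played by
// the builtin::crs<> type. Everything here is an illustrative sketch.
struct crs {
    std::vector<std::size_t> ptr, col; // row pointers and column indices
    std::vector<double>      val;      // nonzero values
    std::size_t rows() const { return ptr.size() - 1; }
};

template <class Backend>
struct my_damped_jacobi {
    // Each component defines its own parameter struct.
    struct params {
        double damping;
        params() : damping(0.72) {}
    };

    // Constructor: level matrix, own params, backend params. Here it only
    // precomputes the inverse diagonal; a real component would also move
    // its data into the selected Backend.
    template <class BackendParams>
    my_damped_jacobi(const crs &A, const params &prm, const BackendParams&)
        : dia(A.rows(), 0.0), prm_(prm)
    {
        for (std::size_t i = 0; i < A.rows(); ++i)
            for (std::size_t j = A.ptr[i]; j < A.ptr[i + 1]; ++j)
                if (A.col[j] == i) dia[i] = 1.0 / A.val[j];
    }

    // One damped Jacobi sweep: x += damping * D^{-1} * (rhs - A * x)
    void sweep(const crs &A, const std::vector<double> &rhs,
               std::vector<double> &x) const
    {
        std::vector<double> x0 = x;
        for (std::size_t i = 0; i < A.rows(); ++i) {
            double r = rhs[i];
            for (std::size_t j = A.ptr[i]; j < A.ptr[i + 1]; ++j)
                r -= A.val[j] * x0[A.col[j]];
            x[i] += prm_.damping * dia[i] * r;
        }
    }

    // The two methods the AMG cycle calls on a relaxation component.
    void apply_pre(const crs &A, const std::vector<double> &rhs,
                   std::vector<double> &x) const { sweep(A, rhs, x); }
    void apply_post(const crs &A, const std::vector<double> &rhs,
                    std::vector<double> &x) const { sweep(A, rhs, x); }

private:
    std::vector<double> dia; // inverse diagonal of A
    params prm_;
};
```

The Backend parameter is unused in this toy version; in the library it decides where the precomputed data lives (host memory, GPU, etc.).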
As for a coarsening scheme, it should be a plain class/struct that again defines its own parameters. Its constructor takes the current system matrix and returns a tuple of prolongation and restriction operators (all the matrices are in builtin::crs format). For an example, take a look at coarsening::aggregation or coarsening::ruge_stuben.
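A hedged sketch of what such a coarsening component could look like, using a toy pairwise aggregation that lumps unknowns i and i+1 together. The crs struct and the transfer_operators name are illustrative stand-ins, not amgcl's exact API; real schemes (aggregation, Ruge-Stuben) inspect the matrix connectivity instead of aggregating blindly:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal CRS stand-in, for illustration only.
struct crs {
    std::size_t nrows, ncols;
    std::vector<std::size_t> ptr, col;
    std::vector<double> val;
};

// Sketch of a coarsening component: a plain struct with its own params
// that builds the prolongation/restriction pair from the system matrix.
struct my_pairwise_coarsening {
    struct params {};

    static std::pair<crs, crs> transfer_operators(const crs &A, const params&) {
        std::size_t n = A.nrows, nc = (n + 1) / 2;

        crs P; // prolongation: n x nc, unknown i joins aggregate i/2
        P.nrows = n; P.ncols = nc;
        P.ptr.push_back(0);
        for (std::size_t i = 0; i < n; ++i) {
            P.col.push_back(i / 2);
            P.val.push_back(1.0);
            P.ptr.push_back(P.col.size());
        }

        crs R; // restriction: R = P^T for this plain aggregation
        R.nrows = nc; R.ncols = n;
        R.ptr.push_back(0);
        for (std::size_t j = 0; j < nc; ++j) {
            for (std::size_t i = 2 * j; i < n && i < 2 * j + 2; ++i) {
                R.col.push_back(i);
                R.val.push_back(1.0);
            }
            R.ptr.push_back(R.col.size());
        }
        return std::make_pair(P, R);
    }
};
```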
Please do not hesitate to ask any questions you have regarding the implementation; I'll be glad to help.
Thanks for the answer! I'll take another look at the implementation then.
My only concern right now, however, is whether I can integrate my new class into the runtime::make_solver paradigm easily. It would be straightforward to just copy amgcl/runtime.hpp into my program and append my own class where needed, but this is not a satisfying long-term solution. Maybe you can think of a way to get around this hindrance?
There is very little magic inside the make_solver class. It just combines a preconditioner and an iterative solver. A preconditioner is anything that has an apply(rhs, x) method (and a system_matrix() method that returns the matrix it was constructed for), and a solver is a functor that takes a preconditioner, rhs, and x, and returns a tuple of the number of iterations made and the achieved residual.
Have a look at https://github.com/ddemidov/amgcl/blob/master/amgcl/relaxation/as_preconditioner.hpp that wraps a smoother to make a preconditioner.
make_solver just calls the solver while passing the preconditioner to it: https://github.com/ddemidov/amgcl/blob/master/amgcl/make_solver.hpp#L141-L152.
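To make the two concepts concrete, here is a minimal, self-contained sketch with placeholder types (none of these are amgcl's classes): a Jacobi "preconditioner" exposing apply()/system_matrix(), and a Richardson "solver" functor returning the (iterations, residual) pair:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Dense matrix stand-in, for illustration only.
typedef std::vector<std::vector<double> > dense_matrix;

// A "preconditioner": anything with apply(rhs, x) and system_matrix().
// This one applies a single diagonal (Jacobi) scaling.
struct jacobi_precond {
    const dense_matrix &A;
    jacobi_precond(const dense_matrix &A) : A(A) {}

    void apply(const std::vector<double> &rhs, std::vector<double> &x) const {
        for (std::size_t i = 0; i < rhs.size(); ++i) x[i] = rhs[i] / A[i][i];
    }
    const dense_matrix& system_matrix() const { return A; }
};

// A "solver": a functor taking (preconditioner, rhs, x) and returning
// (iterations made, achieved residual). Here: preconditioned Richardson.
struct richardson_solver {
    int maxiter; double tol;
    richardson_solver(int maxiter = 100, double tol = 1e-8)
        : maxiter(maxiter), tol(tol) {}

    template <class Precond>
    std::pair<int, double> operator()(const Precond &P,
            const std::vector<double> &rhs, std::vector<double> &x) const
    {
        const dense_matrix &A = P.system_matrix();
        std::size_t n = rhs.size();
        std::vector<double> r(n), dx(n);
        int iter = 0; double res = 0;
        for (; iter < maxiter; ++iter) {
            res = 0; // compute residual r = rhs - A x and its norm
            for (std::size_t i = 0; i < n; ++i) {
                r[i] = rhs[i];
                for (std::size_t j = 0; j < n; ++j) r[i] -= A[i][j] * x[j];
                res += r[i] * r[i];
            }
            res = std::sqrt(res);
            if (res < tol) break;
            P.apply(r, dx); // correction step: x += P(r)
            for (std::size_t i = 0; i < n; ++i) x[i] += dx[i];
        }
        return std::make_pair(iter, res);
    }
};
```

A make_solver-like class would do nothing more than hold one of each and forward the call, which is essentially what the linked lines show.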
Does that answer your question?
I think it does! What I want to do is be able to switch at runtime between existing schemes and my user-defined class. I find it very convenient that you can pass a JSON file to the program and it can change the relaxation scheme, for example. I guess I'll just copy amgcl/runtime.hpp and enhance it to support my custom types.
Ok, I see now what you want to do. What I said above was about compile-time configuration. If you want to be able to extend the amgcl::runtime::amg class with your own components, you need to do everything I said before, and also register the new components in amgcl/runtime.hpp. There are several places where a runtime parameter is converted to a compile-time type with the help of a switch statement.
For example, for a new smoother, you need to add it here and here.
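The registration pattern can be sketched as follows; the enum values, smoother types, and the process() function are hypothetical stand-ins for the ones in amgcl/runtime.hpp, just to show how a runtime value is turned into a compile-time type:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace relaxation {
    // Runtime selector; registering a new component means adding an enum
    // value and a matching case in each dispatch switch.
    enum type { damped_jacobi, spai0, my_smoother /* newly registered */ };

    // Placeholder smoother templates that just report their name.
    template <class Backend> struct damped_jacobi_t {
        static std::string name() { return "damped_jacobi"; }
    };
    template <class Backend> struct spai0_t {
        static std::string name() { return "spai0"; }
    };
    template <class Backend> struct my_smoother_t {
        static std::string name() { return "my_smoother"; }
    };
}

struct cpu_backend {}; // placeholder backend tag

// Each case of the switch instantiates the templated code with a concrete
// smoother type; this is the runtime-to-compile-time conversion.
template <class Backend>
std::string process(relaxation::type r) {
    switch (r) {
        case relaxation::damped_jacobi:
            return relaxation::damped_jacobi_t<Backend>::name();
        case relaxation::spai0:
            return relaxation::spai0_t<Backend>::name();
        case relaxation::my_smoother: // the case you add for your component
            return relaxation::my_smoother_t<Backend>::name();
        default:
            throw std::runtime_error("unknown relaxation scheme");
    }
}
```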
Yes, that's what I had in mind. The only thing is that I have to duplicate amgcl/runtime.hpp into my source tree to register the new components, but that's not a big deal.
These Stack Overflow questions seem relevant:
http://stackoverflow.com/questions/4790721/c-type-registration-at-compile-time-trick
http://stackoverflow.com/questions/10910584/assembling-a-compile-time-list-of-types-one-by-one-c
With something like this it should be possible to register user-defined components (classes) without rewriting runtime.hpp every time.
This approach (based on the accepted answer to the first question above) seems to work with C++03, so I may try to replace the hard-coded components in the runtime interface with something similar.
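A minimal sketch of the typelist idea from those answers (all names here are hypothetical): components carry a static id, are chained into a compile-time list, and a recursive dispatcher matches a runtime string against the list, so registering a new component means extending the list rather than editing runtime.hpp:

```cpp
#include <cassert>
#include <string>

// End-of-list marker.
struct nil {};

// Classic cons-style compile-time list of types.
template <class Head, class Tail = nil>
struct type_list {
    typedef Head head;
    typedef Tail tail;
};

// Each registered component exposes the static id it is selected by.
struct smoother_a { static const char* id() { return "smoother_a"; } };
struct smoother_b { static const char* id() { return "smoother_b"; } };

// Recursive runtime-to-compile-time dispatch over the list. Here it just
// echoes the matched id; the real version would instantiate the component.
template <class List>
struct dispatcher {
    static std::string find(const std::string &id) {
        if (id == List::head::id()) return List::head::id();
        return dispatcher<typename List::tail>::find(id);
    }
};
template <>
struct dispatcher<nil> {
    static std::string find(const std::string&) { return "unknown"; }
};

// Registering a user-defined component = prepending it to this list.
typedef type_list<smoother_a, type_list<smoother_b> > registered_smoothers;
```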
The template circuitry it involves is a bit convoluted, but it would most likely work. I'm wondering if there could be a simpler alternative, though (like having a make_runtime_solver facility that could take a third template argument describing the mapping between config options and user-defined classes). Well, whatever works, so I'll let you decide how much time is worth spending on this issue. For now I'm happy enough with copying the runtime.hpp file into my codebase, as it is reasonably small.
Otherwise, I was able to write my prolongation operator easily. There's still one little thing bugging me, though it has little to do with this topic: I have set the parameter coarse_enough to 4096 for the AMG hierarchy (so that I end up with 4 levels), and I observe when profiling that the "coarsest level" operation takes up most of the time (45%) during the AMG construction. It takes about 3 s, but looking at the code it seems it's only copying one matrix (which has 71424 nonzeros in my case). Do you have any idea if this is normal?
The template circuitry it involves is a bit convoluted, but it would most likely work.
One problem it does not solve is that the enums used to specify the runtime parameters (like this one) would still have to be extended explicitly. One option would be to get rid of the enums completely and just use string ids, but that means changing the existing API, so I am not sure it's worth it.
I have set the parameter coarse_enough to 4096 for the AMG hierarchy (so that I end with 4 levels), and I observe when profiling that the operation "coarsest level" is taking up most of the time (45%) during the AMG construction.
The coarsest level constructor also creates a direct solver for the matrix. For the builtin backend it's the skyline_lu sparse direct solver. For a 3D Poisson problem with 293143 nonzeros at the coarsest level it takes 1.8 s on my machine, so your result does seem too long (did you enable compiler optimization?):
./runtime -n 120
Number of levels: 3
Operator complexity: 1.56
Grid complexity: 1.12
level unknowns nonzeros
---------------------------------
0 1728000 12009600 (63.99%)
1 206075 6464415 (34.45%)
2 4507 293143 ( 1.56%)
Iterations: 11
Error: 8.79808e-09
[Profile: 7.121 s] (100.00%)
[ assemble: 0.091 s] ( 1.27%)
[ setup: 3.194 s] ( 44.85%)
[ self: 0.022 s] ( 0.31%)
[ coarse operator: 0.795 s] ( 11.16%)
[ coarsest level: 1.793 s] ( 25.18%)
[ move to backend: 0.038 s] ( 0.54%)
[ transfer operators: 0.545 s] ( 7.66%)
[ self: 0.173 s] ( 2.43%)
[ aggregates: 0.152 s] ( 2.13%)
[ interpolation: 0.221 s] ( 3.10%)
[ self: 0.196 s] ( 2.75%)
[ tentative: 0.025 s] ( 0.35%)
[ solve: 3.834 s] ( 53.84%)
[ self: 0.909 s] ( 12.76%)
[ coarse: 0.126 s] ( 1.77%)
[ prolongate: 0.348 s] ( 4.89%)
[ relax: 1.525 s] ( 21.42%)
[ residual: 0.655 s] ( 9.20%)
[ restrict: 0.270 s] ( 3.80%)
For GPGPU backends the direct coarse solver is a dense LU decomposition (from Boost.uBLAS), so 4000 degrees of freedom is probably too much for it.
You could try to decrease the coarse_enough parameter if the coarse solver setup takes too long for you. It should be faster to set up, but could potentially take slightly more iterations to converge, so you need to see what's best for your case.
Yes, reducing the coarse_enough parameter leads to more MG levels and more iterations to converge. Btw, I didn't see a max_levels parameter to limit the number of MG levels? I think it would be useful to have one.
I'm using the GPGPU backend, and I have enabled compiler optimization. That being said, when I factorize the same matrix with CHOLMOD from SuiteSparse (I use the CholmodSupport wrapper provided by Eigen), it takes 0.0049 s to compute, which is drastically faster than the current implementation.
I think it would be beneficial to be able to specify the coarse-space solver as well, be it a CPU or a GPU solver. I don't know the underlying API of CHOLMOD, so I don't know if it's possible to retrieve the factorized matrices it provides and use them during the solve() iterations to keep everything in GPU memory. But even if you allow some memory transfers during the solve, it looks like you could gain a lot from it.
Timings of ./runtime -n 120 on my computer:
Number of levels: 3
Operator complexity: 1.56
Grid complexity: 1.12
level unknowns nonzeros
---------------------------------
0 1728000 12009600 (63.99%)
1 206075 6464415 (34.45%)
2 4507 293143 ( 1.56%)
Iterations: 11
Error: 8.79808e-09
[Profile: 4.930 s] (100.00%)
[ assemble: 0.051 s] ( 1.03%)
[ setup: 1.943 s] ( 39.41%)
[ self: 0.010 s] ( 0.21%)
[ coarse operator: 0.635 s] ( 12.87%)
[ coarsest level: 0.981 s] ( 19.90%)
[ move to backend: 0.022 s] ( 0.45%)
[ transfer operators: 0.295 s] ( 5.98%)
[ self: 0.101 s] ( 2.04%)
[ aggregates: 0.091 s] ( 1.84%)
[ interpolation: 0.103 s] ( 2.09%)
[ self: 0.092 s] ( 1.86%)
[ tentative: 0.011 s] ( 0.23%)
[ solve: 2.935 s] ( 59.53%)
[ self: 0.768 s] ( 15.58%)
[ coarse: 0.074 s] ( 1.51%)
[ prolongate: 0.247 s] ( 5.01%)
[ relax: 1.158 s] ( 23.48%)
[ residual: 0.462 s] ( 9.38%)
[ restrict: 0.226 s] ( 4.58%)
Btw I didn't see a max_levels parameter to limit the number of mg levels? I think it would be useful to have one.
That is a dangerous parameter, exactly for this reason: with it, amgcl may end up with too big a matrix to factorize at the coarsest level.
I think it would be beneficial to be able to specify the coarse-space solver as well
I think it should be possible to just derive your own class from a backend and override the direct_solver typedef.
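That idea can be sketched with placeholder types as follows (builtin_backend, skyline_lu, and cholmod_solver below stand in for amgcl::backend::builtin and real solver classes; the actual backend has more typedefs than shown):

```cpp
#include <cassert>
#include <type_traits>

// Placeholder solver types.
struct skyline_lu {};     // default sparse direct solver of the builtin backend
struct cholmod_solver {}; // hypothetical wrapper around CHOLMOD

// Placeholder backend exposing a direct_solver typedef the AMG hierarchy
// uses to build its coarsest-level solver.
struct builtin_backend {
    typedef double     value_type;
    typedef skyline_lu direct_solver;
};

// Derived backend: inherits everything, but re-points the coarse solver.
struct my_backend : builtin_backend {
    typedef cholmod_solver direct_solver;
};
```

Passing my_backend instead of the builtin backend would then make the hierarchy pick up the overriding typedef, while all other backend machinery stays inherited.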
I don't know the underlying API of cholmod so I don't know if it's possible to retrieve the factorized matrices provided by cholmod, and use them during the solve() iterations to keep everything on GPU memory.
That is a nice idea. I don't want to create a mandatory dependency on CHOLMOD, but I can do this with the skyline_lu solver.
Timings of ./runtime -n 120 on my computer:
My machine is old and grumpy :).
Can this be closed?
I think so! Right now I'm using my own copy of runtime.hpp to handle my custom extensions. Pretty simple, but it works well.
Hi,
I was wondering if it is currently possible to extend AMGCL's classes with user-defined schemes. For example, I might want to add a custom relaxation/coarsening operator (to do geometric multigrid), or my own smoothing operator, while at the same time keeping the flexibility of the runtime::make_solver class (to experiment with different strategies without recompiling the program). If it's already possible to do that, could you provide some documentation / examples?