ddemidov / amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method
http://amgcl.readthedocs.org/
MIT License

User-defined extensions #23

Closed jdumas closed 8 years ago

jdumas commented 8 years ago

Hi,

I was wondering whether it is currently possible to extend AMGCL's classes with user-defined schemes. For example, I might want to add a custom relaxation/coarsening operator (to do geometric multigrid), or my own smoothing operator, while at the same time keeping the flexibility of the runtime::make_solver class (to experiment with different strategies without recompiling the program).

If it's already possible to do that, could you provide some documentation or an example?

ddemidov commented 8 years ago

It should be possible, but the best documentation right now is looking at an existing implementation of the component you want to introduce.

To create a relaxation scheme, you can look at the damped_jacobi implementation. A relaxation should be a class/struct with a Backend template parameter. Its constructor should receive the matrix on the current level (in the builtin::crs<> format), its own parameters, and the backend parameters. The constructor should create any structures it needs to work inside the selected Backend. Then, it should provide two methods, apply_pre() and apply_post(), for doing pre- and post-smoothing respectively.
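A skeleton of such a relaxation class might look like the following. This is a shape-only sketch: the toy_backend and toy_matrix types are stand-ins invented here so the example compiles on its own, and only the overall interface (Backend template parameter, constructor arguments, apply_pre()/apply_post()) follows the description above — the real amgcl signatures differ in details.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the real amgcl types, used only to make the sketch
// self-contained: a "backend" carrying a value type, and a diagonal
// matrix playing the role of the builtin::crs<> level matrix.
struct toy_backend {
    typedef double value_type;
    struct params {};
};

struct toy_matrix {
    std::size_t n;
    double d;                                  // A = d * I
    double diag(std::size_t) const { return d; }
    void residual(const std::vector<double> &rhs,
                  const std::vector<double> &x,
                  std::vector<double> &r) const {
        for (std::size_t i = 0; i < n; ++i) r[i] = rhs[i] - d * x[i];
    }
};

// Sketch of a user-defined relaxation: templated on the Backend,
// constructed from the level matrix, its own params, and the backend
// params; provides apply_pre()/apply_post() for pre-/post-smoothing.
template <class Backend>
struct my_damped_jacobi {
    typedef typename Backend::value_type value_type;

    struct params {
        value_type damping;
        params() : damping(0.72) {}
    };

    template <class Matrix>
    my_damped_jacobi(const Matrix &A, const params &prm,
                     const typename Backend::params&)
        : dia(A.n), w(prm.damping)
    {
        // Precompute the inverted diagonal once at setup.
        for (std::size_t i = 0; i < A.n; ++i) dia[i] = 1 / A.diag(i);
    }

    // One damped Jacobi sweep: x += w * D^{-1} (rhs - A x).
    template <class Matrix, class Vec>
    void apply_pre(const Matrix &A, const Vec &rhs, Vec &x, Vec &tmp) const {
        A.residual(rhs, x, tmp);
        for (std::size_t i = 0; i < dia.size(); ++i)
            x[i] += w * dia[i] * tmp[i];
    }

    // Post-smoothing is the same sweep in this simple scheme.
    template <class Matrix, class Vec>
    void apply_post(const Matrix &A, const Vec &rhs, Vec &x, Vec &tmp) const {
        apply_pre(A, rhs, x, tmp);
    }

  private:
    std::vector<value_type> dia;
    value_type w;
};
```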

As for a coarsening, it should be a plain class/struct that again defines its own parameters. It takes the current system matrix and returns a tuple of prolongation and restriction operators (all the matrices are in builtin::crs format). For examples, take a look at coarsening::aggregation or coarsening::ruge_stuben.
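In the same spirit, a coarsening component could be sketched as below. The crs_like type, the my_pairwise_coarsening name, and the transfer_operators function are all illustrative stand-ins, and the aggregate map (pairs of consecutive unknowns) is deliberately trivial just to make the example concrete; a real scheme would inspect the matrix graph.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy CSR matrix, standing in for amgcl's builtin::crs<> format.
struct crs_like {
    std::size_t nrows, ncols;
    std::vector<std::size_t> ptr, col;
    std::vector<double> val;
};

// Sketch of a coarsening component: it carries its own params and
// produces prolongation/restriction operators from the system matrix.
struct my_pairwise_coarsening {
    struct params {
        params() {}
    };

    static std::pair<crs_like, crs_like>
    transfer_operators(const crs_like &A, const params&) {
        std::size_t n = A.nrows, nc = (n + 1) / 2;

        // Tentative prolongation: unknown i belongs to aggregate i/2.
        crs_like P;
        P.nrows = n; P.ncols = nc;
        P.ptr.resize(n + 1);
        for (std::size_t i = 0; i <= n; ++i) P.ptr[i] = i;
        P.col.resize(n);
        P.val.assign(n, 1.0);
        for (std::size_t i = 0; i < n; ++i) P.col[i] = i / 2;

        // Restriction is the transpose of P.
        crs_like R;
        R.nrows = nc; R.ncols = n;
        R.ptr.assign(nc + 1, 0);
        for (std::size_t i = 0; i < n; ++i) ++R.ptr[P.col[i] + 1];
        for (std::size_t i = 0; i < nc; ++i) R.ptr[i + 1] += R.ptr[i];
        R.col.resize(n); R.val.resize(n);
        std::vector<std::size_t> mark(R.ptr.begin(), R.ptr.end() - 1);
        for (std::size_t i = 0; i < n; ++i) {
            std::size_t j = mark[P.col[i]]++;
            R.col[j] = i;
            R.val[j] = P.val[i];
        }
        return std::make_pair(P, R);
    }
};
```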

Please do not hesitate to ask any questions you have regarding the implementation, I'll be glad to help.

jdumas commented 8 years ago

Thanks for the answer! I'll take another look at the implementation then.

My only concern right now, however, is whether I can easily integrate my new class into the runtime::make_solver paradigm. It would be straightforward to just copy amgcl/runtime.hpp into my program and append my own class where needed, but this is not a satisfying long-term solution. Maybe you can think of a way to get around this hindrance?

ddemidov commented 8 years ago

There is very little magic inside the make_solver class. It just combines a preconditioner and an iterative solver. A preconditioner is anything that has an .apply(rhs, x) method (and a .system_matrix() method that returns the matrix it was constructed for), and a solver is a functor that takes a preconditioner, rhs, and x, and returns a tuple of the number of iterations made and the achieved residual.
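The duck-typed contract can be illustrated with a minimal self-contained sketch; the types below (identity_precond, one_shot_solver, the solve() glue) are toy stand-ins, not amgcl classes, and only the method names and the (iterations, residual) return shape follow the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<double> vec;

struct identity_matrix {};  // placeholder for the assembled system matrix

// A "preconditioner" is anything with apply(rhs, x) and system_matrix().
struct identity_precond {
    identity_matrix A;
    void apply(const vec &rhs, vec &x) const { x = rhs; }
    const identity_matrix& system_matrix() const { return A; }
};

// A "solver" is a functor taking (preconditioner, rhs, x) and returning
// the number of iterations made and the achieved residual.
struct one_shot_solver {
    template <class Precond>
    std::pair<std::size_t, double>
    operator()(const Precond &P, const vec &rhs, vec &x) const {
        P.apply(rhs, x);               // single preconditioner application
        return std::make_pair(1, 0.0); // (iterations, residual)
    }
};

// make_solver-style glue: combine the two and run.
template <class Precond, class Solver>
std::pair<std::size_t, double>
solve(const Precond &P, const Solver &S, const vec &rhs, vec &x) {
    return S(P, rhs, x);
}
```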

Have a look at https://github.com/ddemidov/amgcl/blob/master/amgcl/relaxation/as_preconditioner.hpp that wraps a smoother to make a preconditioner.

make_solver just calls the solver while passing the preconditioner to it: https://github.com/ddemidov/amgcl/blob/master/amgcl/make_solver.hpp#L141-L152.

Does that answer your question?

jdumas commented 8 years ago

I think it does! What I want is to be able to switch at runtime between existing schemes and my user-defined class. I find it very convenient that you can pass a json file to the program and it can change the relaxation scheme, for example. I guess I'll just copy amgcl/runtime.hpp and enhance it to support my custom types.
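For reference, such a json configuration might look roughly like this (the field names and nesting here are assumptions for illustration, not taken from amgcl's documentation):

```json
{
    "solver":  { "type": "bicgstab", "tol": 1e-8 },
    "precond": {
        "coarsening": { "type": "aggregation" },
        "relax":      { "type": "damped_jacobi", "damping": 0.72 }
    }
}
```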

ddemidov commented 8 years ago

Ok, I see now what you want to do. What I said above was about compile-time configuration. If you want to be able to extend the amgcl::runtime::amg class with your own components, you need to do everything I said before, and also register the new components in amgcl/runtime.hpp. There are several places where a runtime parameter is converted to a compile-time type with the help of a switch statement.
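The enum-to-type dispatch pattern can be sketched as follows. The type names, the enum values, and the process_relaxation function are hypothetical stand-ins for what runtime.hpp does; the point is only that each switch branch calls the functor with a differently-typed tag, so registering a new component means adding one more case.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the compile-time relaxation classes.
struct damped_jacobi_type {};
struct spai0_type {};
struct my_smoother_type {};   // the user-defined component being registered

template <class T> struct type_tag {};

enum relaxation_t { damped_jacobi, spai0, my_smoother };

// Convert the runtime enum to a compile-time type by invoking the
// functor with the matching tag.
template <class Func>
void process_relaxation(relaxation_t r, Func &f) {
    switch (r) {
        case damped_jacobi: f(type_tag<damped_jacobi_type>()); break;
        case spai0:         f(type_tag<spai0_type>());         break;
        case my_smoother:   f(type_tag<my_smoother_type>());   break;
    }
}

// Example functor: each overload handles one concrete type.
struct pick {
    std::string name;
    void operator()(type_tag<damped_jacobi_type>) { name = "damped_jacobi"; }
    void operator()(type_tag<spai0_type>)         { name = "spai0"; }
    void operator()(type_tag<my_smoother_type>)   { name = "my_smoother"; }
};
```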

For example, for a new smoother, you need to add it here and here.

jdumas commented 8 years ago

Yes, that's what I had in mind. The only thing is that I have to duplicate amgcl/runtime.hpp into my source tree to register the new components, but that's not a big deal.

ddemidov commented 8 years ago

These stackoverflow questions seem relevant:

http://stackoverflow.com/questions/4790721/c-type-registration-at-compile-time-trick
http://stackoverflow.com/questions/10910584/assembling-a-compile-time-list-of-types-one-by-one-c

With something like this it should be possible to register user-defined components (classes) without rewriting runtime.hpp every time.

ddemidov commented 8 years ago

This (based on the accepted answer in the first question above) seems to work with C++03, so I may try to replace the hard-coded components in the runtime interface with something similar.

jdumas commented 8 years ago

The template circuitry it involves is a bit convoluted, but it would most likely work. I'm wondering if there could be a simpler alternative, though (like having a make_runtime_solver facility that could take a third template argument describing the mapping between config options and user-defined classes). Well, whatever works, so I'll let you decide how much time is worth spending on this issue. For now I'm happy enough with copying the runtime.hpp file into my codebase, as it is reasonably small.

Otherwise, I was able to write my prolongation operator easily. There's still one little thing bugging me, though it has little to do with this topic: I have set the coarse_enough parameter to 4096 for the AMG hierarchy (so that I end up with 4 levels), and I observe when profiling that the "coarsest level" operation takes up most of the time (45%) during the AMG construction. It takes about 3s, but looking at the code it seems to only copy one matrix (which has 71424 nonzeros in my case). Do you have any idea if this is normal?

ddemidov commented 8 years ago

The template circuitry it involves is a bit convoluted, but it would most likely work.

One problem it does not solve is that the enums that are used to specify the runtime parameters (like this one) would still have to be specified explicitly. One option would be to get rid of the enums completely and just use string ids, but that means changing the existing API, so I am not sure it is worth it.

I have set the parameter coarse_enough to 4096 for the AMG hierarchy (so that I end with 4 levels), and I observe when profiling that the operation "coarsest level" is taking up most of the time (45%) during the AMG construction.

The coarsest level constructor also creates a direct solver for the matrix. For the builtin backend it's the skyline_lu sparse direct solver. For a 3D Poisson problem with 293143 nonzeros at the coarsest level it takes 1.8s on my machine, so your result does seem too long (did you enable compiler optimization?):

./runtime -n 120
Number of levels:    3
Operator complexity: 1.56
Grid complexity:     1.12

level     unknowns       nonzeros
---------------------------------
    0      1728000       12009600 (63.99%)
    1       206075        6464415 (34.45%)
    2         4507         293143 ( 1.56%)

Iterations: 11
Error:      8.79808e-09

[Profile:                    7.121 s] (100.00%)
[  assemble:                 0.091 s] (  1.27%)
[  setup:                    3.194 s] ( 44.85%)
[   self:                    0.022 s] (  0.31%)
[    coarse operator:        0.795 s] ( 11.16%)
[    coarsest level:         1.793 s] ( 25.18%)
[    move to backend:        0.038 s] (  0.54%)
[    transfer operators:     0.545 s] (  7.66%)
[     self:                  0.173 s] (  2.43%)
[      aggregates:           0.152 s] (  2.13%)
[      interpolation:        0.221 s] (  3.10%)
[       self:                0.196 s] (  2.75%)
[        tentative:          0.025 s] (  0.35%)
[  solve:                    3.834 s] ( 53.84%)
[   self:                    0.909 s] ( 12.76%)
[    coarse:                 0.126 s] (  1.77%)
[    prolongate:             0.348 s] (  4.89%)
[    relax:                  1.525 s] ( 21.42%)
[    residual:               0.655 s] (  9.20%)
[    restrict:               0.270 s] (  3.80%)

For GPGPU backends the direct coarse solver is a dense LU decomposition (from boost.uBlas), so 4000 dofs is probably too much for it.

You could try to decrease the coarse_enough parameter if the coarse solver setup takes too long for you. It should be faster to set up, but could potentially take slightly more iterations, so you need to see what's best for your case.

jdumas commented 8 years ago

Yes, reducing the coarse_enough parameter leads to more mg levels and more iterations to converge. Btw, I didn't see a max_levels parameter to limit the number of mg levels; I think it would be useful to have one.

I'm using the GPGPU backend, and have enabled compiler optimizations. That being said, when I factorize the same matrix with Cholmod from SuiteSparse (I use the CholmodSupport wrapper provided by Eigen), it takes 0.0049s to compute, which is drastically faster than the current implementation.

I think it would be beneficial to be able to specify the coarse-space solver as well, be it a CPU or a GPU solver. I don't know the underlying API of cholmod, so I don't know if it's possible to retrieve the factorized matrices it produces and use them during the solve() iterations to keep everything in GPU memory. But even if you allow some memory transfers during the solve, it looks like there is a lot to gain from it.

Timings of ./runtime -n 120 on my computer:

Number of levels:    3
Operator complexity: 1.56
Grid complexity:     1.12

level     unknowns       nonzeros
---------------------------------
    0      1728000       12009600 (63.99%)
    1       206075        6464415 (34.45%)
    2         4507         293143 ( 1.56%)

Iterations: 11
Error:      8.79808e-09

[Profile:                    4.930 s] (100.00%)
[  assemble:                 0.051 s] (  1.03%)
[  setup:                    1.943 s] ( 39.41%)
[   self:                    0.010 s] (  0.21%)
[    coarse operator:        0.635 s] ( 12.87%)
[    coarsest level:         0.981 s] ( 19.90%)
[    move to backend:        0.022 s] (  0.45%)
[    transfer operators:     0.295 s] (  5.98%)
[     self:                  0.101 s] (  2.04%)
[      aggregates:           0.091 s] (  1.84%)
[      interpolation:        0.103 s] (  2.09%)
[       self:                0.092 s] (  1.86%)
[        tentative:          0.011 s] (  0.23%)
[  solve:                    2.935 s] ( 59.53%)
[   self:                    0.768 s] ( 15.58%)
[    coarse:                 0.074 s] (  1.51%)
[    prolongate:             0.247 s] (  5.01%)
[    relax:                  1.158 s] ( 23.48%)
[    residual:               0.462 s] (  9.38%)
[    restrict:               0.226 s] (  4.58%)

ddemidov commented 8 years ago

Btw I didn't see a max_levels parameter to limit the number of mg levels? I think it would be useful to have one.

That is a dangerous parameter for exactly this reason: with it, amgcl may end up with too big a matrix to factorize at the coarsest level.

I think it would be beneficial to be able to specify the coarse-space solver as well

I think it should be possible to just derive your own class from a backend and override the direct_solver typedef.
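Structurally, that override would look something like this. All the type names below (base_backend, skyline_lu, my_cholmod_solver) are stand-ins so the sketch compiles on its own; in amgcl the base would be a real backend such as backend::builtin<double>, and the derived class's direct_solver typedef would name an actual coarse solver.

```cpp
#include <cassert>
#include <type_traits>

struct skyline_lu {};          // default direct solver (stand-in)
struct my_cholmod_solver {};   // custom coarse solver (stand-in)

// Stand-in for an existing amgcl backend.
struct base_backend {
    typedef double value_type;
    typedef skyline_lu direct_solver;
};

// Derive from the backend and shadow the direct_solver typedef, so code
// templated on the backend picks up the custom coarse solver instead.
struct my_backend : base_backend {
    typedef my_cholmod_solver direct_solver;
};
```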

I don't know the underlying API of cholmod so I don't know if it's possible to retrieve the factorized matrices provided by cholmod, and use them during the solve() iterations to keep everything on GPU memory.

That is a nice idea. I don't want to create a mandatory dependency on cholmod, but I can do this with the skyline_lu solver.

Timings of ./runtime -n 120 on my computer:

My machine is old and grumpy :).

ddemidov commented 8 years ago

Can this be closed?

jdumas commented 8 years ago

I think so! Right now I'm using my own copy of runtime.hpp to handle my custom extensions. Pretty simple but it works well.