ddemidov / amgcl

C++ library for solving large sparse linear systems with algebraic multigrid method
http://amgcl.readthedocs.org/
MIT License

Question: Modifying the AMG with changing boundary conditions #59

Closed ghost closed 6 years ago

ghost commented 6 years ago

Hi Denis,

My research involves modelling biological processes, for example blood vessels growing into a tissue. I use the AMGCL library to solve for the diffusion of molecules within the tissue domain. The blood vessels act as a Dirichlet BC for the Poisson equation, acting as a source and a sink for different molecules, so I have a process that looks roughly like:

    A = stiffness_matrix(n, n, n)         # stiffness matrix for the 3D Poisson problem
    while t < total_time:
        BC = vessel_locations
        A_new = applyBC(A, BC)            # impose Dirichlet BC at the vessel locations
        precond = pyamgcl.amg(A_new)      # rebuild the AMG preconditioner
        solver = pyamgcl.solver(precond)
        C = solver(rhs)                   # solve A_new * C = rhs for the concentration field
        vessel_locations = move_vessels(C, vessel_locations)

So at each iteration the BCs change as the vasculature network grows. I was wondering whether you thought it would be feasible for me to "update" the preconditioner at each iteration rather than recreating it. I currently do this for the matrix, but it does not save as much time as it would if I could also modify the MG hierarchy.

Cheers, Duncan

ddemidov commented 6 years ago

Hi Duncan,

It is currently impossible in amgcl to do such an update operation. Whenever the system matrix changes, you have to rebuild the preconditioner. However,

  1. Since the boundary conditions are Dirichlet, would it be possible to move them to the RHS, thus keeping the system matrix invariant?
  2. You can try to reuse the preconditioner for some time steps (for example, as long as the number of iterations stays within some threshold). AMGCL solvers have a version of the function call operator that takes both the matrix and the preconditioner; see the sketch below. This is also exposed in the Python interface.
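A minimal sketch of option 2 in C++ (not the poster's actual code), using make_solver's call operator that takes the updated matrix while reusing the stored preconditioner. It assumes the builtin OpenMP backend, smoothed aggregation with SPAI-0 relaxation, a CG solver, and CRS input arrays; solve_step and iter_threshold are illustrative names.

    #include <memory>
    #include <tuple>
    #include <vector>

    #include <amgcl/backend/builtin.hpp>
    #include <amgcl/adapter/crs_tuple.hpp>
    #include <amgcl/make_solver.hpp>
    #include <amgcl/amg.hpp>
    #include <amgcl/coarsening/smoothed_aggregation.hpp>
    #include <amgcl/relaxation/spai0.hpp>
    #include <amgcl/solver/cg.hpp>

    typedef amgcl::backend::builtin<double> Backend;

    typedef amgcl::make_solver<
        amgcl::amg<
            Backend,
            amgcl::coarsening::smoothed_aggregation,
            amgcl::relaxation::spai0
            >,
        amgcl::solver::cg<Backend>
        > Solver;

    // Called once per time step with the current matrix in CRS format.
    void solve_step(size_t n,
            const std::vector<ptrdiff_t> &ptr,
            const std::vector<ptrdiff_t> &col,
            const std::vector<double>    &val,
            const std::vector<double>    &rhs,
            std::vector<double>          &x,
            std::shared_ptr<Solver>      &solve,
            size_t iter_threshold)
    {
        auto A = std::tie(n, ptr, col, val);

        // First time step: build the AMG hierarchy from scratch.
        if (!solve) solve = std::make_shared<Solver>(A);

        // Function call operator that also takes the matrix: the iterative
        // solver works with the current matrix, while the previously built
        // hierarchy is reused as the preconditioner.
        size_t iters;
        double error;
        std::tie(iters, error) = (*solve)(A, rhs, x);

        // Rebuild the hierarchy only when convergence degrades too much;
        // the new hierarchy is then used from the next time step on.
        if (iters > iter_threshold) solve = std::make_shared<Solver>(A);
    }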
ghost commented 6 years ago

Hi Denis,

Sorry for the extremely late reply. I ended up porting the majority of my code to C++ and then implemented your second suggestion. I took advantage of the non-copy constructor and changed the matrix externally. It took a while to find a good threshold for the number of iterations, but now it is working well and it is a nice solution.

Thanks for your suggestion, Duncan

ddemidov commented 6 years ago

Thanks for following up on this! Out of interest, how much faster is your current approach than just rebuilding the AMG preconditioner at each step?

ghost commented 6 years ago

So I found that it was very rare for the preconditioner to ever be reconstructed. In my test case, constructing the AMG took about 5 seconds and the solution took about 0.5 seconds. I'll add a counter to my code to see how often reconstruction occurs.

I have a feeling the solve performance was only achieved because I also copied the previous solution and used it as the initial guess. I found that reusing the previous solution as the initial guess speeds up the solve by a factor of 2-3x. I am also using a loose relative tolerance of 1e-4.
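For reference, a hedged sketch of what the warm start and the loose tolerance could look like with the C++ interface (assuming the Solver typedef from the sketch above; in amgcl the contents of the solution vector are used as the initial guess, and the relative tolerance is set through the solver parameters):

    // Loosen the relative residual tolerance of the iterative solver.
    Solver::params prm;
    prm.solver.tol = 1e-4;

    Solver solve(std::tie(n, ptr, col, val), prm);

    // x still holds the solution from the previous time step; amgcl's
    // iterative solvers use its contents as the initial guess, so it is
    // deliberately not zeroed out before the call.
    size_t iters;
    double error;
    std::tie(iters, error) = solve(rhs, x);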

ddemidov commented 6 years ago

In my test case, constructing the AMG took about 5 seconds and the solution took about 0.5 seconds

The fact that the AMG construction takes 10 times longer than the solution seems strange to me. Are you using a GPU backend? Or does the solver make just a few iterations until the required convergence? Another thing to check is whether you are enabling compiler optimizations (compile in 'Release' mode). This usually has a dramatic effect on the performance of C++ code.

ghost commented 6 years ago

The solver typically takes 3-5 iterations; as more BCs are added the count increases, but it was rare for it to ever surpass 10.

I am using the Release mode of the MSVC compiler. The target platform is Piz Daint (I see from your benchmarks that you are familiar with the system); these tests were done on a laptop, so there could be other factors affecting the timings. I can update this in a week or so with data from runs on Daint.

I am using a pure CPU version. I would have thought that this approach would not work on a GPU, as you have to copy everything over?

ddemidov commented 6 years ago

The same approach should work on a GPU, as reported in [1]. In fact, it should be even more beneficial on a GPU, since the preconditioner setup is relatively more expensive there (the setup is performed on the CPU, while the solution is done on the GPU).

[1] Demidov, D.E., and Shevchenko, D.V. "Modification of algebraic multigrid for effective GPGPU-based solution of nonstationary hydrodynamics problems." Journal of Computational Science 3.6 (2012): 460-462., https://doi.org/10.1016/j.jocs.2012.08.008

ghost commented 6 years ago

Thanks for the paper, I will be certain to cite it in my work.

I added some timers to my code: the AMG generation takes 8.5 seconds, while the solution time varies from 0.05 to 0.3 seconds. I based the implementation on your shared-memory benchmark, the differences being the tolerance, the use of SPAI-1 for relaxation, and a matrix without uniform diffusivity. The problem size is 75**3. Not sure if this makes any more sense to you.

We have some GPU hours on Daint, but mainly CPU hours. I could try to benchmark it with the GPU; do you have an example CMake file? I guess CUDA is well supported there, but I have heard nightmares about using OpenCL at CSCS.

ddemidov commented 6 years ago

Ok, I believe the use of SPAI-1 explains the slow setup. In my experience, that smoother takes a long time to construct and is not very effective, so I would try to switch to spai0, damped_jacobi, or ilu0 (in that order).
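For illustration, the smoother is a template parameter of amgcl::amg, so switching it is a type-level change (a sketch, assuming the same coarsening, solver, and backend as in the earlier sketch):

    #include <amgcl/relaxation/spai0.hpp>
    #include <amgcl/relaxation/damped_jacobi.hpp>
    #include <amgcl/relaxation/ilu0.hpp>

    // SPAI-0 instead of SPAI-1; damped_jacobi or ilu0 are plugged in the same way.
    typedef amgcl::make_solver<
        amgcl::amg<
            Backend,
            amgcl::coarsening::smoothed_aggregation,
            amgcl::relaxation::spai0   // was amgcl::relaxation::spai1
            >,
        amgcl::solver::cg<Backend>
        > Solver;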

We have some GPU hours on Daint, but mainly CPU hours. I could try to benchmark it with the GPU; do you have an example CMake file? I guess CUDA is well supported there, but I have heard nightmares about using OpenCL at CSCS.

If I recall correctly, I also failed to compile anything OpenCL-based on Piz Daint, but the CUDA backend, or the VexCL backend with CUDA as its device backend, should work well. I don't currently have access to Piz Daint, but this was the code (complete with CMakeLists.txt) we used to run the benchmarks there: https://github.com/ddemidov/amgcl_benchmarks
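For what it is worth, a rough sketch of what switching to the CUDA backend might look like (based on the amgcl documentation, not verified on Piz Daint; the backend takes a cuSPARSE handle through its parameters, and vectors live on the device as thrust::device_vector):

    #include <thrust/device_vector.h>

    #include <amgcl/backend/cuda.hpp>
    #include <amgcl/adapter/crs_tuple.hpp>
    #include <amgcl/make_solver.hpp>
    #include <amgcl/amg.hpp>
    #include <amgcl/coarsening/smoothed_aggregation.hpp>
    #include <amgcl/relaxation/spai0.hpp>
    #include <amgcl/solver/cg.hpp>

    typedef amgcl::backend::cuda<double> Backend;

    typedef amgcl::make_solver<
        amgcl::amg<
            Backend,
            amgcl::coarsening::smoothed_aggregation,
            amgcl::relaxation::spai0
            >,
        amgcl::solver::cg<Backend>
        > Solver;

    // The CUDA backend needs a cuSPARSE handle in its backend parameters.
    Backend::params bprm;
    cusparseCreate(&bprm.cusparse_handle);

    // The hierarchy is set up on the CPU and then transferred to the GPU.
    Solver solve(std::tie(n, ptr, col, val), Solver::params(), bprm);

    // Right-hand side and solution are device vectors.
    thrust::device_vector<double> f(rhs.begin(), rhs.end());
    thrust::device_vector<double> x(n, 0.0);
    solve(f, x);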

ghost commented 6 years ago

Ok, changing from SPAI-1 to SPAI-0 brings the AMG generation down to 0.5 seconds. I cannot really tell whether the solution time is different, as the variance between iterations is quite high. At the moment I would say it is identical.