cesaremalosso opened this issue 3 months ago
This is a very good question! I actually ran some benchmarks and discussed this with @amcadmus back in August last year. We do think that a GPU version of `pppm/dplr` would be beneficial. Similarly, I agree that thread parallelization might be useful. I am sorry that I have had very limited time to work on this problem during the past year. I will try my best to figure it out. Do you have any suggestions, or have you run any tests on ways to accelerate the PPPM solver?
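The thread-parallelization idea can be illustrated with a toy example. One step of any PPPM solver is charge assignment: each charge is spread onto nearby mesh points. Two workers handling different particles may write to the same mesh point. A common way to avoid that write race is to give each worker a private grid and sum the grids at the end.

This is a hypothetical 1D cloud-in-cell sketch, not the `pppm/dplr` code. `deposit_chunk` and `spread_charges` are illustrative names that do not exist in LAMMPS or DeePMD-kit, and real PPPM uses 3D meshes with higher-order stencils. Because of CPython's GIL, these threads will not actually make the pure-Python loop faster. In C++ the same decomposition would map onto OpenMP threads or MPI ranks.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def deposit_chunk(xs, qs, box, nmesh):
    """Spread a chunk of point charges onto a private periodic 1D mesh
    using linear (cloud-in-cell) weights. Hypothetical helper."""
    h = box / nmesh                  # mesh spacing
    grid = [0.0] * nmesh             # private to whoever calls this
    for x, q in zip(xs, qs):
        s = x / h                    # position in mesh units
        j = math.floor(s)            # left mesh node
        w = s - j                    # fractional distance past node j
        j0 = j % nmesh               # periodic wrap (Python % is non-negative here)
        j1 = (j0 + 1) % nmesh
        grid[j0] += q * (1.0 - w)
        grid[j1] += q * w
    return grid


def spread_charges(xs, qs, box, nmesh, nworkers=4):
    """Give each worker a contiguous slice of particles and its own grid,
    then sum the grids. No two workers ever write to the same list."""
    n = len(xs)
    bounds = [(k * n // nworkers, (k + 1) * n // nworkers) for k in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        partials = list(pool.map(
            lambda b: deposit_chunk(xs[b[0]:b[1]], qs[b[0]:b[1]], box, nmesh),
            bounds))
    # Reduction step: element-wise sum of the private grids.
    return [sum(col) for col in zip(*partials)]
```

For example, `spread_charges([0.5, 2.25, 7.9], [1.0, -2.0, 1.0], box=8.0, nmesh=8)` should match a single serial `deposit_chunk` call on all three particles, up to floating-point rounding. The cost of this design is memory: one full grid per worker, traded for not needing locks or atomic adds.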
I'm not very experienced with this kind of coding, so I wouldn't be much help there... but I can do some testing if that would be useful!
Summary
`kspace_style pppm/dplr` is quite slow in LAMMPS, significantly slowing down the MD simulation. A multiprocessing implementation running on CPU (or a GPU implementation) could significantly speed up the simulation.

Detailed Description
Hi, I'm running DPLR MD simulations with LAMMPS and I am seeing poor performance in the long-range part of the calculation. I'm running on 4 GPUs in a single node, with 1 MPI process per GPU. This is the performance report I get at the end of my simulation of 2727 atoms (and 909 Wannier centroids):

It seems that `kspace_style pppm/dplr`, which is used to account for the long-range interactions, is quite slow in LAMMPS, significantly slowing down the MD simulation. Using more GPUs does not significantly increase performance, since it improves only the `Pair` time.

Do you think it would be beneficial to implement OpenMP thread parallelization to speed this part up? Perhaps we could use GPUs for both the short-range NNP and the Wannier NN, while using multiple processes on multiple CPUs for the particle-particle particle-mesh solver? Could a GPU `pppm/dplr` code also increase the performance?

Further Information, Files, and Links
No response
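The observation that more GPUs barely help, because they only shrink the `Pair` time, is Amdahl's law at work. This sketch projects the overall runtime when each timing category is sped up by some factor. The fractions are hypothetical placeholders, not numbers from the performance report mentioned in the issue. Substitute the percentages from the LAMMPS timing breakdown.

```python
def projected_time(fractions, speedups):
    """Relative runtime after accelerating some timing categories.

    fractions: share of the original runtime spent in each category (sums to 1).
    speedups:  factor by which each category gets faster; missing keys mean 1x.
    """
    return sum(f / speedups.get(name, 1.0) for name, f in fractions.items())


# Hypothetical breakdown, for illustration only.
fractions = {"Pair": 0.4, "Kspace": 0.5, "Other": 0.1}

for label, speedups in [
    ("2x faster Pair (more GPUs)", {"Pair": 2.0}),
    ("Pair infinitely fast", {"Pair": float("inf")}),
    ("4x faster Kspace", {"Kspace": 4.0}),
]:
    t = projected_time(fractions, speedups)
    print(f"{label}: overall speedup {1.0 / t:.2f}x")
```

With these made-up fractions, even an infinitely fast `Pair` step caps the overall speedup at about 1.67x. A 4x faster `Kspace` step alone gives about 1.60x. Once `Kspace` dominates, accelerating it is the only way to keep scaling.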