genesis-release-r-ccs / genesis-2.0

This is an old archived repository that we keep for our records. Please use the recent GENESIS repository and do not use this one.
GNU Lesser General Public License v3.0

String Method with many restraints really slow? #2

Open sci-coder opened 2 years ago

sci-coder commented 2 years ago

I have been trying to use the string method with 2296 distance restraints between pairs of about 334 atoms in my system. Initially the simulations blew up as soon as the string update and reparametrization happened. It turned out that rest_function in the [RPATH] section can only take a limited number of characters. So I increased the MaxLine variable in src/lib/string.fpp from 1500 to 50000, which resolved the blow-up of energies.

However, when I run a string method simulation using the [RPATH] section, the simulation is really slow. For comparison, if I run simple restrained MD with the same number of restraints while omitting the [RPATH] section entirely, I get a performance of 7,8 ns per day with 32 cores (single node) on a 60k-atom system (including water). When I run the same system with the [RPATH] section (single replica, rpath_period = 0), the performance drops to 1 ns per day with the same resources. This implies that the problem only occurs when the [RPATH] section is included. When I plot the simulation time against the number of restraints specified in rest_function, the simulation time appears proportional to the square of the number of restraints. The main bottleneck seems to be comm_force in the integrator, which also scales as the square of the number of restraints. The following results are in seconds for a simulation of 1000 steps.

Rpath restraints  Dynamics  Energy  Integrator  pairlist  integrator_comm_force
1                 20.644    14.3    3.716       1.69      0.149
30                20.653    14.08   3.915       1.68      0.1542
120               20.97     14.138  4.219       1.68      0.26
570               28.951    14.53   11.931      1.68      2.935
1080              51.86     15.44   34.063      1.69      10.65
1560              83.5      16.535  64.677      1.69      21.608
1950              141       18.859  119.45      1.69      40.824
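The quadratic trend can be checked directly from the Integrator column. The following is a minimal sketch, not part of GENESIS: it subtracts the 1-restraint timing as a baseline (an assumption about what counts as restraint-independent cost) and fits a power law only to the rows with 570 or more restraints, where the overhead is well above the baseline. The function name `fit_exponent` is hypothetical.

```python
import math

# (number of Rpath restraints, Integrator time in seconds for 1000 steps),
# copied from the table in this comment.
timings = [(1, 3.716), (30, 3.915), (120, 4.219), (570, 11.931),
           (1080, 34.063), (1560, 64.677), (1950, 119.45)]


def fit_exponent(points):
    """Least-squares slope of log(y) against log(x), i.e. p in y ~ x**p."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Assumption: the 1-restraint run approximates the restraint-independent cost.
baseline = timings[0][1]
overhead = [(n, t - baseline) for n, t in timings if n >= 570]

exponent = fit_exponent(overhead)
print(f"fitted exponent of the restraint overhead: {exponent:.2f}")
```

An exponent close to 2 is consistent with the overhead growing as the square of the number of restraints; with only four points, the fit is indicative rather than conclusive.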

The plot below shows Dynamics Time (Y-axis) plotted against Number of Rpath Restraints (X-axis)

image

jaewoonjung commented 2 years ago

This is Jaewoon, one of the GENESIS developers.

Thank you for letting us know about the performance issue. The behavior you report is very unexpected. When restraints are included, the restraint function is assigned to one process and the result is communicated to the other processes. It therefore requires global communication, and a larger number of restraints slows down the performance. However, the slowdown you observe is very large.

I guess you're using a single node with 1 replica, right? Please check the performance while changing the numbers of MPI processes and OpenMP threads. For example, you can run with "MPI=32, OpenMP=1", "MPI=16, OpenMP=2", and "MPI=8, OpenMP=4". Also, please check that all the processors are idle before you start running. After that, we might understand what the main source of the performance reduction is.
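The suggested combinations are the ways of splitting the 32 cores of one node into MPI processes and OpenMP threads. A minimal sketch that enumerates them is shown here; `core_splits` is a hypothetical helper, and it does not check whether GENESIS accepts every split for a given system.

```python
def core_splits(total_cores):
    """Return every (mpi_processes, openmp_threads) pair whose product is total_cores."""
    return [(mpi, total_cores // mpi)
            for mpi in range(total_cores, 0, -1)
            if total_cores % mpi == 0]


for mpi, omp in core_splits(32):
    print(f"MPI={mpi}, OpenMP={omp}")
```

Timing each split on otherwise idle cores separates communication cost, which should fall as the number of MPI processes decreases, from computation that does not parallelize well over threads.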

chig commented 2 years ago

This is Chigusa, who is also a GENESIS developer. I guess that the slow performance comes from 'compute_stats' in sp_energy.fpp, although the timing output attributes it to 'communicate_force' ('communicate_force' executes immediately after 'compute_stats'). The string method needs to calculate the metric, the metric is updated at every step, and the size of the metric matrix is the square of the number of dimensions (= # of restraints). The computation time therefore grows with the size of this matrix as the number of dimensions increases.
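To illustrate why the matrix size matters, here is a minimal sketch of a metric tensor for distance restraints. It assumes the form commonly used in the string method in collective variables, M_ab = sum over atoms k of (1/m_k) (grad_k theta_a . grad_k theta_b). It is not the GENESIS implementation of 'compute_stats'; the function names and the toy coordinates and masses are hypothetical.

```python
import math


def distance_gradient(coords, i, j):
    """Gradient of the i-j distance with respect to atoms i and j, keyed by atom index."""
    diff = [a - b for a, b in zip(coords[i], coords[j])]
    dist = math.sqrt(sum(c * c for c in diff))
    unit = [c / dist for c in diff]
    return {i: unit, j: [-c for c in unit]}


def metric_tensor(coords, masses, pairs):
    """Build the n x n metric for n distance restraints (n*n entries per call)."""
    grads = [distance_gradient(coords, i, j) for i, j in pairs]
    n = len(pairs)
    metric = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            # Only atoms that appear in both restraints contribute.
            shared = grads[a].keys() & grads[b].keys()
            metric[a][b] = sum(
                sum(ga * gb for ga, gb in zip(grads[a][k], grads[b][k])) / masses[k]
                for k in shared
            )
    return metric


# Toy system: four atoms, two distance restraints that share no atoms.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
masses = [12.0, 1.0, 16.0, 1.0]
metric = metric_tensor(coords, masses, [(0, 1), (2, 3)])
print(metric)
```

The double loop visits n*n entries, so with 1950 restraints a straightforward update like this one touches about 3.8 million entries at every step.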

sci-coder commented 2 years ago

The above testing was done with 1 replica/system split into 32 MPI processes, with 1 OpenMP thread per MPI process, on the Kyoto University cluster using the beta version of GENESIS 2.0. Below, I summarize the tests of MPI/OpenMP combinations for 1950 functions in the rest_function of the [RPATH] section. It is curious that when 1 MPI process and 32 OpenMP threads are used, comm_force suddenly drops to almost zero. However, the total dynamics time keeps increasing with an increasing number of OpenMP threads.

MPI/OpenMP  Dynamics  Energy   Integrator  pairlist  Integrator: comm_force
32/1        127.459   17.7     106.6       1.69      36.050
16/2        127.97    21.186   104.040     1.725     71.857
8/4         131.79    29.04    99.64       1.89      91.477
4/8         141.2     46.72    90.939      2.322     78.86
2/16        152.5     75.2     73.3        2.254     52.27
1/32        175.9     132.849  37.202      2.218     0.042

So is this a theoretical bottleneck that we probably cannot do anything about, other than reducing the number of restraints?