String Method with many restraints really slow?

sci-coder commented 2 years ago

I have been trying to use the string method with 2296 distance restraints involving about 334 atoms in my system. Initially, the simulations blew up as soon as the string update and reparametrization steps ran. It turned out that the rest_function in the [RPATH] section can only hold a limited number of characters. I increased the MaxLine variable in src/lib/string.fpp from 1500 to 50000, which resolved the energy blow-up.
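The blow-up was caused by a rest_function definition longer than the input-line limit. A quick check before running can catch that. Below is a minimal sketch of a hypothetical helper (`find_long_lines` is not part of GENESIS). It reports every input line longer than a given limit. The default of 1500 mirrors the original MaxLine value mentioned above. The helper assumes each rest_function definition sits on one physical line, so it does not reproduce how GENESIS itself parses the input file.

```python
def find_long_lines(text: str, limit: int = 1500) -> list[tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit` characters.

    `limit` is assumed to mirror MaxLine in src/lib/string.fpp (1500 by default).
    This is an illustrative check, not GENESIS's own input parser.
    """
    too_long = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > limit:
            too_long.append((lineno, len(line)))
    return too_long
```

For example, calling `find_long_lines(open("run.inp").read())` (the file name is hypothetical) returns an empty list only if every line fits within the limit.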

However, string method simulations with the [RPATH] section are very slow. For comparison, plain restrained MD with the same number of restraints and no [RPATH] section runs at 7 to 8 ns/day on 32 cores (a single node) for a 60k-atom system (including water). Running the same system with the [RPATH] section (a single replica, rpath_period = 0) drops performance to about 1 ns/day with the same resources. So the problem appears only when the [RPATH] section is included.

When I plot simulation time against the number of restraints listed in rest_function, the time grows roughly as the square of the number of restraints. The main bottleneck seems to be comm_force in the integrator, which also scales with the square of the number of restraints. The timings below are in seconds for a 1000-step simulation:

RPATH restraints	Dynamics (s)	Energy (s)	Integrator (s)	Pairlist (s)	Integrator: comm_force (s)
1	20.644	14.3	3.716	1.69	0.149
30	20.653	14.08	3.915	1.68	0.1542
120	20.97	14.138	4.219	1.68	0.26
570	28.951	14.53	11.931	1.68	2.935
1080	51.86	15.44	34.063	1.69	10.65
1560	83.5	16.535	64.677	1.69	21.608
1950	141	18.859	119.45	1.69	40.824

[Plot: Dynamics time (y-axis) vs. number of RPATH restraints (x-axis)]
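A log-log least-squares fit of the comm_force timings turns the "square of the number of restraints" observation into a number. This is a minimal sketch that uses only the values from the table above. Restricting the fit to N ≥ 570 is an assumption, made so that the roughly constant overhead at small N does not flatten the slope. With these numbers the fitted exponent comes out close to 2, which is consistent with quadratic scaling.

```python
import math

# Timings in seconds (1000 steps) copied from the table above:
# number of RPATH restraints -> integrator comm_force time
timings = {
    1: 0.149, 30: 0.1542, 120: 0.26, 570: 2.935,
    1080: 10.65, 1560: 21.608, 1950: 40.824,
}


def loglog_slope(points):
    """Least-squares slope of log(time) vs. log(n); about 2 means quadratic scaling."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Fit only the large-N points, where the N-dependent cost dominates the fixed overhead.
large_n = [(n, t) for n, t in timings.items() if n >= 570]
print(f"estimated exponent: {loglog_slope(large_n):.2f}")
```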

jaewoonjung commented 2 years ago

This is Jaewoon, one of the genesis developers.

Thank you for reporting this performance issue. It is very unexpected. When restraints are included, the restraint function is assigned to one process and communicated to the other processes. This requires global communication, so a larger number of restraints does slow down the simulation. However, the slowdown you report is surprisingly large.

I assume you are using a single node with one replica, right? Please check the performance with different combinations of MPI processes and OpenMP threads, for example "MPI=32, OpenMP=1", "MPI=16, OpenMP=2", and "MPI=8, OpenMP=4". Also, please make sure all the cores are idle before you start the run. That should help us identify the main source of the slowdown.
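The suggested scan covers every way of splitting one 32-core node between MPI ranks and OpenMP threads. Below is a minimal sketch that generates those launch commands. The `spdyn` binary name, the plain `mpirun -np` launcher, setting `OMP_NUM_THREADS` inline, and the input file name `rpath_test.inp` are all assumptions. Adjust them to your GENESIS build and your cluster's job scheduler.

```python
import shlex

TOTAL_CORES = 32  # one node, as in the report above


def hybrid_layouts(total_cores):
    """All (MPI ranks, OpenMP threads) pairs whose product equals total_cores."""
    return [(total_cores // t, t) for t in range(1, total_cores + 1)
            if total_cores % t == 0]


def launch_command(ranks, threads, input_file="rpath_test.inp"):
    # Assumed launcher and binary; the input file name is hypothetical.
    return (f"OMP_NUM_THREADS={threads} mpirun -np {ranks} "
            f"spdyn {shlex.quote(input_file)}")


for ranks, threads in hybrid_layouts(TOTAL_CORES):
    print(launch_command(ranks, threads))
```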

chig commented 2 years ago

This is Chigusa, also a genesis developer. I suspect the slowdown comes from 'compute_stats' in sp_energy.fpp, even though the timers attribute it to 'communicate_force'. ('communicate_force' executes immediately after 'compute_stats', so presumably the other processes wait there while 'compute_stats' finishes.) The string method needs to calculate the metric, which is updated at every step. The metric is a matrix whose size is the square of the number of dimensions (= the number of restraints), so the computation time grows with the square of the number of dimensions.
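The sketch below shows the quadratic cost of a metric calculation. It uses the mass-weighted metric common in collective-variable string methods, M[a][b] = Σ_k (1/m_k) (∂ξ_a/∂x_k)·(∂ξ_b/∂x_k). This assumes that 'compute_stats' evaluates something of this form. It is an illustration, not the GENESIS implementation. Each distance restraint depends on only two atoms, yet the loop still visits all N(N+1)/2 pairs of restraints. For the 1950 restraints tested above, that is 1,902,225 entries per step.

```python
import math


def distance_gradient(xa, xb):
    """Gradient of |xa - xb| with respect to xa and xb (3-vectors)."""
    d = [a - b for a, b in zip(xa, xb)]
    r = math.sqrt(sum(c * c for c in d))
    u = [c / r for c in d]
    return u, [-c for c in u]


def metric_tensor(coords, masses, pairs):
    """Dense N x N metric for N distance CVs: M[a][b] = sum_k grad_a . grad_b / m_k."""
    # Sparse Jacobian: each distance CV touches exactly two atoms.
    jac = []
    for i, j in pairs:
        gi, gj = distance_gradient(coords[i], coords[j])
        jac.append({i: gi, j: gj})

    n = len(pairs)
    M = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a, n):  # N(N+1)/2 restraint pairs -> O(N^2) work per step
            s = 0.0
            for atom, ga in jac[a].items():
                gb = jac[b].get(atom)
                if gb is not None:  # only atoms shared by both CVs contribute
                    s += sum(x * y for x, y in zip(ga, gb)) / masses[atom]
            M[a][b] = M[b][a] = s
    return M


# Two distances sharing atom 0: 0-1 along x, and 0-2 along the x=y diagonal.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
masses = [1.0, 1.0, 1.0]
M = metric_tensor(coords, masses, [(0, 1), (0, 2)])
print(M)  # diagonal entries approximately 2.0, off-diagonal approximately 0.707
```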

sci-coder commented 2 years ago

The tests above were run with one replica split into 32 MPI processes, with 1 OpenMP thread per MPI process, on the Kyoto University cluster using the beta version of genesis 2.0. Below I summarize tests of different MPI/OpenMP combinations with 1950 functions in the rest_function of the [RPATH] section. Curiously, with 1 MPI process and 32 OpenMP threads, comm_force drops to nearly zero. However, the total dynamics time keeps increasing as the number of OpenMP threads increases.

MPI/OpenMP	Dynamics (s)	Energy (s)	Integrator (s)	Pairlist (s)	Integrator: comm_force (s)
32/1	127.459	17.7	106.6	1.69	36.050
16/2	127.97	21.186	104.040	1.725	71.857
8/4	131.79	29.04	99.64	1.89	91.477
4/8	141.2	46.72	90.939	2.322	78.86
2/16	152.5	75.2	73.3	2.254	52.27
1/32	175.9	132.849	37.202	2.218	0.042

So is this an inherent bottleneck of the algorithm, where the only option is to reduce the number of restraints?

Issue #2 in genesis-release-r-ccs / genesis-2.0