CompPhysVienna / n2p2

n2p2 - A Neural Network Potential Package
https://compphysvienna.github.io/n2p2/
GNU General Public License v3.0

Memory used by N2P2 #149

Open xywang58 opened 2 years ago

xywang58 commented 2 years ago

Dear n2p2 developers,

I have a general question about the memory usage of n2p2 when running nnp-train in parallel mode. I tried to run "mpirun -np 64 nnp-train" on the cluster, but it quit shortly after starting without giving an explicit error, so I guess the problem might be that I blew up the memory. I then tried "mpirun -np 12 nnp-train" with fewer cores requested and it worked well.

So my question is: how do I estimate the memory usage of "nnp-train" in parallel mode? For example, the last few lines of the output of "nnp-scaling" are (I used 24 cores for nnp-scaling):

* MEMORY USAGE ESTIMATION *****
Estimated memory usage for training (keyword "memorize_symfunc_results"):
Valid for training of energies and forces.
Memory for local structures  :  22610071572 bytes (21562.64 MiB = 21.06 GiB).
Memory for all structures    : 542599609152 bytes (517463.31 MiB = 505.34 GiB).
Average memory per structure :     20096282 bytes (19.17 MiB).


If I run it on 24 cores, how much memory will it take in total?

Thanks in advance for any suggestions.

philippmisof commented 2 years ago

Is the keyword memorize_symfunc_results used in the input.nn file? The following only applies if this mode is chosen: the memory usage estimate only accounts for storing the symmetry function values of each structure in your data set. While "Memory for all structures" gives the total amount of memory your calculation needs, "Memory for local structures" only accounts for the memory needed by one MPI task (since each MPI task only stores the symmetry function values belonging to its assigned structures). So for your data set you would need a total of 505 GiB of memory and around 21 GiB per MPI task (note that we are talking about RAM, not disk storage), which is a lot. How much RAM is available on your computing node(s)? If your setup does not have enough memory for this calculation, I'm surprised that it works with fewer cores.

What happens if you remove the keyword memorize_symfunc_results from the input.nn file? Note that this decreases the performance of the training, since the symmetry function values need to be recalculated over and over again, but it reduces your memory footprint drastically.
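To make the arithmetic explicit, here is a quick back-of-the-envelope sketch using the numbers from your nnp-scaling output (assuming nnp-train splits the structures roughly evenly across the MPI tasks; the exact split may differ, and the tasks-per-node value below is hypothetical):

```python
# Back-of-the-envelope estimate based on the nnp-scaling output above.
# Assumption: structures are split roughly evenly across MPI tasks, and each
# task stores only the symmetry function values of its own structures.

avg_per_structure = 20096282      # bytes, "Average memory per structure"
n_structures = 27000              # structures in the data set
n_tasks = 24                      # e.g. "mpirun -np 24 nnp-train"
tasks_per_node = 24               # hypothetical: all tasks on a single node

GiB = 1024**3
total = n_structures * avg_per_structure
per_task = total / n_tasks
per_node = per_task * tasks_per_node

print(f"total data set : {total / GiB:6.1f} GiB")    # ~505 GiB, matches the log
print(f"per MPI task   : {per_task / GiB:6.1f} GiB")  # ~21 GiB, matches the log
print(f"per node       : {per_node / GiB:6.1f} GiB")  # what one node's RAM must hold
```

The per-node number is the one that has to fit into the node's RAM; without memorize_symfunc_results this storage is not needed at all, at the cost of recomputing the symmetry functions over and over again, as mentioned above.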

In case this is not the issue, what updater_type did you use?

xywang58 commented 2 years ago

I did turn off memorize_symfunc_results in the input.nn file; otherwise the program blew up right away, because the computing node only has 256 GB of memory in total. But still, using more cores caused trouble: I was warned by the IT administrator that the program led to either a crash or a reboot of the computing node. My updater_type is 1, which is the Kalman filter.

philippmisof commented 2 years ago

How large is your data set? How many structures does it contain, and what is (roughly) the maximum number of atoms present in one structure? Can you attach the output files (especially the training log)?

I'd like to mention that although the Kalman filter is the preferred method in n2p2, I usually don't use it with more than ~24 cores, since the performance doesn't increase significantly anymore when going to even higher numbers. This is a result of a matrix inversion during the synchronization step; the size of the matrix scales with the number of MPI tasks...
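To illustrate the effect, here is a toy model only (not the actual n2p2 update): I simply assume the per-update gradient work parallelizes perfectly, while the synchronization step inverts a matrix whose dimension grows linearly with the task count, giving a roughly cubic cost in the number of tasks. The constants are made up and only chosen so the model saturates around ~24 tasks:

```python
# Toy model of Kalman-filter training throughput vs. MPI task count.
# Assumptions (illustrative only): gradient work parallelizes perfectly,
# while each update also inverts a matrix whose dimension grows with the
# number of tasks p, i.e. an extra cost ~ p**3.
compute_work = 1.0   # arbitrary units: per-update work at p = 1
inv_cost_1 = 4e-7    # hypothetical inversion cost constant

for p in (1, 12, 24, 48, 96):
    t = compute_work / p + inv_cost_1 * p**3   # parallel work + synchronization
    print(f"{p:3d} tasks: time per update {t:.4f}, speedup {compute_work / t:5.1f}x")
```

In this toy model the speedup grows up to a few dozen tasks and then turns around, which is the qualitative behaviour described above.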

Nevertheless, the problem you are dealing with shouldn't happen, so I'm still interested in what is going on.

xywang58 commented 2 years ago

Thanks for your suggestion. Your suggestion also leads me to a question: will n2p2 support CUDA in the future?

My data set contains 27,000 snapshots with 222 atoms in each snapshot. The major problem is that I am trying to train crystal structures, where the atoms are closely packed into a relatively small region. I have 6 elements, which makes things worse, so even a short cutoff leads to a huge number of neighboring atoms. Such a condensed solid-state system also leads to a very low training rate: I need to spend 2 days on just one epoch. I am now trying the wACSF method introduced in this paper: https://aip.scitation.org/doi/full/10.1063/1.5019667. I have found that the memory usage is reduced significantly, and I hope it will also lead to better training.
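To illustrate why the 6 elements hurt so much with standard element-resolved ACSFs (and why wACSFs help), here is a rough count of symmetry functions per central element; the number of parameter sets per type is made up, but the combinatorics (radial functions per neighbor element, angular functions per unordered pair of neighbor elements) is the standard one:

```python
from math import comb

def acsf_count(n_elements, n_radial_params=6, n_angular_params=8):
    """Rough count of standard (element-resolved) ACSFs per central element.

    Radial functions are defined per neighbor element, angular functions per
    unordered pair of neighbor elements; the parameter-set sizes are made up
    but typical. Weighted ACSFs (wACSF) drop the element resolution, so their
    count does not grow with the number of elements.
    """
    radial = n_elements * n_radial_params
    angular = (comb(n_elements, 2) + n_elements) * n_angular_params  # pairs incl. same-element
    return radial + angular

for n in (1, 2, 4, 6):
    print(f"{n} elements: ~{acsf_count(n)} symmetry functions per central element")
```

With wACSFs the count stays roughly at n_radial_params + n_angular_params regardless of the number of elements, which matches the memory reduction I observed.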

Please see the attached nnp-scaling log file: mode1.txt

philippmisof commented 2 years ago

As far as I know, a CUDA implementation is not currently planned, though anyone is invited to contribute such a feature. CUDA could make sense for the calculation of the symmetry functions (in fact, they aren't parallelized at all at the moment). However, during training there is also a bottleneck in the update step, which currently needs MPI synchronization because it is non-trivial to get rid of.
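As an illustration of why this part lends itself to fine-grained parallelism, here is a minimal NumPy sketch of a radial symmetry function of the type that appears in your log (type 2, SymFncExpRad, i.e. a Behler-style radial function); the parameter values are made up, and the real n2p2 implementation (neighbor lists, element resolution, scaling) is considerably more involved:

```python
import numpy as np

def f_cutoff(r, r_c):
    """Cosine cutoff function: smooth decay to zero at r_c."""
    return np.where(r < r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def g2_radial(positions, eta=0.5, r_s=0.0, r_c=6.0):
    """Behler-type radial symmetry function for every atom.

    G2_i = sum_{j != i} exp(-eta * (r_ij - r_s)**2) * f_c(r_ij)
    Each atom's value is an independent sum over its neighbors, so all atoms
    (and all parameter sets) could in principle be evaluated in parallel.
    """
    diff = positions[:, None, :] - positions[None, :, :]   # pairwise displacements
    r = np.linalg.norm(diff, axis=-1)                      # pairwise distances
    np.fill_diagonal(r, 2.0 * r_c)                         # exclude i == j (beyond cutoff)
    contrib = np.exp(-eta * (r - r_s) ** 2) * f_cutoff(r, r_c)
    return contrib.sum(axis=1)

# Tiny example with random coordinates (arbitrary units, no periodic images)
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 10.0, size=(222, 3))                # 222 atoms, as in this thread
print(g2_radial(pos)[:5])
```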

Tweaking your symmetry function setup is probably your best bet to improve the performance. As far as I can see from the log you've attached, you are only using types 2 and 3 (SymFncExpRad, SymFncExpAngn), or am I mistaken? Furthermore, I have the feeling that you are using more symmetry functions than necessary; maybe have a look at this: https://compphysvienna.github.io/n2p2/tools/nnp-prune.html

Also, 27,000 snapshots seems a lot to me. Was the data set generated "carefully"? I haven't generated one myself yet, since I know that generating a good data set takes a lot of time, but the goal would be to start with a rather small data set and extend it until one hardly ever experiences any extrapolation warnings during an MD simulation (while of course keeping it large enough to prevent overfitting).