Hi Professors, can anyone give me an estimate of how NEP training time scales with the number of atoms and the number of generations? An example with absolute numbers would also help. I appreciate your help.
In https://github.com/brucefan1983/GPUMD/tree/master/examples/nep_potentials/PbTe/train there is an example on PbTe. It took 12 min on my laptop with an RTX 2070 GPU to train for 10,000 generations. The batch size is 25 and each structure in the training data set has 250 atoms, so the number of atoms in one batch is N = 25 * 250 = 6250.
The training time is linearly proportional to the number of generations. So training for 100,000 generations will take 10 * 12 min = 2 hours if all the other inputs are the same.
The training time is also linearly proportional to the number of atoms N in one batch, once N exceeds some critical value that is roughly the number of CUDA cores times some constant (about 3). Therefore, if I increase the batch size from 25 to 50, the time for training 10,000 generations will increase from 12 min to about 24 min. However, if I decrease the batch size from 25 to 5, the time for training 10,000 generations will not drop to 12/5 min; it will be only slightly shorter than 12 min, because most of the GPU resources are wasted when the batch size is too small.
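To make this rule of thumb concrete, here is a minimal Python sketch of the time estimate described above. It is not part of GPUMD: the CUDA-core count (2304 for an RTX 2070), the saturation factor of about 3, and the simple max()-based saturation model are assumptions taken from this thread, not measured behavior.

```python
# Rough NEP training-time estimator based on the scaling discussed above.
# All constants are assumptions from this thread, anchored to one measured
# reference point: 12 min for 10,000 generations at N = 6250 on an RTX 2070.

def estimate_training_time(generations, batch_size, atoms_per_structure,
                           cuda_cores=2304,          # RTX 2070 (assumption)
                           saturation_factor=3.0,    # "some constant (about 3)"
                           ref_minutes=12.0,         # measured reference point
                           ref_generations=10_000,
                           ref_atoms_per_batch=6_250):
    """Estimate NEP training wall time in minutes.

    Time scales linearly with the number of generations, and linearly with
    the atoms per batch N only once N exceeds ~saturation_factor * cuda_cores.
    Below that threshold the GPU is underutilized and the cost is roughly flat.
    """
    n_atoms = batch_size * atoms_per_structure
    threshold = saturation_factor * cuda_cores
    # max() models the flat region: shrinking N below the threshold
    # barely reduces the per-generation cost.
    effective_n = max(n_atoms, threshold)
    effective_ref = max(ref_atoms_per_batch, threshold)
    return ref_minutes * (generations / ref_generations) * (effective_n / effective_ref)

# Examples matching the numbers quoted above:
print(estimate_training_time(10_000, 25, 250))   # ~12 min (reference case)
print(estimate_training_time(100_000, 25, 250))  # ~120 min = 2 hours
print(estimate_training_time(10_000, 50, 250))   # ~22 min, roughly double
print(estimate_training_time(10_000, 5, 250))    # still ~12 min: small batches waste the GPU
```

The max() is the whole model: above the saturation threshold the estimate grows linearly with N; below it, the estimate stays pinned at the threshold cost, which is why shrinking the batch size from 25 to 5 saves almost nothing.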
Thanks so much!