deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

Need subspace diagonalization with parallel #5480

Open pxlxingliang opened 1 week ago

pxlxingliang commented 1 week ago

Background

Currently, the subspace diagonalization in dav is done by LAPACK on a single core. For large systems the dimension of this subspace can reach several hundred, so the diagonalization could be accelerated effectively by running it in parallel.

QE has the same feature, which is enabled by setting the value of nd on the command line: https://www.quantum-espresso.org/Doc/user_guide/node20.html

Describe the solution you'd like

I will implement a function that divides the H and S matrices into 2D blocks and then calls ELPA or ScaLAPACK to do the diagonalization in parallel.
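As a rough illustration of what "dividing into 2D blocks" usually means for ScaLAPACK-style solvers, the sketch below maps a global matrix element to its owning process and local index under a 2D block-cyclic layout. It is not the code from this issue or its PR. The function names (`owner_and_local`, `locate`) are hypothetical, and the sketch assumes a square block size and that the first block lives on process (0, 0).

```python
def owner_and_local(g, nb, nprocs):
    """Map a global index g to (process coordinate, local index) in one dimension.

    Assumes a block-cyclic layout with block size nb whose first block
    is stored on process 0 (an assumption of this sketch).
    """
    block = g // nb                    # index of the block that contains g
    proc = block % nprocs              # blocks are dealt out round-robin
    local = (block // nprocs) * nb + g % nb
    return proc, local


def locate(i, j, nb, nprow, npcol):
    """Return ((prow, pcol), (li, lj)) for the global element (i, j)."""
    prow, li = owner_and_local(i, nb, nprow)
    pcol, lj = owner_and_local(j, nb, npcol)
    return (prow, pcol), (li, lj)


if __name__ == "__main__":
    # 8x8 matrix, 2x2 blocks, 2x2 process grid
    for i, j in [(0, 0), (2, 5), (7, 7)]:
        print((i, j), "->", locate(i, j, nb=2, nprow=2, npcol=2))
```

Each process would fill its local H and S buffers using this mapping before the parallel solver is called. The same mapping is applied independently to rows and columns.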

Task list only for developers

Notice Possible Changes of Behavior (Reminder only for developers)

No response

Notice any changes of core modules (Reminder only for developers)

No response

Notice Possible Changes of Core Modules (Reminder only for developers)

No response

Additional Context

No response

Task list for Issue attackers (only for developers)

pxlxingliang commented 1 day ago

I compared the efficiency of ELPA, ScaLAPACK, and LAPACK in solving generalized eigenvalue problems for matrices of varying dimensions. For small matrices, specifically those with a dimension below about 100, LAPACK is the most efficient. As the matrix size increases, ELPA and ScaLAPACK become increasingly faster than LAPACK.

Furthermore, the block size is a significant factor affecting efficiency. For ELPA, optimal performance is achieved with block sizes of either 16 or 32.
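One plausible reason the block size matters (my assumption, not something measured here) is load balance: in a block-cyclic layout, large blocks can leave some processes with far fewer rows than others. The sketch below counts how many rows each process row owns, following the same logic as ScaLAPACK's NUMROC with the first block on process 0. The helper name `local_count` and the choice of 4 process rows are hypothetical.

```python
def local_count(n, nb, iproc, nprocs):
    """Rows (or columns) of an n-long dimension owned by process iproc.

    Block-cyclic layout with block size nb, first block on process 0
    (same idea as ScaLAPACK's NUMROC with source process 0).
    """
    nblocks = n // nb                  # number of full blocks
    count = (nblocks // nprocs) * nb   # full rounds every process receives
    extra = nblocks % nprocs           # processes that get one more full block
    if iproc < extra:
        count += nb
    elif iproc == extra:
        count += n % nb                # the trailing partial block, if any
    return count


if __name__ == "__main__":
    n, nprow = 200, 4                  # e.g. ndim=200 on 4 process rows (assumed grid)
    for nb in (16, 32, 64):
        rows = [local_count(n, nb, p, nprow) for p in range(nprow)]
        print(f"nb={nb:3d}: rows per process row = {rows}")
```

With these assumed parameters the distribution becomes visibly less even as the block size grows, while very small blocks are balanced but typically cost more in communication and cache efficiency.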

The tables below show the speedup of ELPA and ScaLAPACK relative to LAPACK for matrices of varying dimensions, core counts, and block sizes. Each row corresponds to a number of parallel cores and each column to a block size. Each entry has the form ELPA/ScaLAPACK, giving the speedup of each solver relative to LAPACK. For each case, 10 random H/S matrices are generated and solved 10 times.
Test code: https://github.com/deepmodeling/abacus-develop/pull/5549/files#diff-4cfdb3bd4f00aee2894decd88dc0691e059bb6810b1888b32a8bd3c6e48b78f2R326

#ndim=64,nband=50
            1          4         16         20         32         50         64
4   0.56/0.17  0.91/0.39  1.00/0.58  0.95/0.60  1.00/0.71         --         --
8   0.53/0.15  0.82/0.35  0.94/0.55         --         --         --         --
16  0.54/0.15  0.77/0.33  0.85/0.51         --         --         --         --

#ndim=100,nband=50
            1          4         16         20         32         50         64
4   0.52/0.14  0.89/0.37  1.00/0.55  0.97/0.57  0.92/0.60  0.91/0.71         --
8   0.52/0.13  0.83/0.34  0.94/0.54  0.93/0.58         --         --         --
16  0.53/0.12  0.82/0.33  0.89/0.48  0.87/0.53         --         --         --

#ndim=100,nband=80
            1          4         16         20         32         50         64
4   0.71/0.19  1.21/0.49  1.36/0.72  1.35/0.78  1.27/0.80  1.34/0.94         --
8   0.73/0.18  1.17/0.47  1.29/0.74  1.31/0.79  1.32/0.81         --         --
16  0.74/0.17  1.12/0.45  1.18/0.67  1.22/0.74  1.23/0.77         --         --

#ndim=200,nband=50
            1          4         16         20         32         50         64
4   0.43/0.09  0.92/0.35  1.02/0.58  1.08/0.60  1.00/0.64  0.95/0.68  0.81/0.68
8   0.54/0.10  1.01/0.34  1.08/0.56  1.10/0.61  1.12/0.65  0.95/0.69         --
16  0.62/0.10  1.03/0.33  1.15/0.55  1.13/0.55  1.08/0.57  0.99/0.68         --

#ndim=200,nband=100
            1          4         16         20         32         50         64
4   0.66/0.13  1.31/0.46  1.46/0.75  1.48/0.77  1.42/0.80  1.41/0.86  1.14/0.85
8   0.82/0.11  1.41/0.36  1.62/0.60  1.54/0.65  1.54/0.66  1.34/0.69         --
16  0.89/0.14  1.50/0.47  1.67/0.73  1.48/0.73  1.52/0.79  1.34/0.88         --

#ndim=200,nband=160
            1          4         16         20         32         50         64
4   0.92/0.13  1.77/0.43  1.99/0.68  2.08/0.71  1.93/0.73  1.85/0.80  1.54/0.76
8   1.17/0.23  2.06/0.71  2.34/1.16  2.26/1.23  2.17/1.26  2.01/1.34         --
16  1.27/0.22  2.08/0.70  2.27/1.06  2.20/1.12  2.02/1.13  1.94/1.28         --

#ndim=300,nband=240
            1          4         16         20         32         50         64
4   1.16/0.25  2.22/0.88  2.47/1.39  2.37/1.43  2.46/1.47  2.24/1.51  2.12/1.55
8   1.60/0.29  2.82/0.97  3.20/1.57  3.15/1.65  3.17/1.73  2.64/1.76  2.43/1.78
16  1.84/0.16  3.14/0.56  3.46/0.87  3.31/0.89  3.28/0.94  2.60/0.97  2.55/1.05

#ndim=400,nband=320
            1          4         16         20         32         50         64
4   1.39/0.29  2.63/1.08  2.94/1.73  2.88/1.76  2.93/1.81  2.71/1.81  2.40/1.78
8   2.12/0.25  3.55/0.82  4.00/1.33  3.97/1.38  3.96/1.46  3.60/1.46  3.01/1.42
16  2.52/0.18  4.51/0.68  4.97/1.11  4.84/1.14  4.67/1.16  4.12/1.22  3.28/1.21

#ndim=500,nband=400
           16         20         32         50         64        128
4   3.36/1.96  3.28/2.02  3.39/2.07  3.14/2.06  2.96/2.03  2.26/1.95
8   4.89/1.32  4.71/1.34  4.80/1.36  4.34/1.38  4.15/1.39  2.72/1.22
16  6.18/1.39  5.87/1.40  6.07/1.45  5.09/1.48  4.66/1.55  2.89/1.54

The tables below give the ELPA/ScaLAPACK times, in ms, for larger matrices with different numbers of cores (rows) and block sizes (columns).

#ndim=600,nband=500
               16             32             64            128
4   357.33/595.67  348.67/570.67  397.00/579.33  482.00/580.33
8   251.67/854.00  239.67/805.33  274.00/804.33  394.33/868.33
16  193.00/818.33  181.67/775.33  233.67/738.67  367.33/641.67

#ndim=800,nband=600
                16              32              64             128
4   667.00/1188.67  651.33/1125.67  731.00/1148.00  962.33/1216.00
8   447.33/1556.67  436.33/1481.33  516.00/1516.00  714.33/1639.67
16  337.67/1461.33  325.33/1394.00  394.33/1374.00  642.00/1404.00

#ndim=1000,nband=800
                 16               32               64              128
4   1150.00/2295.33  1163.67/2240.00  1286.67/2278.33  1607.33/2336.33
8    770.33/2767.00   763.67/2686.00   857.67/2779.67  1098.00/2957.33
16   544.67/2559.00   542.33/2474.00   612.00/2428.00   928.33/2387.33

#ndim=1200,nband=1000
                 16               32               64              128
4   1878.33/3905.33  1853.00/3731.33  2052.33/3772.33  2542.33/3962.00
8   1203.00/4494.00  1171.33/4352.33  1296.00/4452.33  1625.67/4744.67
16   831.67/4086.67   818.67/3938.67   923.67/3925.00   923.67/3925.00
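
To make the timing tables easier to read, here is a small sketch that picks the fastest (cores, block size) combination per solver. The numbers are copied from the ndim=600, nband=500 table above. The `TABLE` layout and the `fastest` helper are hypothetical names used only for this illustration.

```python
# Timings in ms, copied from the ndim=600, nband=500 table.
# Outer key: number of cores; inner key: block size; value: "ELPA/ScaLAPACK".
TABLE = {
    4:  {16: "357.33/595.67", 32: "348.67/570.67", 64: "397.00/579.33", 128: "482.00/580.33"},
    8:  {16: "251.67/854.00", 32: "239.67/805.33", 64: "274.00/804.33", 128: "394.33/868.33"},
    16: {16: "193.00/818.33", 32: "181.67/775.33", 64: "233.67/738.67", 128: "367.33/641.67"},
}

ELPA, SCALAPACK = 0, 1  # position of each solver inside an "a/b" cell


def fastest(table, solver):
    """Return (time_ms, cores, block_size) of the fastest run for one solver."""
    return min(
        (float(cell.split("/")[solver]), cores, nb)
        for cores, row in table.items()
        for nb, cell in row.items()
    )


if __name__ == "__main__":
    for name, idx in (("ELPA", ELPA), ("ScaLAPACK", SCALAPACK)):
        t, cores, nb = fastest(TABLE, idx)
        print(f"{name}: {t} ms with {cores} cores, block size {nb}")
```

For this table the fastest ELPA run is 181.67 ms on 16 cores with block size 32, while the fastest ScaLAPACK run is 570.67 ms on 4 cores with block size 32.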