OrderN / CONQUEST-release

Full public release of large scale and linear scaling DFT code CONQUEST
http://www.order-n.org/
MIT License

How to determine the matrix size of padded Hamiltonian (Padding Hamiltonian Matrix) #223

Open tsuyoshi38 opened 10 months ago

tsuyoshi38 commented 10 months ago

Once the code can handle the padded Hamiltonian matrix, we need to choose good values for block_size_r (and block_size_c).

First, the points to consider:

  1. Manually given or default setting
  2. Constraints in ScaLAPACK
  3. Constraints in ELPA (a minimal check of i) and ii) is sketched below):
    i. block_size_r = block_size_c
    ii. proc_cols = M (integer) x proc_rows
    iii. ?? needs more than one block in the corresponding column or row ??
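
For illustration only (this is not CONQUEST code, and all values are made up), constraints i) and ii) could be checked for a candidate setting as follows:

```fortran
! Minimal sketch: check the ELPA-related constraints i) and ii) for a
! candidate block size and process grid. Names and values are illustrative.
program check_elpa_constraints
  implicit none
  integer :: block_size_r, block_size_c   ! candidate block sizes
  integer :: proc_rows, proc_cols         ! candidate BLACS process grid
  logical :: ok_i, ok_ii

  block_size_r = 64 ; block_size_c = 64
  proc_rows    = 4  ; proc_cols    = 8

  ! Constraint i): square blocks (ELPA exposes a single "nblk")
  ok_i  = (block_size_r == block_size_c)

  ! Constraint ii): proc_cols is an integer multiple of proc_rows
  ok_ii = (mod(proc_cols, proc_rows) == 0)

  ! Constraint iii) is left out: it is only suggested by benchmarks and has
  ! no documented form (see the later comments in this thread).

  write(*,*) 'constraint i)  satisfied: ', ok_i
  write(*,*) 'constraint ii) satisfied: ', ok_ii
end program check_elpa_constraints
```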
tsuyoshi38 commented 9 months ago

In the last comment, regarding the constraints related to ELPA ...

For constraint i) (block_size_r = block_size_c): judging from the following example shown on the ELPA page, `call elpa%set("nblk", nblk, success)  ! size of the BLACS block cyclic distribution`, I guess we have only one parameter for the block size.

On the other hand, constraints ii) and iii) for ELPA come only from our benchmark tests. What I heard is that ELPA was very inefficient when these two were not satisfied. Since I could not find any documentation or other examples supporting them, it is probably better to ignore these constraints for now.

Then, we can simply set a default value of the block size without considering the number of MPI processes or the dimension of the Hamiltonian matrix.

tsuyoshi38 commented 9 months ago

I think I have finished introducing "Padding H and S matrices", which makes the matrix dimension a multiple of the block size. Note that ScaLAPACK is very inefficient if the block size is small (1-5?).
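
For reference, here is a minimal sketch of the padding arithmetic itself; the variable names follow the ones used later in this thread, but the snippet is only illustrative and not the actual implementation:

```fortran
! Minimal sketch: pad the matrix dimension up to the next multiple of the
! block size. Illustrative only; values are made up.
program pad_dimension
  implicit none
  integer :: matrix_size       ! actual dimension of H (and S)
  integer :: block_size_r      ! block size of the block-cyclic distribution
  integer :: matrix_size_padH  ! padded dimension, a multiple of block_size_r

  matrix_size  = 1003          ! example value
  block_size_r = 64            ! example value; very small block sizes (1-5) are inefficient

  ! Round up to the next multiple of block_size_r
  matrix_size_padH = ((matrix_size + block_size_r - 1) / block_size_r) * block_size_r

  write(*,'(a,i0)') 'matrix_size      = ', matrix_size
  write(*,'(a,i0)') 'matrix_size_padH = ', matrix_size_padH   ! prints 1024 here
end program pad_dimension
```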

I think the code is already useful for many users, but the test calculations I have done so far may not be enough. Later, I will explain more about the relationship between the number of MPI processes, the dimension of the matrices (H and S), and the block size. We basically want to set a good default value of the block size (Diag.BlockSizeR and Diag.BlockSizeC). But it is not as simple as I first thought to choose an appropriate block size for a given number of MPI processes and matrix dimension. In addition, the appropriate block size may depend strongly on the hardware.

Considering these situations, I wonder whether it is better to introduce the changes in the following two stages (a rough sketch of the logic is given after Stage 2).

Stage 1: we will collect information from the users.

    • Default: use the present CQ setting (without padding).
    • Option: the user can set Diag.BlockSizeR to pad H and S.

Stage 2: we will provide a default value of the block size.

    • Default: use a default Diag.BlockSizeR with padding. If the user sets an inappropriate number of processes, CQ warns (-> change the block size?).
    • Option: if the user sets Diag.BlockSizeR, use the given value and only warn about inappropriate settings.
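
As a rough sketch of the staged behaviour described above (illustrative only; the default value and the way the input flag is read are placeholders, not the actual CQ implementation):

```fortran
! Rough sketch of the proposed two-stage behaviour. Illustrative only.
program staged_block_size
  implicit none
  integer :: block_size_r
  logical :: do_padding

  ! Example: stage 1, user did not set Diag.BlockSizeR -> no padding
  call choose_block_size(user_set=.false., user_block=0, stage=1, &
                         block_size=block_size_r, do_padding=do_padding)
  write(*,*) 'block size:', block_size_r, '  padding:', do_padding

contains

  subroutine choose_block_size(user_set, user_block, stage, block_size, do_padding)
    logical, intent(in)  :: user_set    ! did the user set Diag.BlockSizeR?
    integer, intent(in)  :: user_block  ! value given by the user (if any)
    integer, intent(in)  :: stage       ! 1 or 2, as in the comment above
    integer, intent(out) :: block_size
    logical, intent(out) :: do_padding
    integer, parameter   :: default_block = 64   ! placeholder default value

    if (user_set) then
       ! Option (both stages): honour the user's value, warn if it looks bad
       block_size = user_block
       do_padding = .true.
       if (user_block > 0 .and. user_block < 5) &
            write(*,*) 'Warning: very small block sizes are inefficient'
    else if (stage == 2) then
       ! Stage 2 default: pad with a code-chosen default block size
       block_size = default_block
       do_padding = .true.
    else
       ! Stage 1 default: present CQ behaviour, no padding
       block_size = 0
       do_padding = .false.
    end if
  end subroutine choose_block_size

end program staged_block_size
```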

If anyone has a comment or suggestion, please let me know.

tsuyoshi38 commented 9 months ago

(( Relationship between the no. of processes, the block size, and the dimension of the H and S matrices ))

  1. First, let me remind you that we have two sizes for the dimension of the Hamiltonian and overlap matrices.

    • matrix_size = actual dimension of the Hamiltonian
    • matrix_size_padH = dimension of the padded H or S matrix, which is a multiple of the block size.

  2. Usually, proc_rows & proc_cols are determined from the number of MPI processes. It is also possible for users to set these parameters via Diag.ProcRows and Diag.ProcCols.

  3. On the other hand, we want to set a default value of block_size_r (and _c) in the future, but the values can also be given via Diag.BlockSizeR and Diag.BlockSizeC. As mentioned above, CQ is very slow if block_size_r (and _c) is set to less than 5. Once block_size_r (and _c) is given by CQ or by the user, the number of blocks along each row or column is determined.

    • block_size_r, block_size_c => blocks_r, blocks_c (no. of blocks along a row or column)

  4. Here, we have a restriction (a check of this is sketched after this list):

    • blocks_r needs to be equal to or larger than proc_rows
    • blocks_c needs to be equal to or larger than proc_cols

  5. Of course, users should not set a large number of processes when the matrix size is not large. Then, we may be able to introduce a new rule or restriction:

    • proc_cols is equal to or larger than proc_rows.
    • proc_cols must be smaller than blocks_c = (matrix_size_padH / block size)
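
To make the above concrete, here is a self-contained sketch (illustrative only, not the actual CONQUEST subroutine) that builds a near-square process grid, derives the block counts, and tests restriction 4; restriction 5 (proc_cols >= proc_rows) holds by construction here:

```fortran
! Illustrative sketch of the quantities and restrictions discussed above.
! Assumes constraint i) (block_size_r = block_size_c); values are made up.
program check_grid
  implicit none
  integer :: nprocs, proc_rows, proc_cols
  integer :: matrix_size, block_size_r, block_size_c
  integer :: matrix_size_padH, blocks_r, blocks_c

  nprocs       = 16          ! example number of MPI processes
  matrix_size  = 1003        ! example dimension of H and S
  block_size_r = 64
  block_size_c = 64

  ! Near-square process grid; proc_rows <= sqrt(nprocs), so proc_cols >= proc_rows
  proc_rows = int(sqrt(real(nprocs)))
  do while (mod(nprocs, proc_rows) /= 0)
     proc_rows = proc_rows - 1
  end do
  proc_cols = nprocs / proc_rows

  ! Padded dimension and number of blocks along each direction
  matrix_size_padH = ((matrix_size + block_size_r - 1) / block_size_r) * block_size_r
  blocks_r = matrix_size_padH / block_size_r
  blocks_c = matrix_size_padH / block_size_c

  ! Restriction 4: at least as many blocks as process rows/columns
  if (blocks_r < proc_rows) write(*,*) 'blocks_r < proc_rows: too many process rows'
  if (blocks_c < proc_cols) write(*,*) 'blocks_c < proc_cols: too many process columns'

  write(*,'(4(a,i0))') 'grid ', proc_rows, ' x ', proc_cols, &
                       ', blocks ', blocks_r, ' x ', blocks_c
end program check_grid
```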

For large systems, this should be okay. We usually use a large number of MPI processes, and proc_rows is proportional to the square root of the number of processes, while matrix_size is proportional to the number of atoms and the block size should be almost constant.
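
As a purely illustrative example (numbers made up): with 1024 MPI processes a near-square grid gives proc_rows = proc_cols = 32, while a padded dimension of, say, matrix_size_padH = 100032 with block size 64 gives blocks_c = 1563, so blocks_c is far larger than proc_cols and the restriction is easily satisfied.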

On the other hand, it may cause a problem for small systems. The number of processes can be smaller than 9, and matrix_size and the block size may be comparable. But... if we simply ignore the efficiency for small systems, it may be much easier to set the value of the block size.

tsuyoshi38 commented 9 months ago

I have made a branch f-proj_PHM_BlockSize. Here, a subroutine that checks the condition mentioned above (condition 4) has been written; it is called just after matrix_size_padH is calculated.

At present, this part is in the subroutine allocate_arrays in ScalapackFormat.f90, but it could also be put in readDiagInfo in initial_read_module.f90. I thought it is better to keep initial_read_module smaller, for readability.

I think the code is now ready for Stage 1, and I would like to put it into the develop version. It is probably better to release v1.2 first and then merge this version into develop. (But... I forgot how to merge the present version of f-proj_PadHamiltonianMatrix, which was made from an old version of develop, into the latest version of develop.)

And... I think I can finish my project (Implementing Padding ...) for now. We will restart it after we collect data on appropriate block sizes.

davidbowler commented 9 months ago

I agree that we should release version 1.2 first, so for now please don't try to merge this into develop.