block-hczhai / block2-preview

Efficient parallel quantum chemistry DMRG in MPO formalism
GNU General Public License v3.0

Encountered some problems in actual calculations #61

Closed. ChemChuan closed this issue 12 months ago.

ChemChuan commented 1 year ago

I hope this message finds you well. I have been utilizing Block2-MPI for my research in DMRG calculations and have encountered some challenges during the process. I would greatly appreciate your assistance in addressing the following questions:

  1. When performing calculations with a large bond dimension, I have observed significant increases in memory and disk usage. For instance, a DMRG (64,64) calculation with a bond dimension of 1800 can require up to 700 GB of memory and 3-4 TB of hard disk. Do you have any recommendations on how to handle such calculations more efficiently?
  2. Have there been any DMRG calculations carried out on two-dimensional hydrogen systems (N x N)? If so, what bond dimension is recommended for such systems?
  3. I have encountered notable differences in the results when using the same input file to compute a three-dimensional hydrogen system (4 x 4 x 3) with Block1.5 and Block2.0. Could you please explain the possible reasons for these discrepancies?
  4. Is there a significant variance in computational speed when using Block2 on AMD and Intel CPUs? I am interested in learning about any performance disparities between these two platforms.

I appreciate your time and assistance tremendously. I look forward to receiving your valuable insights and suggestions to enhance the efficiency of my Block2 calculations.

hczhai commented 1 year ago

Thanks for your interest in using the block2 package.

  1. For DMRG (64,64), do you mean: (a) You have 4096 (= 64 x 64) sites in a 2D square lattice? In this case, it is known that DMRG will not work for large-scale 2D systems. Even if you can run such calculations with a large amount of resources, they may not generate meaningful results. Or (b) you have 64 electrons in 64 spatial orbitals (sites)? For this system with bond dimension 1800, I do not think you need 700 GB of memory or 3 TB of disk. If possible, could you please attach an example input and output file for this calculation so that I can get some concrete information about your settings and tell you how to improve it?
  2. I do not have any personal experience with the system you mentioned. The required bond dimension can be system dependent, so I suggest you follow the documentation, do some tests on your system using different MPS bond dimensions, and see how far you are from the extrapolated result.
  3. Block 1.5 and block2 are different packages, sharing no common code. Even if you use the same input file, differences in the implementation details of the two packages can make the results differ. If possible, please attach the input file and the outputs from the two packages so that I can get some concrete information to answer your question.
  4. We do not expect a great performance difference related to the two CPU brands, as long as they work equally well for other general computational tasks. The performance of any code can be affected by many factors, including the frequency of the CPU, the cache size, the instruction set, the memory speed, the disk speed, and the supercomputer grid and its configuration. Without quantitative information and/or testing data on the actual computing environment, it is hard for me to provide meaningful recommendations.
ChemChuan commented 1 year ago

Thank you very much for your reply

  1. I calculate 8 x 8 hydrogens in a 2D square lattice using 64 electrons in 64 spatial orbitals. Can DMRG work for such a large-scale 2D system? Or what kind of system counts as a two-dimensional system that DMRG cannot handle, such as N x N hydrogens in a 2D square lattice (N > 6, e.g. 8, 10, ...)?
  2. The input and output files for this task are H64_STO-6G_2.0A_88_uhf_uno_asrot2gvb32_s_CASCI_1800.py and H64_STO-6G_2.0A_88_uhf_uno_asrot2gvb32_s_CASCI_180.out. I use the block2-mpi version with 32 CPU cores, which needs about 600-700 GB of memory and about 2-3 TB of disk.
  3. I have encountered notable differences in the results when using the same input file to compute a three-dimensional (4 x 4 x 3) hydrogen system with 48 electrons in 48 spatial orbitals using bond dimension = 500. The two input files have the same content:
     H48_2.0A_cc-pVDZ_uhf_frag_uno_asrot2gvb24_s_CASCI_500_block2.py ---> CASCI E = -23.6525789053700
     H48_2.0A_cc-pVDZ_uhf_frag_uno_asrot2gvb24_s_CASCI_500_block1.5.py ---> CASCI E = -23.9621772507256
  4. I use GVB natural orbitals for the DMRG calculation; they are generated and transferred with the mokit program (https://gitlab.com/jxzou/mokit), and the corresponding fch files are also in the compressed package.

H48_2.0A_cc-pVDZ_uhf_frag_uno_asrot2gvb24_s_CASCI_500.zip H64_STO-6G_2.0A_88_uhf_uno_asrot2gvb32_s_CASCI_1800.zip

hczhai commented 1 year ago

Thanks for providing the script and output files.

  1. An 8 x 8 2D lattice should be okay for DMRG, although it may be hard to get very accurate results.
  2. Since you used the dmrgscf interface provided by pyscf, which implicitly invokes block2 or block1.5, your zip files only contain the output from pyscf; the actual DMRG output is missing. You can find the actual DMRG output under the scratch directory created by pyscf. In the pyscf output file, you can find lines like scratchDirectory = /scratch/xcren/pyscf/165842 and outputFile = 165842/dmrg.out, etc. Please attach the dmrg.out file in these directories (for each case) so that we can have a look at the DMRG output.
  3. For the three-dimensional (4 x 4 x 3) system, a bond dimension of 500 may not be sufficient. So it is likely that both the block2 and block1.5 results are inaccurate, since they are far from convergence. But we still need to see dmrg.out to confirm this.
  4. Since you are running CASCI, it is recommended to follow the procedure in https://block2.readthedocs.io/en/latest/tutorial/qc-hamiltonians.html to run block2 without using the dmrgscf interface, and to follow https://block2.readthedocs.io/en/latest/tutorial/energy-extrapolation.html to get an estimate of the error for your bond dimension; then you will know how reliable the DMRG result is and whether you need to use a larger bond dimension. A minimal sketch of this direct route is given after this list.
  5. If you decide to use the dmrgscf interface, you need to follow the block2 documentation https://block2.readthedocs.io/en/latest/user/dmrg-scf.html to set up the CASCI part of your script. For example, when you are running the calculation on only one node, there is no need to use mpirun -n .... Instead, you can simply set the number of threads equal to the number of cores in your node, so that shared-memory parallelization can be used.
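
To make item 4 more concrete, below is a minimal sketch of running block2 directly (without dmrgscf), loosely following the qc-hamiltonians tutorial. The molecule, scratch path, thread count, memory, and sweep schedule are illustrative placeholders, not tuned settings for your system:

from pyscf import gto, scf
from pyblock2._pyscf.ao2mo import integrals as itg
from pyblock2.driver.core import DMRGDriver, SymmetryTypes

# Placeholder system: a small hydrogen chain, just to keep the sketch self-contained.
mol = gto.M(atom=[("H", (0, 0, 2.0 * i)) for i in range(8)], basis="sto-6g")
mf = scf.RHF(mol).run()

# Extract the active-space integrals from the mean-field object (full space here).
ncas, n_elec, spin, ecore, h1e, g2e, orb_sym = itg.get_rhf_integrals(
    mf, ncore=0, ncas=None, g2e_symm=8)

driver = DMRGDriver(scratch="./dmrg_tmp", symm_type=SymmetryTypes.SU2,
                    n_threads=32, stack_mem=100 << 30)  # ~100 GB work space (adjust)
driver.initialize_system(n_sites=ncas, n_elec=n_elec, spin=spin, orb_sym=orb_sym)

mpo = driver.get_qc_mpo(h1e=h1e, g2e=g2e, ecore=ecore, iprint=1)
ket = driver.get_random_mps(tag="KET", bond_dim=250, nroots=1)

# Forward schedule: increase the bond dimension while decreasing the noise.
energy = driver.dmrg(mpo, ket, n_sweeps=20,
                     bond_dims=[250] * 4 + [500] * 4 + [1000] * 8,
                     noises=[1e-4] * 4 + [1e-5] * 4 + [0] * 8,
                     thrds=[1e-8] * 16, iprint=1)
print("DMRG energy =", energy)

The same driver and MPS can then be reused for the reverse schedule needed by the energy extrapolation.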
hczhai commented 1 year ago

Also, in your input Python script, we can see:

mc.max_memory = 602400 # MB
mc.fcisolver = dmrgscf.DMRGCI(mol, maxM=1800)
mc.fcisolver.memory = 10 # GB

From the documentation https://block2.readthedocs.io/en/latest/user/dmrg-scf.html#dmrgscf-serial it should be clear that the memory for DMRG is set via mc.fcisolver.memory, which here is only 10 GB. For the same reason, it is likely that the 4 TB of disk is not used by block2. mc.max_memory is an attribute defined and used by the pyscf package itself.
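
For illustration only, the same fragment with the DMRG memory actually raised might look like the following; the 32 threads and 100 GB are placeholder values to be matched to your node:

mc.max_memory = 602400                        # MB, memory used by pyscf itself
mc.fcisolver = dmrgscf.DMRGCI(mol, maxM=1800)
mc.fcisolver.threads = 32                     # shared-memory threads inside block2
mc.fcisolver.memory = 100                     # GB, memory actually given to block2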

ChemChuan commented 1 year ago

Thank you very much for your reply

  1. I may have already deleted some of the dmrg.out files. I can rerun the 3D (4 x 4 x 3) system with block1.5 and block2.0 and upload the results once they are available.
  2. I will try energy extrapolation without using dmrgscf. For an 8 x 8 or larger 2D lattice, is this extrapolation still a reliable method?
  3. For example, I can use bond dimensions such as 250, 500, 750, 1000, 1250, and 1500 to extrapolate to the limit, but the extrapolated result may not be reliable. This may require larger bond dimensions for extrapolation. Is my understanding correct?
  4. Regarding the issue of hard disk resources, I will test it again. Currently, I am calculating an 8 x 8 2D lattice with bond dimension 2000 using mpirun -n 32. It has been running for three days and currently occupies 800 GB of disk in the temporary file directory (/scratch/pyscf/30048), and the task is still running. Right now free -h shows 246 GB of memory used; this number keeps changing and grows as the calculation proceeds.

30048.zip

hczhai commented 1 year ago

For an 8 x 8 or larger 2D lattice, is this extrapolation still a reliable method? ... This may require larger bond dimensions for extrapolation. Is my understanding correct?

You can read the following paper to get a better understanding of the extrapolation approach:

Olivares-Amaya, R.; Hu, W.; Nakatani, N.; Sharma, S.; Yang, J.; Chan, G. K.-L. The ab-initio density matrix renormalization group in practice. The Journal of Chemical Physics 2015, 142, 034102. doi: 10.1063/1.4905329

Right now free -h shows 246 GB of memory used; this number keeps changing and grows as the calculation proceeds.

To get the best efficiency, it is important to read and follow the block2 documentation. Please cancel this calculation, delete the scratch files, and then restart the calculation without mpirun (when you are using just one node, you do not need any MPI parallelization), following the script given in https://block2.readthedocs.io/en/latest/user/dmrg-scf.html#dmrgscf-serial. In particular, for your case set

dmrgscf.settings.MPIPREFIX = ''
mc.fcisolver.threads = 32
mc.fcisolver.memory = 100 # mem in GB

Then the memory and disk cost will greatly decrease. The computational speed will also increase.

For example, I can use bond dimensions such as 250, 500, 750, 1000, 1250, and 1500 to extrapolate to the limit

For energy extrapolation you need to do the reverse schedule and the smallest bond dimension should not be too small. Please read the above paper and the documentation https://block2.readthedocs.io/en/latest/tutorial/energy-extrapolation.html#The-Reverse-Schedule carefully.
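
As a rough sketch of the fitting step itself (independent of how the reverse-schedule sweeps are run), one can fit the sweep energy linearly against the largest discarded weight obtained at each bond dimension and read off the zero-discarded-weight intercept. The numbers below are purely hypothetical placeholders:

import numpy as np

# Hypothetical (discarded weight, energy) pairs, one per bond dimension of the
# reverse schedule; replace them with the values from your own DMRG output.
dws = np.array([1.2e-5, 3.0e-5, 7.5e-5, 1.8e-4])
energies = np.array([-23.9702, -23.9689, -23.9661, -23.9610])

# Linear fit E(dw) = E_extrap + slope * dw; the intercept is the extrapolated
# energy, and one fifth of its distance to the lowest variational energy is
# commonly quoted as the extrapolation error bar.
slope, e_extrap = np.polyfit(dws, energies, 1)
error_bar = abs(energies.min() - e_extrap) / 5
print("extrapolated energy = %.6f +/- %.6f" % (e_extrap, error_bar))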

ChemChuan commented 1 year ago

Thank you very much for your reply and suggestions. I will carefully read the above paper and documents. I understand what you mean. I am indeed running on a single node, so I do not need MPI parallelization. I will restart this task.

ChemChuan commented 1 year ago

Hi, I read the paper and have a question. For arenes, how do I obtain the split-localized orbitals (RHF ---> PM localization?) and the fully-localized orbitals? I ask because I want to test this system using pyscf and block2.

Olivares-Amaya, R.; Hu, W.; Nakatani, N.; Sharma, S.; Yang, J.; Chan, G. K.-L. The ab-initio density matrix renormalization group in practice. The Journal of Chemical Physics 2015, 142, 034102. doi: 10.1063/1.4905329

hczhai commented 1 year ago

The orbital localization can be done using pyscf. Please have a look at the pyscf documentation https://pyscf.org/user/lo.html. Example scripts can be found in pyscf issues, such as https://github.com/pyscf/pyscf/issues/1892. If you have further questions regarding the usage of pyscf, you may search and/or post issues in the pyscf repo.

ChemChuan commented 1 year ago

Yes, I know that pyscf has these different localization functions.
Are the split-localized orbitals in the paper obtained only through PM localization of RHF orbitals? That is, RHF orbitals ---> PM localization ---> split-localized orbitals; is that the process?

hczhai commented 1 year ago

Split-localization using PM simply means that you do PM localization for the occupied orbitals and for the virtual orbitals separately, and then combine the two sets of localized orbitals, as shown in the script in https://github.com/pyscf/pyscf/issues/1892 that I mentioned previously.
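
Roughly, such a split-localization could be sketched as follows; the molecule is a small placeholder and the names (mol, mf) are illustrative, so adapt it to your arene:

import numpy as np
from pyscf import gto, scf, lo

# Placeholder molecule just to make the sketch runnable; replace with your arene.
mol = gto.M(atom="N 0 0 0; N 0 0 1.1", basis="sto-3g")
mf = scf.RHF(mol).run()

occ_mask = mf.mo_occ > 0    # doubly occupied RHF orbitals
vir_mask = mf.mo_occ == 0   # virtual orbitals

# Pipek-Mezey localization applied to each subspace separately, so occupied and
# virtual spaces are never mixed (this is the "split" part).
c_occ = lo.PM(mol, mf.mo_coeff[:, occ_mask]).kernel()
c_vir = lo.PM(mol, mf.mo_coeff[:, vir_mask]).kernel()

# Combine the two localized sets into one coefficient matrix for the DMRG run.
mo_split_loc = np.hstack([c_occ, c_vir])
print(mo_split_loc.shape)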