Closed 1234zou closed 7 months ago
Thanks for pointing out the issue. The slow speed you observed is caused by the Python for loops in dmrgscf/dmrgci.py (not part of block2): https://github.com/pyscf/dmrgscf/blob/master/pyscf/dmrgscf/dmrgci.py#L650-L659.
To solve this problem, you can avoid using DMRGCI.unpackE4_BLOCK by changing https://github.com/pyscf/dmrgscf/blob/master/pyscf/dmrgscf/dmrgci.py#L581 from
E4 = self.unpackE4_BLOCK(fname,norb)
to
E4 = numpy.fromfile(open(fname, 'rb'), offset=109, dtype=float).reshape((norb,) * 8).transpose(0, 1, 2, 3, 7, 6, 5, 4)
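As a sanity check, here is a minimal self-contained sketch of this vectorized read. The tiny `norb`, the placeholder 109-byte header, and the fake dump file are illustrative assumptions, not the real BLOCK file format:

```python
import os
import tempfile

import numpy as np

norb = 2  # tiny example; a real 4RDM file uses the active-space size
rdm = np.arange(norb ** 8, dtype=float).reshape((norb,) * 8)

# Write a fake dump: a 109-byte placeholder header followed by the raw
# doubles, mimicking (by assumption) the layout read by the patched line.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 109)
    f.write(rdm.tobytes())
    fname = f.name

# Single vectorized read: skip the header, reshape to an 8-index tensor,
# and reverse the order of the last four indices, as in the patch above.
E4 = np.fromfile(fname, offset=109, dtype=float).reshape((norb,) * 8)
E4 = E4.transpose(0, 1, 2, 3, 7, 6, 5, 4)
os.unlink(fname)
```

This replaces the per-element Python loop with one `numpy.fromfile` call, which is why the reading time drops so sharply.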
Great! I've tested a (16,16) job but with only 2 virtual orbitals, here are the time results:
Before changing to numpy.fromfile:
......production of RDMs took 61.00 sec
......reading the RDM took 26.78 sec
......production of RDMs took 1149.12 sec
Reading binary 4RDM from BLOCK
WARN: AT LEAST, NO MORE bytes TO READ!
......reading the RDM took 2104.67 sec
HMAT basis size = 8689 thrds = 1e-10
HMAT symm error = 0.4239904438
E(MRCI) - E(ref) = -0.0006146624653027288 DC = -2.088212758789092e-07
E(WickICMRCISD) = -7.720839854170862 E_corr_ci = -0.0006146624653027288
E(WickICMRCISD+Q) = -7.720840062992138 E_corr_ci = -0.0006148712865786078
After changing to numpy.fromfile:
CASCI E = -7.72022521094335 E(CI) = -17.7988454430940 S^2 = 0.0000000
......production of RDMs took 58.22 sec
......reading the RDM took 29.03 sec
......production of RDMs took 1036.11 sec
Reading binary 4RDM from BLOCK
......reading the RDM took 9.29 sec
HMAT basis size = 8689 thrds = 1e-10
HMAT symm error = 0.4580074300
E(MRCI) - E(ref) = -0.0006152311737706029 DC = -2.092545138599727e-07
E(WickICMRCISD) = -7.720840442117125 E_corr_ci = -0.0006152311737706029
E(WickICMRCISD+Q) = -7.720840651371639 E_corr_ci = -0.0006154404282844629
It does save much time. However, the energy difference between the two jobs is 5.9e-7 a.u. Is this reasonable?
By the way, have you considered making a pull request to pyscf/dmrgscf for this modification? This is a small but important change.
> However, the energy difference between the two jobs is 5.9e-7 a.u. Is this reasonable?
This is more likely caused by differences in the HF/CASCI step, or a loose DMRG convergence threshold. In fact, the CASCI energies in your two runs differ by 2E-8.
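For reference, the quoted 5.9e-7 a.u. figure can be reproduced directly from the two E(WickICMRCISD) values printed above:

```python
# E(WickICMRCISD) from the two runs, copied from the logs above
e_before = -7.720839854170862  # before the numpy.fromfile change
e_after = -7.720840442117125   # after the numpy.fromfile change

diff = abs(e_after - e_before)
print(f"{diff:.2e}")  # rounds to the quoted ~5.9e-07 a.u.
```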
> By the way, have you considered making a pull request to pyscf/dmrgscf for this modification?
Your help is greatly appreciated. Problem solved.
Hi, Huanchen (and other block2 developers),
I'm wondering: what is an efficient way to run a DMRG-FIC-MRCISD job? I'm running a (16,16) job on a linear H16 chain with the 3-21G basis set. Here is part of the input
Here OpenMP parallelism is used. After
Reading binary 4RDM from BLOCK
is printed, the program runs on only 1 CPU. Is the code for this step not properly OpenMP-parallelized? Or should I switch to MPI parallelism for DMRG-FIC-MRCISD? I note there is a remark in the online block2 documentation
I'm just curious whether, in the current situation, there is any keyword to make the computation more efficient. Thanks a lot.
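For completeness, the standard OpenMP environment variables (generic OpenMP controls, not block2-specific keywords; the values below are placeholders) are the usual way to set the thread count for the OpenMP-parallelized steps:

```shell
# Standard OpenMP environment variables (values are placeholders):
export OMP_NUM_THREADS=16   # number of OpenMP threads per process
export OMP_STACKSIZE=512M   # larger per-thread stacks for big tensors
echo "$OMP_NUM_THREADS"
```

Note these only affect steps that are actually OpenMP-parallelized; they cannot speed up a pure-Python loop such as the original RDM-unpacking code.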