Closed 1234zou closed 6 months ago
Hi, thanks for reporting the problem.
For the case when you get zero correlation energies, please provide (1) the complete Python script and (2) the complete output file with at least verbose = 4
for CASCI and verbose = 5
for ICMRCISD (as what is done in the documentation example).
OK. I re-submit the job with mol.verbose=5
, files are attached
DMRG_FIC_MRCISD_0_corr.tar.gz
Thanks for providing the files. I tested your Python script but got different outputs for the MRCISD part. As MRCISD was purely implemented using numpy
, I think this should be numpy
/Python environment related problems (rather than a problem of block2
compiling options). Part of my output is
CASCI E = -8.08398583892634 E(CI) = -18.162606071077 S^2 = 0.0000000
......production of RDMs took 59.26 sec
......reading the RDM took 23.70 sec
......production of RDMs took 803.10 sec
Reading binary 4RDM from BLOCK
......reading the RDM took 192.19 sec
diag overlap ref-ref-* size = 1
diag overlap ijrskltu* size = 0
diag overlap rsiatukp* size = 0
diag overlap ijrakltp* size = 0
diag overlap rsabtupq* size = 65536
diag overlap ijabklpq* size = 0
diag overlap irabktpq1 size = 0
diag overlap rabctpqg* size = 65536
diag overlap iabckpqg* size = 0
HMAT basis size = 98177 thrds = 1e-10
...
HMAT symm error = 0.9084697965
Notes:
HMAT basis size
is different from yours. The last step requires the full diagonalization of a 98177 x 98177
matrix (in the current implementation), which requires more than 500 GB memory so I did not get a correlation energy number (but it is very likely to be non-zero). I used Python 3.8.5 and numpy==1.24.4
.ncore == 0
, the HMAT basis size
number should be very close to 0.5 * nvirt ** 2 * nact ** 2 + nvirt * nact ** 3
. For this case, we have 256 * 256 / 2 + 16 * 4096 = 98304
which is close to 98177
in my output but far away from 12957
in your output. The relevant code is https://github.com/block-hczhai/block2-preview/blob/master/pyblock2/icmr/icmrcisd_full.py#L193-L198. Here I provide some data (based on your input) for you to check:# diag overlap rsabtupq* size = 65536
assert abs(np.linalg.norm(xn) - 485.73353755963177) < 1E-5
assert abs(np.linalg.norm(umats[key]) - 3300.4487602047843) < 1E-5
assert umats[key].shape == (65536, 32896)
# diag overlap rabctpqg* size = 65536
assert abs(np.linalg.norm(xn) - 558.281417562479) < 1E-5
assert abs(np.linalg.norm(umats[key]) - 53532.995978565734) < 1E-5
assert umats[key].shape == (65536, 65280)
Note that np.linalg.norm(umats[key])
may be numpy
implementation dependent and sensitive to small numerical errors. But the order of magnitude should roughly be the same. Please see if you can get a good number using a more recent numpy
version (or perform the calculation with another environment).
You said that "which requires more than 500 GB memory". But my computational node only has around 180GB memory (or only 180GB is allowed in my previous slurm script). So I change to a very large node (with 1.4 TB total memory allowed), and modify the python code in icmrcisd_full.py
as follows
lib.logger.debug(ic, 'diag overlap %s size = %d', key, len(xn))
w, v = np.linalg.eigh(xn)
idx = w > ic.mrci_thrds
umats[key] = v[:, idx] * (w[idx] ** (-0.5))
ntr += umats[key].shape[1] ### nothing changed until here
print(np.linalg.norm(xn))
print(np.linalg.norm(umats[key]))
print(umats[key].shape)
Here are the input and output files of this job (again, the correlation energy is zero). Python 3.9.13 and numpy==1.21.5
are used. I guess that the printed data in output may be helpful for you.
DMRG_FIC_MRCISD_large_mem.tar.gz
A computation using more recent numpy version are still running.
Thanks for the feedback.
You said that "which requires more than 500 GB memory". But my computational node only has around 180GB memory (or only 180GB is allowed in my previous slurm script).
What I said was "The last step requires the full diagonalization of a 98177 x 98177 matrix ... which requires more than 500 GB memory" In your output, the number in "HMAT basis size = 12957" was wrong, so you did not get the 98177 x 98177 matrix with the correct shape. Let's first solve the problem in the wrong "HMAT basis size" then one should worry about the memory requirement.
Your new output showed that the the norm of the overlap matrix xn
is probably correct, but umats[key]
should never be all zero. This seems to be a problem in numpy
. Please let me know what happens after you update the numpy
version.
I create a new virtual environment with python==3.9.18
and numpy==1.24.3
. There is no numpy==1.24.4
for me to choose in this virtual environment via conda install
. The array dimensions are still wrong, but with a slightly change
diag overlap ref-ref-* size = 1
1.0
1.0
(1, 1)
diag overlap ijrskltu* size = 0
0.0
0.0
(0, 0)
diag overlap rsiatukp* size = 0
0.0
0.0
(0, 0)
diag overlap ijrakltp* size = 0
0.0
0.0
(0, 0)
diag overlap rsabtupq* size = 65536
485.73658404811664
0.0
(65536, 0)
diag overlap ijabklpq* size = 0
0.0
0.0
(0, 0)
diag overlap irabktpq1 size = 0
0.0
0.0
(0, 0)
diag overlap rabctpqg* size = 65536
558.2859097711624
0.0
(65536, 12960)
diag overlap iabckpqg* size = 0
0.0
0.0
(0, 0)
HMAT basis size = 12961 thrds = 1e-10
hamil eq (ref-ref-*, ref-ref-*)
This time, the job is abnormally terminated without any error information. Here are the input and output files of this job py3_9_18_np1_24_3.tar.gz
I'll try other versions of python and numpy.
Thanks very much for the help with the testing.
In order to simplify the checking a little bit, could you run the following python script in your environment (anyone that you have used previously) and let me know the output?
You may consider two cases:
check_orthonormal
) of your Python script for MRCISD, to the beginning of the following script (before the import numpy as np, time
line), then run it. (This is to test the possible problem due to import order/runtime library conflicts).
import numpy as np, time
np.random.seed(1234)
n = 2 ** 16
mrci_thrds = 1E-10
wmat = np.random.random(n)
wmat[:n // 2] *= mrci_thrds
wmat[n // 2:] += mrci_thrds
np.random.shuffle(wmat)
pmat = np.random.random((n, n))
umat = np.linalg.qr(pmat)[0]
xmat = (umat * wmat[None]) @ umat.T
t1 = time.perf_counter()
w, v = np.linalg.eigh(xmat)
t2 = time.perf_counter()
idx = w > mrci_thrds
u = v[:, idx] * (w[idx] ** (-0.5))
t3 = time.perf_counter()
print('psum = ', np.sum(pmat))
print('pnorm = ', np.linalg.norm(pmat))
print('pcubic = ', np.sum(pmat ** 3))
print('xsum = ', np.sum(xmat))
print('xnorm = ', np.linalg.norm(xmat))
print('xcubic = ', np.sum(xmat ** 3))
print('vsum = ', np.sum(v))
print('vnorm = ', np.linalg.norm(v))
print('vcubic = ', np.sum(v ** 3))
print('ushape = ', u.shape)
print('usum = ', np.sum(u))
print('unorm = ', np.linalg.norm(u))
print('ucubic = ', np.sum(u ** 3))
print('wnorm = ', np.linalg.norm(w))
print(w[:4], '\n', w[-4:], '\n', idx[:10], '\n', idx[-10:])
print('time = ', t2 - t1, t3 - t2)
The above script used random matrices, which may not match the actual situation. The following script can exactly reproduce the data that you will have in DMRG-FIC-MRCISD (only for the diag overlap rsabtupq* size = 65536
part). The only required input is the 2pdm file 06-twopdm.txt
(attached: 06-twopdm.txt).
Please use the following script for testing instead of the one provided in the previous reply.
import numpy as np, time
nact, nvirt, nelec = 16, 16, 16
mrci_thrds = 1E-10
deltaAA = np.eye(nact)
deltaEE = np.eye(nvirt)
E2 = np.zeros((nact, nact, nact, nact), order='C')
with open("06-twopdm.txt", "r") as f:
E2.reshape(-1)[:] = [float(l.strip().split()[-1]) for l in f.readlines()[1:]]
E2 = np.ascontiguousarray(2.0 * E2.transpose(0, 1, 3, 2))
xmat = np.zeros((nvirt, nvirt, nact, nact, nvirt, nvirt, nact, nact))
xmat += np.einsum('ur,ts,pqba->rsabtupq', deltaEE, deltaEE, E2, optimize=True)
xmat += np.einsum('us,tr,pqab->rsabtupq', deltaEE, deltaEE, E2, optimize=True)
xmat = xmat.reshape((-1, nvirt * nvirt * nact * nact))
print('xsum = ', np.sum(xmat))
print('xnorm = ', np.linalg.norm(xmat))
print('xcubic = ', np.sum(xmat ** 3))
t1 = time.perf_counter()
w, v = np.linalg.eigh(xmat)
t2 = time.perf_counter()
idx = w > mrci_thrds
u = v[:, idx] * (w[idx] ** (-0.5))
t3 = time.perf_counter()
print('vsum = ', np.sum(v))
print('vnorm = ', np.linalg.norm(v))
print('vcubic = ', np.sum(v ** 3))
print('ushape = ', u.shape)
print('usum = ', np.sum(u))
print('unorm = ', np.linalg.norm(u))
print('ucubic = ', np.sum(u ** 3))
print('wnorm = ', np.linalg.norm(w))
print(w[:4], '\n', w[-4:], '\n', idx[:10], '\n', idx[-10:])
print('time = ', t2 - t1, t3 - t2)
The reference correct output for the above:
xsum = 74028.55313691219
xnorm = 485.73353755963234
xcubic = 277500.4970301131
vsum = 243.72961500908983
vnorm = 255.99999999999997
vcubic = 9.539483696360193
ushape = (65536, 32896)
usum = -5612.461844063661
unorm = 3300.4487602465124
ucubic = -51208.796133034106
wnorm = 485.73353755963313
[-1.70702535e-14 -1.67275979e-14 -1.67064416e-14 -1.66885439e-14]
[8.13188693 8.13188693 8.13188693 8.13188693]
[False False False False False False False False False False]
[ True True True True True True True True True True]
time = 4349.1460023853 27.50447132764384
Here is the output using the Python script and 06-twopdm.txt you provided:
xsum = 74028.55313691219
xnorm = 485.7335375596325
xcubic = 277500.4970301131
vsum = 0.0
vnorm = 0.0
vcubic = 0.0
ushape = (65536, 32843)
usum = 0.0
unorm = 0.0
ucubic = 0.0
wnorm = 15.189672259130365
[2.34983399e-310 2.27922443e-316 2.27922443e-316 2.27922443e-316]
[-6.76551238e-06 1.04130488e-04 1.16992676e-04 -5.20583656e-06]
[False False False False False False True True False False]
[ True False True True False True False True True False]
time = 0.0046670830342918634 9.880773521959782
It can be seen that our results differ since the vsum
term. This time I use python=3.9.13
and numpy==1.21.5
. What else can I do to help debug?
Thanks for the feedback. Your output shows that the data in xmat
is correct. But the eigenvalue and eigenvector w
and v
are not computed correctly. So it is clear that the problem happens in the line w, v = np.linalg.eigh(xmat)
. As the script only uses numpy
, we can confirm that the problem is in your numpy
installation. Here are some suggestions:
numpy
using pip
rather than conda
. If you want to install it offline, you can go to https://pypi.org/project/numpy/#files, download the file named numpy-1.26.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
, and then in the computational node run pip install numpy-1.26.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
to install it (after you put the whl
file in the current directory).numpy
cannot diagonalize matrices with size 65536 x 65536
or bigger. You can decrease the number of nvirt
. I guess for nvirt < 16
it will work.block2
, you can change w, v = np.linalg.eigh(xmat)
to the following to circumvent the problem in numpy.linalg.eigh
.
import block2
w, v = np.zeros(len(xmat)), np.copy(xmat)
block2.MatrixFunctions.eigs(v, w)
v = v.T
np.linalg.eigh
to scipy.linalg.eigh
and see if it can work.If 3 or 4 works ("works" means that you get non-zero vsum
, vnorm
, etc. Comparing to my output, only vnorm
, unorm
, wnorm
, ushape
should match. Because of the degeneracy in eigenvalues, other printed quantities can differ when you use different implementation of the eigensystem solver. So they do not have to match, as long as they are both non-zero), you can do the corresponding change in the DMRG-FIC-MRCISD code (note that there are two calls to np.linalg.eigh
), which should solve the problem.
It seems that if you are using the Intel Distribution for Python, the problem you see is related this: https://community.intel.com/t5/Intel-Distribution-for-Python/numpy-linalg-eigh-fails-for-large-matrices/m-p/1540256. The page shows that the last reply was in 01-25-2024, so I guess they have not got a chance to release the fix. Beside the suggestions in the above reply, you can switch to the normal Python implementation, which does not have this bug.
I use Anaconda Python 3 (a large offline package downloaded from Anaconda website or its mirror), and it already contains numpy
, scipy
, etc. Adding checking words into the slurm script
python prt_np_version.py
which python
python --version
it can be found that
1.21.5
/public/home/jxzou/software/anaconda3/bin/python
Python 3.9.13
So it seems the Anaconda Python is indeed used. I'll create a new virtual environment (with the same python and scipy) and use pip install
to install numpy-1.26.4
, to see if there is something different.
I've made some test for the 06-twopdm.txt case
vsum | |
---|---|
np 1.21 from conda | 0.0 |
np 1.26 from conda | 0.0 |
np 1.26 from pip | 413.2 |
scipy instead | running |
I suspect this could be a similar issue to numpy/numpy#22590. But I can't explain why numpy from pip is different (because openblas is different from MKL?).
So it seems the Anaconda Python is indeed used.
To check whether the Python is Intel or Anaconda, one can start the Python interpreter interactively in the terminal. If it prints the following it is Anaconda Python (the Python/GCC version and the date are not important):
Python 3.8.18 (default, Sep 11 2023, 13:40:15)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
If it prints the following it is Intel Python:
Python 3.9.16 (default, Jun 4 2021, 06:52:02)
[GCC 9.3.0] :: Intel Corporation on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Note that even if you have Anaconda Python, you may end up with an Intel implementation of numpy
(which contains the bug) when you do conda install numpy
or use the default numpy
(see below).
But I can't explain why
numpy
from pip is different (becauseopenblas
is different fromMKL
?).
The bug is in the Intel implementation of numpy
, not in openblas
or MKL
. This page https://anaconda.cloud/intel-optimized-packages explains that when you do conda install numpy
(the defaults channel), you are actually installing the Intel implementation of numpy
. The pip
package has nothing to do with Intel, so pip install numpy
will be good. You may also compile numpy
from source (any version), using either MKL
or openblas
as the backend, it will also be good.
Update: use scipy from conda also gives right result but takes 5x more time. The reason may be scipy.linalg.eigh uses dsyevr solver which is more robust but slower (compared to numpy's dsyevd solver).
The bug is in the Intel implementation of numpy, not in openblas or MKL. This page https://anaconda.cloud/intel-optimized-packages explains that when you do conda install numpy (the defaults channel), you are actually installing the Intel implementation of numpy.
Thanks for the information. So yes numpy from pip should be a good solution in this issue. Let's see if this solves @1234zou 's original question.
To check whether the Python is Intel or Anaconda, one can start the Python interpreter interactively in the terminal.
I tried in the corresponding virtual environment and it is Anaconda Python
Python 3.9.13 (main, Oct 13 2022, 21:15:33)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
But as you said, the conda install numpy
can still be Intel implementation. This time I use pip install
to install various versions of numpy
in their respective virtual environments. I find pip install
does some help, but it also depends on the version of numpy
. Here are results (with the same python 3.9.13, no scipy installed)
numpy version | vsum |
---|---|
1.21.4 | Segmentation fault |
1.21.5 | Segmentation fault |
1.21.6 | Segmentation fault |
1.22.0 | 235.484 |
1.23.0 | 235.484 |
1.24.0 | 243.730 |
1.24.3 | 243.730 |
1.24.4 | 243.730 |
1.25.0 | 235.484 |
1.26.0 | 235.484 |
Here are input and output files using numpy==1.21.6
(Just for taking a look at what happened. There is no need to solve this problem under this old version of numpy). np1_21_6.zip
I do not know whether the numerical value of vsum
can be different for different versions of numpy
. According to @hebrewsnabla 's result, scipy
is much slower so I do not try that. Current conclusion may be: use pip install
and numpy>=1.22.0
.
Thanks for the update.
I find
pip install
does some help, but it also depends on the version ofnumpy
.
I can confirm that numpy==1.21.6
does not work for this problem. The reason was explained in the release note of numpy==1.22.0
: https://github.com/numpy/numpy/releases/tag/v1.22.0. In the end of the first paragraph you can see "All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays." This means before version 1.22.0
, the pip numpy
package was built with 32 bit integer OpenBLAS which cannot handle arrays larger than 32768 x 32768
(actually for numpy.linalg.eigh
32-bit is already enough, but the segfault happens when you do numpy.linalg.norm
for the big array).
I do not know whether the numerical value of
vsum
can be different for different versions ofnumpy
.
It is okay to see different vsum
. Only wnorm
and ushape
should match exactly (up to small numerical errors). (vnorm
and unorm
may also match, but strictly speaking they do not have to.)
Please go ahead to redo the DMRG-FIC-MRCISD calculation and let me know whether the original problem can be solved. Remember that the whole calculation will need more than 500 GB memory because the last step requires the full diagonalization of a 98177 x 98177 matrix to get the correlation energy.
With the correct installation of numpy
, the DMRG-FIC-MRCISD calculation with CAS(16o, 16e) should print the following results:
HMAT basis size = 98177 thrds = 1e-10
HMAT symm error = 0.8607613970
E(MRCI) - E(ref) = -0.0059030037211798 DC = -1.916902656368702e-05
E(WickICMRCISD) = -8.089888814864775 E_corr_ci = -0.0059030037211798
E(WickICMRCISD+Q) = -8.089907983891338 E_corr_ci = -0.005922172747743488
As this does not seem to be a problem of block2
, I will close this issue.
Dear Huanchen,
I've performed a DMRG-FIC-MRCISD calculation for the linear H16 chain, using the 3-21G basis set. This system has 0 core orbitals, 16 active orbitals, and 16 virtual orbitals, the obtained correlation energy is zero
However, I calculated a smaller system using only 2 virtual orbitals, the result seems normal
My guess is that I did not add
-DLARGE_BOND=ON
when compiling block2 so that some integers are beyond the range. Is there any criterion to determine/judge that-DLARGE_BOND=ON
is needed for large active space like (16,16)?Best, Jingxiang