MatthewRHermes opened this issue 1 year ago
Curiously, @valay1 had a segfault here,
2429 ******** <class 'mrh.my_pyscf.fci.csf.FCISolver'> ********
2430 max. cycles = 500
2431 conv_tol = 1e-10
2432 davidson only = False
2433 linear dependence = 1e-14
2434 level shift = 0.001
2435 max iter space = 12
2436 max_memory 512000 MB
2437 nroots = 1
2438 pspace_size = 200
2439 spin = None
2440 DFCASCI/DFCASSCF: density fitting for JK matrix and 2e integral transformation
2441 Start CASCI
2442 CPU time for integral transformation to CAS space 15.27 sec, wall time 0.96 sec
2443 CPU time for jk 17.37 sec, wall time 1.09 sec
2444 CPU time for jk 17.38 sec, wall time 1.09 sec
2445 CPU time for jk 17.29 sec, wall time 1.09 sec
2446 CPU time for jk 17.27 sec, wall time 1.09 sec
2447 CPU time for jk 17.29 sec, wall time 1.09 sec
2448 CPU time for jk 17.23 sec, wall time 1.08 sec
2449 CPU time for jk 17.36 sec, wall time 1.09 sec
2450 CPU time for jk 17.29 sec, wall time 1.09 sec
2451 CPU time for jk 17.29 sec, wall time 1.09 sec
2452 CPU time for jk 17.35 sec, wall time 1.09 sec
2453 CPU time for jk 17.41 sec, wall time 1.09 sec
2454 CPU time for jk 17.42 sec, wall time 1.10 sec
2455 CPU time for jk 17.31 sec, wall time 1.09 sec
2456 CPU time for jk 18.58 sec, wall time 1.26 sec
2457 CPU time for jk 17.38 sec, wall time 1.09 sec
2458 CPU time for jk 17.37 sec, wall time 1.09 sec
2459 CPU time for jk 17.32 sec, wall time 1.09 sec
2460 CPU time for jk 12.50 sec, wall time 0.79 sec
2461 CPU time for df vj and vk 308.40 sec, wall time 19.48 sec
2462 core energy = -4414.24210088297
2463 CPU time for effective h1e in CAS space 308.75 sec, wall time 19.51 sec
2464 CPU time for csf.kernel: throat-clearing 0.00 sec, wall time 0.00 sec
2465 CPU time for csf.kernel: hdiag_det 7.71 sec, wall time 0.75 sec
2466 CPU time for csf.kernel: hdiag_csf 731.38 sec, wall time 54.28 sec
2467 CPU time for csf.kernel: throat-clearing 2.89 sec, wall time 0.31 sec
2468 csf.pspace: Lowest-energy 200 CSFs correspond to 86 configurations which are spanned by 50388 determinants
2469 pspace_size of 200 CSFs -> 50388 determinants requires 24567.7991424 MB > 502162.878464 MB remaining memory
2470 CPU time for csf.pspace: index manipulation 1.85 sec, wall time 2.17 sec
which suggests that the segfault is occurring here:
https://github.com/MatthewRHermes/mrh/blob/739a255533b7c03cab9001ca4f3cc6a1a0d2a915/my_pyscf/fci/csf.py#L284-L294
(both timer_debug lines should flush the buffer). That is a call to a standard PySCF library function which performs no allocations, so I am somewhat confused. A misallocated array could also cause a segfault here, but I can't for the life of me see how any of the input arrays could be misallocated.
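For what it's worth, one way to rule the misallocated-array hypothesis in or out would be a defensive check immediately before the suspect call. This is only a sketch assuming the inputs are plain NumPy arrays; `check_fci_inputs` is a hypothetical helper, not part of mrh or PySCF:

```python
import numpy as np

def check_fci_inputs(h1e, eri, norb):
    # A wrong dtype, a non-C-contiguous layout, or a buffer shorter than
    # what the C extension expects will not raise in Python, but can
    # segfault inside compiled code; fail loudly here instead.
    for name, arr in (("h1e", h1e), ("eri", eri)):
        assert isinstance(arr, np.ndarray), f"{name} is not an ndarray"
        assert arr.dtype == np.float64, f"{name} has dtype {arr.dtype}"
        assert arr.flags["C_CONTIGUOUS"], f"{name} is not C-contiguous"
    assert h1e.shape == (norb, norb), f"h1e has shape {h1e.shape}"
    return True
```

If any of these assertions trips, the segfault hypothesis shifts from the library function to whatever built its inputs.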
ETA: it's worth pointing out that in this scenario we are demanding a far larger matrix (50,000 determinants on a side, rather than the usual ~200) than that PySCF library function was likely ever tested on building.
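For scale, the dense determinant-basis Hamiltonian block alone is already past 20 GB at that size. This is just back-of-the-envelope arithmetic; the 24567.8 MB figure in the log presumably also counts intermediates:

```python
ndet = 50388             # determinants spanning the 200 lowest-energy CSFs
bytes_per_float = 8      # float64
mb = ndet**2 * bytes_per_float / 1e6
print(f"{mb:.1f} MB")    # ~20311.6 MB for the (ndet, ndet) block alone
```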
In two csf_solver functions (mrh.my_pyscf.fci.csf.pspace and mrh.my_pyscf.fci.csf.make_hdiag_csf
), the memory usage is problematic due to the two-step evaluation of Hamiltonian matrix elements in the CSF basis, which requires the temporary construction of arrays whose size is quadratic in the number of corresponding determinants. Massively open-shell, low-spin wave functions become impossible in the current implementation around (16e,16o), because the corresponding determinants are too numerous for the block Hamiltonian to be stored in memory. The relevant arrays should be split into blocks and handled sequentially, as is done for DFT quadrature. In the meantime, segfaults and calculations abruptly killed by cluster daemon processes are symptoms of unmanaged memory usage, and they indicate a region of the code where a memory-checking step should be added.
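A minimal sketch of the proposed blocking, with hypothetical names (not the mrh API): cap each batch of determinant rows so that the temporary stays within the memory budget, and iterate over batches.

```python
def iter_det_blocks(ndet, max_memory_mb):
    # Yield (start, stop) row ranges such that one (rows, ndet) float64
    # temporary stays under max_memory_mb. Hypothetical helper, not mrh API.
    row_mb = ndet * 8 / 1e6
    rows = max(1, int(max_memory_mb / row_mb))
    for start in range(0, ndet, rows):
        yield start, min(start + rows, ndet)

# Each block would be contracted and discarded before the next is built,
# bounding peak memory the way the DFT quadrature loop does.
```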