Hi, thanks for your interest in using block2. You can find the explanation of the restart_nevpt2_npdm keyword in https://block2.readthedocs.io/en/latest/user/keywords.html. The required input file and output format are exactly the same as those in the old version of Block. This is supported when you run block2 using the block2main executable; see https://block2.readthedocs.io/en/latest/user/basic.html.
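For reference, a rough sketch of what such a block2main input file could look like. Apart from restart_nevpt2_npdm itself, the keyword names below are assumptions based on the StackBlock-compatible input format, so please check the keywords page above for the exact spelling; a restart_ keyword also assumes an earlier DMRG run left its MPS in the scratch folder:

sym d2h
orbitals O2.ACTIVE.FCIDUMP

nelec 8
spin 2
irrep 1

schedule default
maxM 500
maxiter 30

restart_nevpt2_npdm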
Thank you for the answer.
In principle we could install block2main separately with pip and call it from our program to do the DMRG calculation. The problem, however, is that we would like to do this for calculations running across multiple nodes, which means both executables would have to be launched by the queueing system. We want to call Block2 conditionally, from within our main program, and have it run over multiple nodes, and I don't know how to do that. It would be much more straightforward to link against the Block2 'libblock2.so' library and call the Block2 functions from our program; then multi-node parallelization is granted automatically. So I am wondering: are there any plans to add the NEVPT2 capabilities to the Block2 C++ interface?
Thanks for providing the detailed explanation.
You can use the following code to compute the NEVPT2 intermediates (3pdm, A16, and A22) using only the C++ interface of block2 (with MPI support):
#include "block2.hpp"
int main() {
using FL = double;
using S = block2::SU2;
// Initialize DMRGDriver
const size_t stack_mem = 1ULL << 30; // 1 GB
const std::string scratch = "./tmp";
const int n_threads = 4;
std::shared_ptr<block2::DMRGDriver<S, FL>> driver =
std::make_shared<block2::DMRGDriver<S, FL>>(stack_mem, scratch, "", n_threads);
// Set MPI
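// Note: the _HAS_MPI branch is taken when block2 is compiled with MPI support
// (see -DMPI=ON in the build step below); otherwise a serial single-process
// communicator is created.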
#ifdef _HAS_MPI
std::shared_ptr<block2::ParallelCommunicator<S>> para_comm = std::make_shared<block2::MPICommunicator<S>>();
#else
std::shared_ptr<block2::ParallelCommunicator<S>> para_comm =
std::make_shared<block2::ParallelCommunicator<S>>(1, 0, 0);
#endif
std::shared_ptr<block2::ParallelRuleSimple<S, FL>> prule =
std::make_shared<block2::ParallelRuleSimple<S, FL>>(block2::ParallelSimpleTypes::IJ, para_comm);
driver->prule = prule;
// Load Integrals (active space)
std::shared_ptr<block2::FCIDUMP<FL>> fd = std::make_shared<block2::FCIDUMP<FL>>();
fd->read("O2.ACTIVE.FCIDUMP");
std::shared_ptr<block2::ParallelFCIDUMP<S, FL>> pfd = std::make_shared<block2::ParallelFCIDUMP<S, FL>>(fd, prule);
std::shared_ptr<block2::GeneralFCIDUMP<FL>> expr =
block2::GeneralFCIDUMP<FL>::initialize_from_qc(pfd, block2::ElemOpTypes::SU2, 1E-14)->adjust_order();
// Sweep Schedule
const std::vector<block2::ubond_t> bond_dims = {250, 250, 500, 500};
const std::vector<FL> noises = {1E-4, 1E-5, 0.0};
const std::vector<FL> thrds = {1E-7, 1E-8, 1E-9, 1E-10};
const int n_sweeps = 10;
const double sweep_tol = 1E-8;
const int verbose = 2;
// MPO & MPS
driver->initialize_system(expr->n_sites(), expr->n_elec(), expr->twos());
std::shared_ptr<block2::MPO<S, FL>> mpo =
driver->get_mpo(expr, verbose, 1E-14, block2::MPOAlgorithmTypes::FastBipartite);
std::shared_ptr<block2::MPS<S, FL>> ket = driver->get_random_mps("KET", bond_dims[0]);
// DMRG & NPDM
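// The four spin-adapted expressions below request, in order: the 3pdm in the
// StackBlock index ordering, the 2pdm, the 3pdm, and the 4pdm (this is how
// npdms[0] .. npdms[3] are unpacked further down).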
const std::vector<std::string> npdm_exprs = {"((C+((C+(C+D)0)1+D)0)1+D)0", "((C+D)0+(C+D)0)0",
"((C+D)0+((C+D)0+(C+D)0)0)0", "((C+D)0+((C+D)0+((C+D)0+(C+D)0)0)0)0"};
const block2::ExpectationAlgorithmTypes npdm_algo_type =
block2::ExpectationAlgorithmTypes::SymbolFree | block2::ExpectationAlgorithmTypes::Compressed;
driver->dmrg(mpo, ket, n_sweeps, sweep_tol, bond_dims, noises, thrds, verbose);
std::vector<std::shared_ptr<block2::GTensor<FL>>> npdms =
driver->get_npdm(npdm_exprs, ket, ket, 0, npdm_algo_type, verbose);
if (prule->is_root()) {
const MKL_INT n_orbs = ket->n_sites;
auto write_array = [n_orbs](const std::string &fn, block2::NDArray arr) {
std::ofstream ofs(fn.c_str());
ofs << n_orbs << std::endl;
for (size_t i = 0; i < arr.size(); i++) {
for (auto idx : arr.decompose_linear_index(i))
ofs << std::setw(2) << idx << " ";
ofs << std::setw(20) << std::fixed << std::setprecision(14) << arr.data[arr.linear_index(i)]
<< std::endl;
}
ofs.close();
};
block2::NDArray h1e = block2::NDArray(std::vector<MKL_INT>(2, n_orbs), fd->h1e_matrix());
block2::NDArray g2e = block2::NDArray(std::vector<MKL_INT>(4, n_orbs), fd->g2e_1fold()).transpose({0, 2, 1, 3});
block2::NDArray dm3_x = block2::NDArray(npdms[0]->shape, *npdms[0]->data); // dm3 in stackblock format
block2::NDArray dm2 = block2::NDArray(npdms[1]->shape, *npdms[1]->data);
block2::NDArray dm3 = block2::NDArray(npdms[2]->shape, *npdms[2]->data);
block2::NDArray dm4 = block2::NDArray(npdms[3]->shape, *npdms[3]->data);
npdms = std::vector<std::shared_ptr<block2::GTensor<FL>>>();
// Write 3PDM in StackBlock format
write_array("spatial_threepdm.0.0.txt", dm3_x);
dm3_x = block2::NDArray();
// Compute A16
block2::NDArray f3ac = block2::NDArray::einsum("ijka,rpqbjcik->rpqbac", {g2e, dm4});
block2::NDArray f3ca = block2::NDArray::einsum("kcij,rpqbajki->rpqbac", {g2e, dm4});
dm4 = block2::NDArray();
block2::NDArray a16 = -block2::NDArray::einsum("ib,rpqiac->pqrabc", {h1e, dm3});
a16 += block2::NDArray::einsum("ia,rpqbic->pqrabc", {h1e, dm3});
a16 -= block2::NDArray::einsum("ci,rpqbai->pqrabc", {h1e, dm3});
a16 -= f3ca.transpose({1, 4, 0, 2, 5, 3});
a16 -= block2::NDArray::einsum("kbia,rpqcki->pqrabc", {g2e, dm3});
a16 -= block2::NDArray::einsum("kbaj,rpqjkc->pqrabc", {g2e, dm3});
a16 += block2::NDArray::einsum("cbij,rpqjai->pqrabc", {g2e, dm3});
block2::NDArray fdm2 = block2::NDArray::einsum("kbij,rpajki->prab", {g2e, dm3});
std::vector<block2::NDArraySlice> sl(6, block2::NDArraySlice());
for (MKL_INT k = 0; k < n_orbs; k++) {
sl[1] = sl[5] = block2::NDArraySlice(k, -1, 0);
a16.slice(sl) += fdm2;
}
a16 += f3ac.transpose({1, 2, 0, 4, 3, 5});
a16 -= f3ca.transpose({1, 2, 0, 4, 3, 5});
a16 += block2::NDArray::einsum("jbij,rpqiac->pqrabc", {g2e, dm3});
a16 -= block2::NDArray::einsum("cjka,rpqbjk->pqrabc", {g2e, dm3});
a16 += block2::NDArray::einsum("jcij,rpqbai->pqrabc", {g2e, dm3});
// Write A16 in StackBlock format
write_array("A16_matrix.0.0.txt", a16);
a16 = block2::NDArray();
// Compute A22
block2::NDArray a22 = -block2::NDArray::einsum("pb,kipjac->ijkabc", {h1e, dm3});
a22 -= block2::NDArray::einsum("pa,kibjpc->ijkabc", {h1e, dm3});
a22 += block2::NDArray::einsum("cp,kibjap->ijkabc", {h1e, dm3});
a22 += block2::NDArray::einsum("cqra,kibjqr->ijkabc", {g2e, dm3});
a22 -= block2::NDArray::einsum("qcpq,kibjap->ijkabc", {g2e, dm3});
a22 -= f3ac.transpose({1, 5, 0, 2, 4, 3});
fdm2 = block2::NDArray::einsum("pqrb,kiqcpr->ikbc", {g2e, dm3});
sl = std::vector<block2::NDArraySlice>(6, block2::NDArraySlice());
for (MKL_INT k = 0; k < n_orbs; k++) {
sl[1] = sl[3] = block2::NDArraySlice(k, -1, 0);
a22.slice(sl) -= fdm2;
}
a22 -= block2::NDArray::einsum("pqab,kiqjpc->ijkabc", {g2e, dm3});
a22 += block2::NDArray::einsum("pcrb,kiajpr->ijkabc", {g2e, dm3});
a22 += block2::NDArray::einsum("cqrb,kiqjar->ijkabc", {g2e, dm3});
a22 -= f3ac.transpose({1, 3, 0, 4, 2, 5});
a22 += f3ca.transpose({1, 3, 0, 4, 2, 5});
a22 += block2::NDArray::einsum("jb,kiac->ijkabc", {h1e, dm2}) * 2.0;
a22 += block2::NDArray::einsum("pjrb,kiprac->ijkabc", {g2e, dm3}) * 2.0;
fdm2 = block2::NDArray::einsum("pa,kipc->ikac", {h1e, dm2});
fdm2 -= block2::NDArray::einsum("cp,kiap->ikac", {h1e, dm2});
fdm2 -= block2::NDArray::einsum("cqra,kiqr->ikac", {g2e, dm2});
fdm2 += block2::NDArray::einsum("qcpq,kiap->ikac", {g2e, dm2});
fdm2 += block2::NDArray::einsum("pqra,kiqcpr->ikac", {g2e, dm3});
fdm2 -= block2::NDArray::einsum("rcpq,kiaqrp->ikac", {g2e, dm3});
sl = std::vector<block2::NDArraySlice>(6, block2::NDArraySlice());
for (MKL_INT k = 0; k < n_orbs; k++) {
sl[1] = sl[4] = block2::NDArraySlice(k, -1, 0);
a22.slice(sl) += fdm2 * 2.0;
}
// Write A22 in StackBlock format
write_array("A22_matrix.0.0.txt", a22);
a22 = block2::NDArray();
}
// Release memory
driver = nullptr;
return 0;
}
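As a side note, each text file produced by the write_array lambda above starts with the number of orbitals, followed by one line per element containing the indices and the value. A minimal sketch of reading such a file back into a flat row-major array (the file name and the fixed rank of six indices for the 3PDM are assumptions for illustration):

#include <array>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    std::ifstream ifs("spatial_threepdm.0.0.txt");
    int n_orbs;
    ifs >> n_orbs; // first line: number of active orbitals
    std::vector<double> dm3((size_t)n_orbs * n_orbs * n_orbs * n_orbs * n_orbs * n_orbs);
    std::array<long, 6> ix;
    double value;
    // remaining lines: six orbital indices followed by the matrix element
    while (ifs >> ix[0] >> ix[1] >> ix[2] >> ix[3] >> ix[4] >> ix[5] >> value) {
        size_t k = 0;
        for (int d = 0; d < 6; d++)
            k = k * n_orbs + (size_t)ix[d]; // row-major linear index
        dm3[k] = value;
    }
    std::cout << "read 3PDM with n_orbs = " << n_orbs << std::endl;
    return 0;
}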
To link this code with libblock2.so, please follow these steps:
1. Build and install the libblock2.so library. A recent block2 source tree is required (at least commit https://github.com/block-hczhai/block2-preview/commit/d61d4610aa9e527fbf89953f7870005dcf9fd8e5). Build it with
cmake .. -DUSE_MKL=ON -DBUILD_CLIB=ON -DMPI=ON -DLARGE_BOND=ON -DUSE_IC=ON -DCMAKE_INSTALL_PREFIX=<INSTALL_DIR>
make
make install
where <INSTALL_DIR> is a folder for storing the include, cmake config, and library files for libblock2.so.
2. In the same folder as main.cpp, create the following CMakeLists.txt:
CMAKE_MINIMUM_REQUIRED(VERSION 3.0)
FIND_PROGRAM(CMAKE_C_COMPILER NAMES $ENV{CC} gcc PATHS ENV PATH NO_DEFAULT_PATH)
FIND_PROGRAM(CMAKE_CXX_COMPILER NAMES $ENV{CXX} g++ PATHS ENV PATH NO_DEFAULT_PATH)
PROJECT(main VERSION 1.0 LANGUAGES CXX)
FIND_PACKAGE(block2 2.0 CONFIG REQUIRED)
FIND_PACKAGE(MPI)
# FIND MKL LIBS - must match those used when building block2
SET(CMAKE_FIND_LIBRARY_SUFFIXES_BKP ${CMAKE_FIND_LIBRARY_SUFFIXES})
SET(CMAKE_FIND_LIBRARY_SUFFIXES "${CMAKE_FIND_LIBRARY_SUFFIXES_BKP};.so.1;.1.dylib;.so.2;.2.dylib;.so.3;.3.dylib;.so.4;.4.dylib")
FIND_LIBRARY(MKL_LIB_LP NAMES mkl_intel_lp64 PATHS $ENV{MKLROOT}/lib $ENV{MKLROOT}/lib/intel64 /usr/local/lib)
FIND_LIBRARY(MKL_LIB_CORE NAMES mkl_core PATHS $ENV{MKLROOT}/lib $ENV{MKLROOT}/lib/intel64 /usr/local/lib)
FIND_LIBRARY(MKL_LIB_GT NAMES mkl_gnu_thread PATHS $ENV{MKLROOT}/lib $ENV{MKLROOT}/lib/intel64 /usr/local/lib)
FIND_LIBRARY(MKL_LIB_AVX NAMES mkl_avx2 PATHS $ENV{MKLROOT}/lib $ENV{MKLROOT}/lib/intel64 /usr/local/lib)
FIND_LIBRARY(MKL_LIB_AVX512 NAMES mkl_avx512 PATHS $ENV{MKLROOT}/lib $ENV{MKLROOT}/lib/intel64 /usr/local/lib)
SET(MKL_LIBS "-Wl,--no-as-needed" ${MKL_LIB_LP} ${MKL_LIB_CORE} ${MKL_LIB_GT} ${OMP_LIB_NAME} ${MKL_LIB_AVX} ${MKL_LIB_AVX512})
SET(CMAKE_FIND_LIBRARY_SUFFIXES ${CMAKE_FIND_LIBRARY_SUFFIXES_BKP})
MESSAGE(STATUS "MKL_LIBS = ${MKL_LIBS}")
# FIND OPENMP LIBS - must match those used when building block2
FIND_LIBRARY(OMP_LIBS NAMES gomp PATHS /usr/local/lib /usr/lib64)
IF (NOT OMP_LIBS)
EXECUTE_PROCESS(
COMMAND ${CMAKE_CXX_COMPILER} -print-search-dirs
COMMAND grep "^lib" COMMAND awk -F "=" "{print $2}" COMMAND tr ":" ";"
OUTPUT_VARIABLE OMP_LIB_HINT OUTPUT_STRIP_TRAILING_WHITESPACE)
FIND_LIBRARY(OMP_LIBS NAMES gomp PATHS ${OMP_LIB_HINT})
ENDIF()
ADD_EXECUTABLE(main main.cpp)
TARGET_LINK_LIBRARIES(main PRIVATE ${OMP_LIBS} block2::block2 rt ${MKL_LIBS} ${MPI_CXX_LIBRARIES})
TARGET_COMPILE_OPTIONS(main BEFORE PRIVATE -O3 -funroll-loops -Werror -Werror=return-type -fopenmp ${MPI_COMPILE_FLAGS})
3. Use cmake to compile main.cpp:
mkdir build
cd build
export MKLROOT=<MKLROOT>
cmake .. -Dblock2_DIR=<INSTALL_DIR>/share/cmake/block2
make
Remember to use the correct <MKLROOT> and <INSTALL_DIR>.
4. Use the following script to generate the O2.ACTIVE.FCIDUMP for this calculation:
from pyscf import gto, scf, mcscf
from pyblock2._pyscf.ao2mo import integrals as itg
from pyblock2.driver.core import DMRGDriver, SymmetryTypes
mol = gto.M(atom='O 0 0 0; O 0 0 1.207', basis='cc-pvdz', spin=2, verbose=4)
mf = scf.RHF(mol).run(conv_tol=1E-20)
mc = mcscf.CASSCF(mf, 6, 8)
mc.fcisolver.conv_tol = 1e-14
mc.canonicalization = True
mc.natorb = True
mc.run()
mf.mo_coeff = mc.mo_coeff
ncas, n_elec, spin, ecore, h1e, g2e, orb_sym = itg.get_rhf_integrals(mf, ncore=mc.ncore, ncas=mc.ncas, g2e_symm=8)
driver = DMRGDriver(scratch="./tmp", symm_type=SymmetryTypes.SU2)
driver.initialize_system(n_sites=ncas, n_elec=n_elec, spin=spin, orb_sym=orb_sym, singlet_embedding=False)
driver.write_fcidump(h1e, g2e, ecore, filename='O2.ACTIVE.FCIDUMP', h1e_symm=True, pg='d2h')
5. Run ./main or mpirun -n 4 ./main. The 3PDM, A16, and A22 are stored as text files.

Thanks for the in-depth answer! I gave it a try and encountered a few issues:
1) I had to fix some of the paths in the 'block2Config.cmake' file inside the Block2 installation.
2) Finally, I was able to run 'main' a few times, but it crashed occasionally in parallel. The values of the matrix elements also varied a bit with different numbers of processes. The numbers looked a bit odd, but I am no expert. I'll attach my 3PDM file just in case: spatial_threepdm.0.0.txt
To give you a perhaps clearer idea of what I am looking for: we are using the old Block code in our multiconfigurational quantum chemistry program to do DMRGSCF and SC-NEVPT2 calculations. Due to incompatibility with newer compilers and libraries, we would like to upgrade. The main quantities of interest are the 1-, 2-, and 3-PDMs, Aux16, Aux22, and of course the electronic energy.
Could you show me the output of main when you run it in parallel?
Thanks. The output looks ok but the 3pdm is not correct. The problem is likely that you did not compile the code correctly. To make the compilation easier, let's try the following alternative way:
1. Download the block2 source code.
2. In the block2 source code, replace src/main.cpp with the main.cpp that I provided above.
3. Compile block2 with the replaced main.cpp to get an executable, using cmake .. -DUSE_MKL=ON -DBUILD_CLIB=OFF -DMPI=ON -DLARGE_BOND=ON -DUSE_IC=ON. We do not need make install, and you should get the block2 executable in the build directory. Let me know whether mpirun -n 2 ./block2 works or not.

I followed your instructions. I get similar values, but after the 6th digit or so the values differ a bit. Is there some stochastics in use? If I use more than 8 cores, it crashes.
Result with 2 cores: spatial_threepdm.0.0-2cores.txt
My FCIDUMP: O2.ACTIVE.FCIDUMP.txt
I get similar values, but after the 6th digit or so the values differ a bit.
This is okay; we start from a random initial guess for the MPS.
Result with 2 cores:
In main.cpp I wrote const int n_threads = 4, so when you do mpirun -n 1 it already uses 4 cores (threads).
If I use more than 8 cores, it crashes.
If you do mpirun -n 8 with const int n_threads = 4 on a single node, you need at least 32 cores in that node. Do you have enough cores in the node? When you say it crashes, what exactly do you mean by "crash"?
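In other words, the product of the number of MPI processes and n_threads should not exceed the physical core count of the node. A hypothetical example (the 12-core node here is only an assumption for illustration):

// main.cpp: 3 OpenMP threads per MPI process, for a hypothetical 12-core node
const int n_threads = 3;
// then launch 4 MPI processes, 4 x 3 = 12 cores in total:
// mpirun -n 4 ./main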
Thanks, I didn't notice the number of threads; now the performance makes more sense. My CPU has 12 cores and 24 threads, so it doesn't make sense to run with many MPI processes.
"Crashed" run with 9 MPI procs: broken.txt
"Crashed" run with 9 MPI procs
You did not say how many threads are used for each MPI proc. If you still use 4 threads, this setting does not make sense on this computer. Also note that I have set const size_t stack_mem = 1ULL << 30; // 1 GB, which is the memory per proc. When you do mpirun -n 9, you need at least 9 GB of free memory; otherwise you should decrease this number.
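For example (the 4 GB budget here is just an illustration), to keep 8 MPI processes within roughly 4 GB in total, one could lower the per-process stack memory in main.cpp:

// main.cpp: 512 MB of stack memory per MPI process
const size_t stack_mem = 512ULL << 20; // 8 processes x 512 MB = 4 GB in total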
My CPU has 12 cores and 24 threads
Hyperthreading is not useful for high-performance computing, so we count this as only 12 physical cores.
Anyway, I think there are basically no further problems with parallel execution. You can adapt main.cpp for interfacing with your C++ program. Please let me know whether you have any other questions.
Hi, I am trying to interface your library with our C++ code for performing DMRG calculations. I would like to be able to perform SC-NEVPT2 calculations as well, and for that I need the 3RDM, Aux16, and Aux22 matrices. Is there a keyword to obtain these quantities, e.g. like "restart_nevpt2_npdm" in the older Block code? I have been looking through the readthedocs documentation and the source code but am having some difficulty finding it.
Thanks in advance