Open Qianruipku opened 2 weeks ago
Now DiagoIterAssist<T, Device>::diagH_subspace
is only referenced in CG method to do a first pre-loop generalized eigenproblem:
// cg diag() function
if (need_subspace_ || ntry > 0)
{
ct::TensorMap psi_map = ct::TensorMap(psi.data(), psi_temp);
this->subspace_func_(psi_temp, psi_map);
psi_temp.sync(psi_map);
}
// hsolver_pw wrapper
auto subspace_func = [hm, ngk_pointer](const ct::Tensor& psi_in, ct::Tensor& psi_out) {
...
DiagoIterAssist<T, Device>::diagH_subspace(hm, psi_in_wrapper, psi_out_wrapper, eigen.data<Real>());
};
// diagH_subspace
// evc is eigenvector block
// psi is an initial guess
void DiagoIterAssist<T, Device>::diagH_subspace(psi, evc, ...)
{
...
{ // code block to calculate hcc and scc
// do hPsi for all bands
pHamilt->ops->hPsi(hpsi_in);
gemm_op<T, Device>()('C', 'N', psi.get_pointer(),hphi, hcc);
// do sPsi for all bands
pHamilt->sPsi(psi.get_pointer(), sphi, dmax, dmin, nstart);
gemm_op<T, Device>()('C', 'N', psi, sphi, scc);
}
// after generation of H and S matrix, diag them
DiagoIterAssist::diagH_LAPACK(nstart, n_band, hcc, scc, nstart, en, vcc);
{ // code block to calculate evc
gemm_op<T, Device>()('N', 'N', psi.get_pointer(), vcc, evc);
}
}
diagH_subspace
function consists of 3 parts:
(we use V
to represent psi
block here)
diagH_LAPACK
(wrapper of zhegvd
inside) to solve the dense projected eigenproblem $(V^THV) y =\lambda (V^TSV) y $evc = psi * vcc
to extract a new set of approximations $V_{new} = Vy$ from subspace V
(i.e. psi
input, or initial guess space)If initial guess V=psi
is S-orthogonal, then $V^TSV$ will be identity. Therefore, we do NOT need to store overlap matrix scc
, nor do we need to call generalized zhegvd
, but a standard routine zheev
instead.
We may need to introduce a new parameter to tell whether to assume that psi
is an orthogonal set in this step.
Following parameters are needed:
PARAM.inp.use_paw
PARAM.globalv.use_uspp
I will create a new issue.
Background
When using NCPP, S matrix is a unit matrix, which, however, cost $nproc nbands nbands$-complex memories.
For example, when nbands = 6000 and nproc=96, S will cost about 51.5 GB!
Describe the solution you'd like
S do not cost memories.
Task list only for developers
Notice Possible Changes of Behavior (Reminder only for developers)
No response
Notice any changes of core modules (Reminder only for developers)
No response
Notice Possible Changes of Core Modules (Reminder only for developers)
No response
Additional Context
No response
Task list for Issue attackers (only for developers)