deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0
163 stars 128 forks

Question: Davidson slowed down by maintaining unneeded scc array #4856

Open Cstandardlib opened 1 month ago

Cstandardlib commented 1 month ago

Details

It seems that the scc array is allocated and maintained, but never accessed by the actual diag_zhegvx procedure that diagonalizes the reduced basis set.

In diago_david.cpp, scc is updated and referenced here:

this->cal_elem(dim, nbase, nbase_x, this->notconv, this->hpsi, this->spsi, this->hcc, this->scc);

this->diag_zhegvx(nbase, nband, this->hcc, this->scc, nbase_x, this->eigenvalue, this->vcc);

This is the actual diagonalization process:

template <typename T, typename Device>
void DiagoDavid<T, Device>::diag_zhegvx(const int& nbase,
                                        const int& nband,
                                        const T* hcc,
                                        const T* /*scc*/,
                                        const int& nbase_x,
                                        Real* eigenvalue, // in CPU
                                        T* vcc)
{
    ModuleBase::timer::tick("DiagoDavid", "diag_zhegvx");
    if (diag_comm.rank == 0)
    {
        assert(nbase_x >= std::max(1, nbase));

        if (this->device == base_device::GpuDevice)
        {
#if defined(__CUDA) || defined(__ROCM)
            Real* eigenvalue_gpu = nullptr;
            resmem_var_op()(this->ctx, eigenvalue_gpu, nbase_x);
            syncmem_var_h2d_op()(this->ctx, this->cpu_ctx, eigenvalue_gpu, this->eigenvalue, nbase_x);

            dnevx_op<T, Device>()(this->ctx, nbase, nbase_x, this->hcc, nband, eigenvalue_gpu, this->vcc);

            syncmem_var_d2h_op()(this->cpu_ctx, this->ctx, this->eigenvalue, eigenvalue_gpu, nbase_x);
            delmem_var_op()(this->ctx, eigenvalue_gpu);
#endif
        }
        else
        {
            dnevx_op<T, Device>()(this->ctx, nbase, nbase_x, this->hcc, nband, this->eigenvalue, this->vcc);
        }
    }

#ifdef __MPI
    if (diag_comm.nproc > 1)
    {
        // vcc: nbase * nband
        for (int i = 0; i < nband; i++)
        {
            MPI_Bcast(&vcc[i * nbase_x], nbase, MPI_DOUBLE_COMPLEX, 0, diag_comm.comm);
        }
        MPI_Bcast(this->eigenvalue, nband, MPI_DOUBLE, 0, diag_comm.comm);
    }
#endif

    ModuleBase::timer::tick("DiagoDavid", "diag_zhegvx");
    return;
}

Here dnevx_op is a wrapper for heevx, which solves only the standard eigenproblem of a Hermitian matrix, and only hcc is passed to it:

dnevx_op<T, Device>()(this->ctx, nbase, nbase_x, this->hcc, nband, this->eigenvalue, this->vcc);
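To make the distinction between the two solvers concrete, the reduced problems can be written out as below. This is only a sketch under assumed notation: $V$ (the matrix whose columns are the current basis vectors), $c$ and $\varepsilon$ are symbols introduced here for illustration and do not appear in the code; reading hcc as $H_{cc}$ and scc as $S_{cc}$ is inferred from the arguments of cal_elem.

```latex
\begin{align}
  H_{cc} &= V^{\dagger} H V, &
  S_{cc} &= V^{\dagger} S V, \\
  H_{cc}\, c &= \varepsilon\, S_{cc}\, c
    && \text{(generalized problem, hegvx-type solver)} \\
  V^{\dagger} S V = I \;\Longrightarrow\;
  H_{cc}\, c &= \varepsilon\, c
    && \text{(standard problem, heevx-type solver)}
\end{align}
```

Under this reading, a solver that takes only $H_{cc}$ gives the right answer exactly when the basis is already $S$-orthonormal, in which case $S_{cc}$ carries no extra information.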

Note that computing scc has approximately the same time complexity as orthogonalizing the vector set. If maintaining scc is not intended, it significantly slows down the Davidson algorithm. Conversely, if scc is maintained, no orthogonalization is needed, and hegvx should be called to solve the reduced generalized eigenproblem. This is what the new dav_subspace method implements.
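The redundancy can be checked numerically on a toy case. The following is a minimal sketch, not ABACUS code: every name in it (s_dot, s_orthonormalize, max_deviation_from_identity, example_basis, example_overlap_diag) is hypothetical, and the overlap operator is assumed to be a simple positive diagonal matrix. It S-orthonormalizes a small basis with Gram-Schmidt and then measures how far the reduced overlap matrix is from the identity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Vec = std::vector<cplx>;

// <a|S|b> for a diagonal, positive overlap matrix S (toy assumption).
cplx s_dot(const Vec& a, const std::vector<double>& s_diag, const Vec& b)
{
    assert(a.size() == b.size() && a.size() == s_diag.size());
    cplx sum(0.0, 0.0);
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        sum += std::conj(a[i]) * s_diag[i] * b[i];
    }
    return sum;
}

// Gram-Schmidt in the S inner product: afterwards <v_i|S|v_j> = delta_ij.
void s_orthonormalize(std::vector<Vec>& basis, const std::vector<double>& s_diag)
{
    for (std::size_t j = 0; j < basis.size(); ++j)
    {
        for (std::size_t k = 0; k < j; ++k)
        {
            const cplx proj = s_dot(basis[k], s_diag, basis[j]);
            for (std::size_t i = 0; i < basis[j].size(); ++i)
            {
                basis[j][i] -= proj * basis[k][i];
            }
        }
        const double norm = std::sqrt(s_dot(basis[j], s_diag, basis[j]).real());
        assert(norm > 0.0); // input vectors must be linearly independent
        for (auto& x : basis[j])
        {
            x /= norm;
        }
    }
}

// Largest |S_cc(i,j) - delta_ij| over the reduced overlap matrix.
double max_deviation_from_identity(const std::vector<Vec>& basis,
                                   const std::vector<double>& s_diag)
{
    double max_dev = 0.0;
    for (std::size_t i = 0; i < basis.size(); ++i)
    {
        for (std::size_t j = 0; j < basis.size(); ++j)
        {
            const cplx expected = (i == j) ? cplx(1.0, 0.0) : cplx(0.0, 0.0);
            max_dev = std::max(max_dev, std::abs(s_dot(basis[i], s_diag, basis[j]) - expected));
        }
    }
    return max_dev;
}

// Two linearly independent vectors in a 3-dimensional space (made-up data).
std::vector<Vec> example_basis()
{
    return {{cplx(1, 0), cplx(1, 1), cplx(0, 2)},
            {cplx(0, 1), cplx(2, 0), cplx(1, -1)}};
}

// Diagonal of the toy overlap matrix (made-up data).
std::vector<double> example_overlap_diag()
{
    return {1.0, 2.0, 0.5};
}
```

Calling s_orthonormalize on example_basis() and then max_deviation_from_identity should give a value at the level of floating-point round-off, which is the sense in which building the reduced overlap matrix after orthogonalization repeats work already done.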

Cstandardlib commented 1 month ago

I added one line to cal_elem, which is responsible for updating scc in each iteration:

setmem_complex_op()(this->ctx, this->scc, 0, nbase_x * nbase_x);

This line sets scc to 0. All tests on david have passed.

Cstandardlib commented 1 month ago

Tests on some examples show an overall speedup of about 1.1x to 1.2x for HSolverPW. cal_elem itself has been sped up by a factor of about 2.

[Figures: diag_once, cal_elem, speedup-ratio]

Cstandardlib commented 1 month ago

The HSolver module is currently undergoing a massive refactoring, and systematic testing of generalized eigenvalue problems for the iterative diagonalization methods is lacking. This issue will therefore be suspended until both points are resolved and the module is standardized.