deepmodeling / abacus-develop

An electronic structure package based on either plane wave basis or numerical atomic orbitals.
http://abacus.ustc.edu.cn
GNU Lesser General Public License v3.0

Double peak memory cost in `cast_memory_op` #4153

Closed dyzheng closed 6 months ago

dyzheng commented 6 months ago

Describe the bug

In esolver_ks_pw.cpp:

    this->kspw_psi = GlobalV::device_flag == "gpu" 
                         || GlobalV::precision_flag == "single"
                         ? new psi::Psi<T, Device>(this->psi[0])
                         : reinterpret_cast<psi::Psi<T, Device>*>(this->psi);

The constructor of `Psi` calls `cast_memory_op`, whose GPU-to-CPU specialization is:

    template <typename T_out, typename T_in>
    struct cast_memory<T_out, T_in, container::DEVICE_CPU, container::DEVICE_GPU> {
        void operator()(
            T_out* arr_out,
            const T_in* arr_in,
            const size_t& size)
        {
            auto * arr = (T_in*) malloc(sizeof(T_in) * size);
            cudaErrcheck(cudaMemcpy(arr, arr_in, sizeof(T_in) * size, cudaMemcpyDeviceToHost));
            for (int ii = 0; ii < size; ii++) {
                arr_out[ii] = static_cast<T_out>(arr[ii]);
            }
            free(arr);
        }
    };

The temporary host buffer `arr` holds as many elements as the Psi data itself, so peak host memory roughly doubles while the cast runs. This should be optimized as soon as possible.
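
One way to bound the temporary buffer, shown only as a minimal host-side sketch and not as the fix adopted in the repository: stage the data through a fixed-size chunk instead of a full-size copy. The names `cast_memory_chunked`, `chunk`, and `copy_to_host` are hypothetical; `copy_to_host` stands in for the `cudaMemcpy(..., cudaMemcpyDeviceToHost)` call so the sketch compiles without CUDA.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of ABACUS.
// Converts `size` elements of arr_in into arr_out while holding at most
// `chunk` elements of T_in in temporary host memory.
// copy_to_host(dst, src, bytes) is an assumed stand-in for the
// device-to-host cudaMemcpy used in the original cast_memory.
template <typename T_out, typename T_in, typename CopyFn>
void cast_memory_chunked(T_out* arr_out,
                         const T_in* arr_in,
                         std::size_t size,
                         std::size_t chunk,
                         CopyFn copy_to_host)
{
    assert(chunk > 0 && "chunk must be positive");

    // The staging buffer is capped at `chunk` elements, independent of `size`.
    std::vector<T_in> staging(std::min(chunk, size));

    for (std::size_t offset = 0; offset < size; offset += staging.size()) {
        // The last chunk may be shorter than the staging buffer.
        const std::size_t n = std::min(staging.size(), size - offset);

        // Bring one chunk of the source into the staging buffer.
        copy_to_host(staging.data(), arr_in + offset, n * sizeof(T_in));

        // Convert the chunk element by element into the destination.
        for (std::size_t ii = 0; ii < n; ++ii) {
            arr_out[offset + ii] = static_cast<T_out>(staging[ii]);
        }
    }
}
```

With this shape the extra host memory is `chunk * sizeof(T_in)` rather than `size * sizeof(T_in)`, at the cost of one copy call per chunk. For a host-only check, passing a lambda that wraps `std::memcpy` as `copy_to_host` is enough; in the GPU build the lambda would wrap the existing `cudaMemcpy` call.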

caic99 commented 6 months ago

Hi @dyzheng, I'm interested in this problem and wonder whether the type conversion really happens here. FYI, you can select code lines and paste their permalink into the input box to show the code inline. This makes the referenced source code easier to access.

https://github.com/deepmodeling/abacus-develop/blob/b7e91aa9e7a4a449d217257bfeba051177f08949/source/module_esolver/esolver_ks_pw.cpp#L193-L196