Open maki49 opened 1 month ago
The issue of allocating a large amount of irrelevant GPU memory under the LCAO basis set, caused by setting the device to GPU, is described in issue #4442.
The kernel op currently used by hsolver is undergoing a round of refactoring.
It faces the same problem: supporting heterogeneous computing across multiple devices while keeping track of which devices are actually in use.
The interface is a crucial matter, and we should standardize this in all our functionalities that utilize multiple devices such as FFT and kernel operators.
I suggest we have a further discussion to align on a unified standard to ensure scalability and facilitate portability across various devices, including supercomputers.
@Critsium-xy
@mohanchen suggested implementing heterogeneous computing directly in blas_connector. blas_connector.h currently does two things: it links the cblas kernels and it wraps the BLAS kernels. He wants to separate the two parts. For example, we could keep the declaration of BlasConnector::gemm in blas_connector.h and implement the function in blas_connector.cpp. He also suggested that, to let BlasConnector::gemm support different platforms, we add a device-flag parameter whose default value means CPU, and use it to decide which kernel to call (cuBLAS, CBLAS, or hipBLAS). This may have a performance cost, but it may not be large (I haven't tested it yet). Once this is done, the other BLAS wrappers, such as the ops in module_hsolver or in @denghuilu's tensor, could be dropped. That said, I don't know whether this is actually a good idea.
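To make the device-flag idea above concrete, here is a minimal sketch. It is only an illustration under assumptions: the `BlasDevice` enum, the `sketch` namespace, and the argument order are hypothetical names chosen for this example, not the existing `BlasConnector` interface. The CPU branch uses a plain reference loop in place of the cblas call so that the snippet has no external dependency, and the GPU branches only throw, standing in for the cuBLAS/hipBLAS calls that a real build would make.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical device flag; the names are illustrative, not ABACUS identifiers.
enum class BlasDevice { CPU, CUDA, ROCM };

namespace sketch {

// Reference row-major GEMM: C = alpha * A(m x k) * B(k x n) + beta * C(m x n).
// Stands in for the cblas call so that the sketch needs no BLAS library.
inline void gemm_cpu(int m, int n, int k, double alpha,
                     const double* a, const double* b,
                     double beta, double* c)
{
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * sum + beta * c[i * n + j];
        }
    }
}

// Single entry point. The trailing device flag defaults to CPU, so call
// sites that do not pass it keep their current behaviour.
inline void gemm(int m, int n, int k, double alpha,
                 const double* a, const double* b,
                 double beta, double* c,
                 BlasDevice device = BlasDevice::CPU)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    switch (device) {
    case BlasDevice::CPU:
        gemm_cpu(m, n, k, alpha, a, b, beta, c);
        return;
    case BlasDevice::CUDA:
        // A real build would forward to cuBLAS here.
        throw std::runtime_error("CUDA backend not compiled into this sketch");
    case BlasDevice::ROCM:
        // A real build would forward to hipBLAS here.
        throw std::runtime_error("ROCm backend not compiled into this sketch");
    }
    throw std::invalid_argument("unknown device flag");
}

} // namespace sketch
```

The dispatch here is a runtime `switch`, which is where the possible performance cost mentioned above would come from. For large matrices, one branch per call is likely negligible next to the GEMM itself, but that is untested.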
Currently, device is an input parameter stored as a global variable. When device=gpu, all calculations are done on GPUs. However, "the machine has GPUs" does not have to mean "every module calculates on GPUs". For example, @dzzz2001 found that putting the FFT module on the GPU costs a lot of memory while giving little speedup when using the LCAO basis.
A possible solution is to have a global list that records the device of each module, with the class name as a key-like template parameter. Here's a demo: