pxlxingliang opened this issue 1 year ago (status: Open)
Testing on 2023-09-07: Using the 2023-09-07 develop code, I tested runs on different Bohrium machine types with different OpenMP/MPI parallel strategies. The image is "registry.dp.tech/deepmodeling/abacus-intel:latest", which uses the Intel compiler. There are 6 different test settings, and each setting was run 10 times.
Detailed results can be found at: https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.23-32a3fd
drho of each run:
As we can see, only the pure-MPI parallelization keeps the results stable, and this does not depend on the machine type.
Testing on 09-11: Set fp-model to strict in the Intel compilation. Machine: c32_m128_cpu; image: registry.dp.tech/deepmodeling/abacus-intel:latest. drho:
The 10 ELPA runs are stable, while the ScaLAPACK runs are unstable.
09-13: @jinzx10 modified the setting of orfac and the work arrays in the ScaLAPACK calculation (@jinzx10, please supply the detailed modification). The ScaLAPACK calculation is now much more stable. I tested on 10 c32_m128_cpu machines; the drho values fall into two groups of results, and the same holds for ELPA.
Checked with Bohrium: the 10 ELPA jobs used the same machine type, ecs.u1-c1m4.8xlarge (Ali), but we are not sure the CPUs are exactly the same.
I also ran 10 jobs on a single machine, and both ELPA and ScaLAPACK give a single group of results. https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.31-0b15d3
In collaboration with Bohrium, we ran 10 jobs on 10 CPUs of type "Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz"; both ELPA and ScaLAPACK give a single group of results. https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.31-494c1f
Conclusion: With the modified setting of orfac and the work arrays in the ScaLAPACK calculation, ELPA and ScaLAPACK can each give two groups of results across machines, but the results are stable on the same CPU (at least on Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz).
09-13, testing with the GNU compiler: Both ELPA and ScaLAPACK are unstable with the GNU compiler. Only MPI parallelization was tested (the OpenMP performance with GNU is poor). https://labs.dp.tech/projects/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.30-b93f76
@jinzx10 @LiuXiaohui123321 The testing on 004_Li128C75H100O75 is updated here. If you have other test results for this example, please also update them here.
The changes I made to the ScaLAPACK diagonalization interface (pdsygvx & pzhegvx in module_hsolver/diago_blas.cpp) involve orfac and a few workspace-related parameters:
diff --git a/source/module_hsolver/diago_blas.cpp b/source/module_hsolver/diago_blas.cpp
index 489be9f09..8c725857c 100644
--- a/source/module_hsolver/diago_blas.cpp
+++ b/source/module_hsolver/diago_blas.cpp
@@ -62,7 +62,7 @@ std::pair<int, std::vector<int>> DiagoBlas::pdsygvx_once(const int *const desc,
const int itype = 1, il = 1, iu = GlobalV::NBANDS, one = 1;
int M = 0, NZ = 0, lwork = -1, liwork = -1, info = 0;
double vl = 0, vu = 0;
- const double abstol = 0, orfac = -1;
+ const double abstol = 0, orfac = 0.01;
std::vector<double> work(3, 0);
std::vector<int> iwork(1, 0);
std::vector<int> ifail(GlobalV::NLOCAL, 0);
@@ -109,6 +109,10 @@ std::pair<int, std::vector<int>> DiagoBlas::pdsygvx_once(const int *const desc,
+ ModuleBase::GlobalFunc::TO_STRING(__LINE__));
// GlobalV::ofs_running<<"lwork="<<work[0]<<"\t"<<"liwork="<<iwork[0]<<std::endl;
+
+ work[0] *= 10;
+ iwork[0] *= 10;
+
lwork = work[0];
work.resize(std::max(lwork,3), 0);
liwork = iwork[0];
@@ -184,7 +188,7 @@ std::pair<int, std::vector<int>> DiagoBlas::pzhegvx_once(const int *const desc,
const char jobz = 'V', range = 'I', uplo = 'U';
const int itype = 1, il = 1, iu = GlobalV::NBANDS, one = 1;
int M = 0, NZ = 0, lwork = -1, lrwork = -1, liwork = -1, info = 0;
- const double abstol = 0, orfac = -1;
+ const double abstol = 0, orfac = 0.01;
//Note: pzhegvx_ has a bug
// We must give vl,vu a value, although we do not use range 'V'
// We must give rwork at least a memory of sizeof(double) * 3
@@ -238,6 +242,12 @@ std::pair<int, std::vector<int>> DiagoBlas::pzhegvx_once(const int *const desc,
+ ModuleBase::GlobalFunc::TO_STRING(__LINE__));
// GlobalV::ofs_running<<"lwork="<<work[0]<<"\t"<<"lrwork="<<rwork[0]<<"\t"<<"liwork="<<iwork[0]<<std::endl;
+
+ work[0] *= 10.0;
+ iwork[0] *= 10;
+ rwork[0] *= 10;
+
+
lwork = work[0].real();
work.resize(lwork, 0);
lrwork = rwork[0] + this->degeneracy_max * GlobalV::NLOCAL;
@@ -402,4 +412,4 @@ void DiagoBlas::post_processing(const int info, const std::vector<int> &vec)
}
}
According to the source file (https://netlib.org/scalapack/explore-html/d7/dff/pzhegvx_8f_source.html), orthogonality of the eigenvectors can become an issue when there are many eigenvectors with close eigenvalues. pzhegvx does provide a way to guarantee orthogonality, but it is tricky and depends on a few parameters.
orfac is the threshold used to decide which eigenvectors are considered close enough to require reorthogonalization. The default is 1e-3, which I changed to 1e-2 in the modified-ScaLAPACK test above. The sizes of the workspace arrays should also be increased; in the test I simply enlarged them by a factor of 10, which is certainly not optimal and could be improved. A minimal sketch of this pattern is given below.
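For reference, here is a minimal, self-contained sketch of that pattern for the real-valued pdsygvx_ branch (workspace query, enlarged work arrays, orfac = 0.01). This is not the actual ABACUS code: the extern "C" prototype, the helper name `solve_gevp_sketch`, and the argument handling are assumptions made for illustration only.

```cpp
#include <vector>
#include <algorithm>

// Assumed C binding of ScaLAPACK's PDSYGVX (Fortran argument order).
extern "C" void pdsygvx_(const int* ibtype, const char* jobz, const char* range,
                         const char* uplo, const int* n, double* a, const int* ia,
                         const int* ja, const int* desca, double* b, const int* ib,
                         const int* jb, const int* descb, const double* vl,
                         const double* vu, const int* il, const int* iu,
                         const double* abstol, int* m, int* nz, double* w,
                         const double* orfac, double* z, const int* iz,
                         const int* jz, const int* descz, double* work,
                         const int* lwork, int* iwork, const int* liwork,
                         int* ifail, int* iclustr, double* gap, int* info);

// n: global matrix size, nbands: number of wanted eigenvectors,
// nprocs: process-grid size (nprow * npcol).
// a, b, z: local blocks; desc*: BLACS descriptors; eigenvalues: array of size n.
void solve_gevp_sketch(int n, int nbands, int nprocs,
                       double* a, const int* desca,
                       double* b, const int* descb,
                       double* z, const int* descz,
                       double* eigenvalues)
{
    const int ibtype = 1, ia = 1, ja = 1, ib = 1, jb = 1, iz = 1, jz = 1;
    const int il = 1, iu = nbands;
    const char jobz = 'V', range = 'I', uplo = 'U';
    const double vl = 0.0, vu = 0.0, abstol = 0.0;
    const double orfac = 0.01;  // reorthogonalization threshold (ScaLAPACK default is 1e-3)

    int m = 0, nz = 0, info = 0;
    std::vector<int> ifail(n, 0), iclustr(2 * nprocs, 0);
    std::vector<double> gap(nprocs, 0.0);

    // 1) Workspace query: lwork = liwork = -1 returns the minimal sizes in work[0]/iwork[0].
    int lwork = -1, liwork = -1;
    std::vector<double> work(3, 0.0);
    std::vector<int> iwork(1, 0);
    pdsygvx_(&ibtype, &jobz, &range, &uplo, &n, a, &ia, &ja, desca,
             b, &ib, &jb, descb, &vl, &vu, &il, &iu, &abstol,
             &m, &nz, eigenvalues, &orfac, z, &iz, &jz, descz,
             work.data(), &lwork, iwork.data(), &liwork,
             ifail.data(), iclustr.data(), gap.data(), &info);

    // 2) Enlarge the queried sizes: the minimal sizes do not cover the extra space
    //    needed to reorthogonalize clustered eigenvalues. The factor of 10 follows
    //    the test above and is admittedly not optimal.
    lwork  = static_cast<int>(work[0]) * 10;
    liwork = iwork[0] * 10;
    work.assign(std::max(lwork, 3), 0.0);
    iwork.assign(liwork, 0);

    // 3) Actual diagonalization.
    pdsygvx_(&ibtype, &jobz, &range, &uplo, &n, a, &ia, &ja, desca,
             b, &ib, &jb, descb, &vl, &vu, &il, &iu, &abstol,
             &m, &nz, eigenvalues, &orfac, z, &iz, &jz, descz,
             work.data(), &lwork, iwork.data(), &liwork,
             ifail.data(), iclustr.data(), gap.data(), &info);
    // MOD(info/2, 2) != 0 would indicate that some eigenvectors could not be
    // reorthogonalized because the workspace was still too small.
}
```

Whether a factor of 10 is always sufficient is not guaranteed; pdsygvx reports through info and iclustr when the workspace was too small to reorthogonalize a cluster, and that could be checked explicitly.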
Was the Intel-compiled ABACUS built with icpx? In my test, the icpc-compiled ABACUS is also unstable (though the problem could also lie in the toolchain build method).
As a side note, I notice that ABACUS always solves the eigenvalue equation in the basis of all orbitals. Many quantum chemistry packages using Gaussian basis sets perform an extra canonical orthogonalization to "project out" basis orbitals that are almost linearly dependent; the resulting eigenvalue equations are usually more stable. Some explanation can be found in the Q-Chem manual (https://manual.q-chem.com/latest/sec_Basis_Customization.html) or Szabo & Ostlund's book (ch. 3.4.5). I'm not sure whether numerical atomic orbitals should use a similar strategy (and it inevitably complicates the code for MPI parallelization, where matrices are stored in a block-cyclic format).
> Was the Intel-compiled ABACUS built with icpx? In my test, the icpc-compiled ABACUS is also unstable (though the problem could also lie in the toolchain build method).
I think it depends on environment variables such as CXX or I_MPI_CXX. The environment set up by the current Dockerfile.intel uses icpx.
> As a side note, I notice that abacus always solves the eigenvalue equation in the basis of all orbitals. [...]
In my recollection, ABACUS LCAO solves the generalized eigenvalue equation directly (I don't know whether that is true), while it is certain that most quantum chemistry packages do this via canonical orthogonalization or another orthogonalization method. I have been wondering for a while which approach is better. A serial sketch of the canonical orthogonalization approach is given below.
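For illustration only, here is a minimal serial sketch of canonical orthogonalization for the generalized problem H C = S C E using LAPACKE's dsyev. The function name `canonical_gevp`, the linear-dependence threshold, and the naive matrix products are illustrative assumptions; the real block-cyclic, MPI-parallel case would of course look quite different.

```cpp
#include <vector>
#include <cmath>
#include <lapacke.h>

// H, S: n x n symmetric matrices, column-major. Returns the eigenvalues of the
// canonically-orthogonalized problem; back-transformed eigenvectors are omitted.
std::vector<double> canonical_gevp(std::vector<double> H, std::vector<double> S, int n,
                                   double lindep_threshold = 1e-7)
{
    // 1) Diagonalize the overlap: S = U diag(s) U^T (U overwrites S, s ascending).
    std::vector<double> s(n);
    LAPACKE_dsyev(LAPACK_COL_MAJOR, 'V', 'U', n, S.data(), n, s.data());

    // 2) Build X = U_kept diag(s_kept^{-1/2}), dropping near-linearly-dependent
    //    directions (overlap eigenvalues below the threshold).
    std::vector<double> X;
    int m = 0;
    for (int j = 0; j < n; ++j) {
        if (s[j] > lindep_threshold) {
            const double inv_sqrt = 1.0 / std::sqrt(s[j]);
            for (int i = 0; i < n; ++i)
                X.push_back(S[j * n + i] * inv_sqrt);
            ++m;
        }
    }

    // 3) Transform H' = X^T H X (naive triple loops; use dgemm in practice).
    std::vector<double> HX(static_cast<size_t>(n) * m, 0.0), Hp(static_cast<size_t>(m) * m, 0.0);
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < n; ++i)
            for (int k = 0; k < n; ++k)
                HX[j * n + i] += H[k * n + i] * X[j * n + k];
    for (int j = 0; j < m; ++j)
        for (int i = 0; i < m; ++i)
            for (int k = 0; k < n; ++k)
                Hp[j * m + i] += X[i * n + k] * HX[j * n + k];

    // 4) Solve the now-standard (and better-conditioned) eigenvalue problem.
    std::vector<double> eps(m);
    LAPACKE_dsyev(LAPACK_COL_MAJOR, 'V', 'U', m, Hp.data(), m, eps.data());
    return eps;  // eigenvectors in the original basis would be X * Hp
}
```

The key point is step 2: overlap-matrix directions with eigenvalues below the threshold are discarded, so the transformed problem stays well-conditioned even when the basis is nearly linearly dependent.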
This is related to the input parameters and the device.
Hi all,
Recently I have been doing some tests on mixing, and I also noticed the unstable LCAO calculation of 004_Li128C75H100O75. I ran the calculation 10 times for each mixing method. ABACUS test Link
Interestingly, I found the calculations are relatively stable with Broyden mixing, while unstable cases occur with Pulay mixing. To double-check, I ran an extra set of calculations (20 times) and found that the Broyden calculations are also unstable. ABACUS test Link
@pxlxingliang, is this case still unstable now? Can we close this issue?
I used the latest Intel/GNU images with ks_solver genelpa and scalapack_gvx to run this example 10 times. ELPA is stable in both the Intel and GNU images:
intel: https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.109-e84632
gnu: https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.109-ddb135
[image]
For ks_solver scalapack_gvx, however, drho shows large fluctuations:
intel: https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.109-2661e3
gnu: https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.109-48dbc5
I have tested the ScaLAPACK method on @jinzx10's commit with both Intel- and GNU-compiled ABACUS. The results of 10 runs with Intel are stable, while the GNU results are unstable. intel: https://app.bohrium.dp.tech/abacustest/?request=GET%3A%2Fapplications%2Fabacustest%2Fjobs%2Fjob-abacustest-v0.3.113-802aa8
Describe the bug
The LCAO calculation of the daily test 004_Li128C75H100O75 is unstable. 004_Li128C75H100O75.zip
In the current version (20230921, develop branch), the Intel-compiled calculation is stable, but the GNU-compiled one is unstable.
Details of some of the testing are given in the comments above.