MRChemSoft / mrchem

MultiResolution Chemistry
GNU Lesser General Public License v3.0

Inconsistent results MPI #460

Closed moritzgubler closed 1 year ago

moritzgubler commented 1 year ago

Hi,

I have found some MPI behavior in MRChem that I do not understand. As a test, I calculated a single oxygen atom (unrestricted, spin multiplicity 3). Depending on the number of MPI processes I use, I get completely different results: with 2 processes the final energy is about -75.015 au, while with 4 processes it is -4.500000 au and the SCF optimization does not work properly.

The input file I used:

world_prec = 1.0e-4
world_unit = angstrom

WaveFunction {
  method = DFT
  restricted = false
}

DFT {
$functionals
PBEX     1.0
PBEC     1.0
$end
}

Molecule {
multiplicity = 3
$coords
O          0.0      0.0     0.0
$end
}
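
As a quick sanity check on this input: a neutral oxygen atom has 8 electrons, and a multiplicity of 3 means two unpaired electrons, so an unrestricted calculation should carry 5 alpha and 3 beta orbitals. The sketch below only illustrates that arithmetic; `spin_occupations` is a hypothetical helper, not part of MRChem.

```python
def spin_occupations(n_electrons, multiplicity):
    """Split an electron count into (n_alpha, n_beta) for a given multiplicity.

    Hypothetical helper for illustration only; it uses the usual relation
    multiplicity = n_alpha - n_beta + 1.
    """
    n_unpaired = multiplicity - 1
    if n_unpaired < 0 or n_unpaired > n_electrons:
        raise ValueError("multiplicity is incompatible with the electron count")
    if (n_electrons - n_unpaired) % 2 != 0:
        raise ValueError("electron count and multiplicity have mismatched parity")
    n_beta = (n_electrons - n_unpaired) // 2
    n_alpha = n_beta + n_unpaired
    return n_alpha, n_beta


# Neutral oxygen atom (8 electrons) as a triplet
print(spin_occupations(8, 3))  # (5, 3)
```

Both output files report 5 alpha and 3 beta electrons, so the occupation setup itself is the same in the two runs.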

The SCF cycle with 2 processes looks normal:

 Iter           MO residual             Total energy                Update
---------------------------------------------------------------------------
    0          2.828427e+00         -74.568192919943         -7.456819e+01
    1          2.381256e-01         -74.996492622877         -4.282997e-01
    2          5.025847e-02         -75.011883575907         -1.539095e-02
    3          2.522809e-02         -75.014215083012         -2.331507e-03
    4          7.086850e-03         -75.014676126326         -4.610433e-04
    5          5.868909e-03         -75.014884626906         -2.085006e-04
    6          5.860965e-04         -75.014887458715         -2.831809e-06
---------------------------------------------------------------------------
                      SCF converged in 6 iterations!
===========================================================================

With 4 processes the calculation seems to go wrong:

===========================================================================
 Iter           MO residual             Total energy                Update
---------------------------------------------------------------------------
    0          2.828427e+00         -54.100721122221         -5.410072e+01
    1          1.874352e+00         -47.148898853966          6.951822e+00
    2          2.481909e+00          -4.500000000000          4.264890e+01
    3          1.732051e+00          -4.500000000000          0.000000e+00
---------------------------------------------------------------------------
                      SCF converged in 3 iterations!
===========================================================================

Is there some kind of setting I am missing? The .out file of the second run (4 processes) contains no warning that the calculation failed.
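
Since the output itself does not flag the problem, a small external check can help. The following is a minimal sketch, not an MRChem feature: it parses the SCF table in the format shown above and reports two suspicious patterns, a "converged" message while the last MO residual is still above the orbital threshold (1.0e-3 in these runs, according to the output), and a total energy that rises by more than a chosen amount between iterations. The function names and the `max_rise` default are assumptions made for illustration.

```python
import re

# One row of the SCF table: iteration, MO residual, total energy, update.
ROW = re.compile(
    r"^\s*(\d+)\s+([-+]?\d\.\d+e[-+]\d+)\s+(-?\d+\.\d+)\s+([-+]?\d\.\d+e[-+]\d+)\s*$"
)


def parse_scf_table(text):
    """Return a list of (iteration, mo_residual, total_energy) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                (int(match.group(1)), float(match.group(2)), float(match.group(3)))
            )
    return rows


def scf_warnings(text, orbital_threshold=1.0e-3, max_rise=1.0):
    """Collect human-readable warnings about a suspicious SCF history.

    orbital_threshold: residual below which orbitals count as converged.
    max_rise: largest tolerated energy increase (au) between iterations;
              this default is an arbitrary choice for the sketch.
    """
    rows = parse_scf_table(text)
    if not rows:
        return ["no SCF table found"]

    warnings = []
    last_iter, last_residual, _ = rows[-1]
    if "SCF converged" in text and last_residual > orbital_threshold:
        warnings.append(
            f"reported convergence at iteration {last_iter}, but MO residual "
            f"{last_residual:.3e} exceeds threshold {orbital_threshold:.1e}"
        )
    for (_, _, e_prev), (it, _, e_next) in zip(rows, rows[1:]):
        if e_next - e_prev > max_rise:
            warnings.append(
                f"energy rose by {e_next - e_prev:.3f} au at iteration {it}"
            )
    return warnings
```

Applied to the two tables above, this should return no warnings for the 2-process run and several for the 4-process run, where the residual stays above 1 and the energy climbs from about -54.1 to -4.5.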

Steps to reproduce the result:

To start the simulations I used:

export OMP_NUM_THREADS=4
mrchem --dryrun O.inp
mpirun -np 2 mrchem.x O.json > O_2_processes.out
# I deleted all files except O.inp and O_2_processes.out
mrchem --dryrun O.inp
mpirun -np 4 mrchem.x O.json > O_4_processes.out
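
To compare the two runs produced by the commands above without reading the logs by hand, one could extract the final total energy from each output file. This is a sketch that assumes the "Molecular Energy (final)" block has the layout shown in the attached output; `final_energy` and `energies_agree` are hypothetical helper names, and the 1.0e-3 au tolerance is an arbitrary choice, somewhat looser than the `world_prec` of 1.0e-4 in the input.

```python
import re

# The final-energy block runs from its title to the next line of '=' characters.
FINAL_BLOCK = re.compile(r"Molecular Energy \(final\)(.*?)={10,}", re.S)
TOTAL = re.compile(r"Total energy\s*:\s*\(au\)\s*([-+]?\d+\.\d+e[-+]\d+)")


def final_energy(text):
    """Return the final total energy in au, or None if it is not present."""
    block = FINAL_BLOCK.search(text)
    if block is None:
        return None
    match = TOTAL.search(block.group(1))
    return float(match.group(1)) if match else None


def energies_agree(text_a, text_b, tol=1.0e-3):
    """True if both outputs contain a final energy and they differ by < tol."""
    e_a, e_b = final_energy(text_a), final_energy(text_b)
    if e_a is None or e_b is None:
        raise ValueError("final energy missing from at least one output")
    return abs(e_a - e_b) < tol


if __name__ == "__main__":
    import sys

    # Example: python compare.py O_2_processes.out O_4_processes.out
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            print("agree" if energies_agree(fa.read(), fb.read()) else "DISAGREE")
```

A check like this would have to be run by the user after both jobs finish; it does not explain the discrepancy, it only makes it visible.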

Do you have any idea what might be causing this? Thanks a lot for looking into it :)

Best, Moritz

moritzgubler commented 1 year ago

Here are the contents of the O_2_processes.out file:


***************************************************************************
***                                                                     ***
***                                                                     ***
***          __  __ ____   ____ _                                       ***
***         |  \/  |  _ \ / ___| |__   ___ _ __ ___                     ***
***         | |\/| | |_) | |   | '_ \ / _ \ '_ ` _ \                    ***
***         | |  | |  _ <| |___| | | |  __/ | | | | |                   ***
***         |_|  |_|_| \_\\____|_| |_|\___|_| |_| |_|                   ***
***                                                                     ***
***         VERSION            1.2.0-alpha                              ***
***                                                                     ***
***         Git branch         feature/relax                            ***
***         Git commit hash    f5aeb60041ef2a1c528a                     ***
***         Git commit author  moritzgubler                             ***
***         Git commit date    Fri Aug 18 14:44:30 2023 +0200           ***
***                                                                     ***
***         Contact: luca.frediani@uit.no                               ***
***                                                                     ***
***         Radovan Bast            Magnar Bjorgve                      ***
***         Roberto Di Remigio      Antoine Durdek                      ***
***         Luca Frediani           Gabriel Gerez                       ***
***         Stig Rune Jensen        Jonas Juselius                      ***
***         Rune Monstad            Peter Wind                          ***
***                                                                     ***
***************************************************************************

---------------------------------------------------------------------------

 MPI processes           :        (1 bank)                               2
 OpenMP threads          :                                               4
 Total cores             :                                               5

---------------------------------------------------------------------------

XCFun DFT library Copyright 2009-2020 Ulf Ekstrom and contributors.
See http://dftlibs.org/xcfun/ for more information.

This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. For details see the documentation.
Scientific users of this library should cite
U. Ekstrom, L. Visscher, R. Bast, A. J. Thorvaldsen and K. Ruud;
J.Chem.Theor.Comp. 2010, DOI: 10.1021/ct100117s

---------------------------------------------------------------------------

 MRCPP version         : 1.5.0-alpha
 Git branch            : HEAD
 Git commit hash       : a3618d1498410124ec47
 Git commit author     : gitpeterwind
 Git commit date       : Tue Feb 21 15:21:15 2023 +0100

 Linear algebra        : EIGEN v3.4.0
 Parallelization       : MPI/OpenMP (4 threads)

---------------------------------------------------------------------------

===========================================================================
                         MultiResolution Analysis
---------------------------------------------------------------------------
 polynomial order      : 6
 polynomial type       : Interpolating
---------------------------------------------------------------------------
 total boxes           : 8
 boxes                 : [          2           2           2 ]
 unit lengths          : [   16.00000    16.00000    16.00000 ]
 scaling factor        : [    1.00000     1.00000     1.00000 ]
 lower bounds          : [  -16.00000   -16.00000   -16.00000 ]
 upper bounds          : [   16.00000    16.00000    16.00000 ]
 total length          : [   32.00000    32.00000    32.00000 ]
===========================================================================

***************************************************************************
***                                                                     ***
***                        Initializing Molecule                        ***
***                                                                     ***
***************************************************************************

===========================================================================
                                 Molecule
---------------------------------------------------------------------------
 Charge                  :                                               0
 Multiplicity            :                                               3
---------------------------------------------------------------------------
    N    Atom            :               x               y               z
---------------------------------------------------------------------------
    0       O            :        0.000000        0.000000        0.000000
---------------------------------------------------------------------------
 Center of mass          :        0.000000        0.000000        0.000000
===========================================================================

***************************************************************************
***                                                                     ***
***                Computing Initial Guess Wavefunction                 ***
***                                                                     ***
***************************************************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Calculation             : Compute initial orbitals
 Method                  : Diagonalize SAD Hamiltonian
 Precision               : 1.00000e-03
 Screening               : 1.20000e+01 StdDev
 Restricted              : False
 Functional              : LDA (SVWN5)
 AO basis                : 3-21G
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

===========================================================================
                            Molecular Orbitals
---------------------------------------------------------------------------
 Alpha electrons         :                                               5
 Beta electrons          :                                               3
 Total electrons         :                                               8
---------------------------------------------------------------------------
    n  Occ Spin          :                                            Norm
---------------------------------------------------------------------------
    0    1    a          :                              9.999999960158e-01
    1    1    a          :                              9.999999997235e-01
    2    1    a          :                              9.999999996591e-01
    3    1    a          :                              9.999999996591e-01
    4    1    a          :                              9.999999996659e-01
    5    1    b          :                              9.999999960158e-01
    6    1    b          :                              9.999999997235e-01
    7    1    b          :                              9.999999996591e-01
===========================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Calculation             : Compute initial energy
 Method                  : DFT
 Relativity              : None
 Environment             : None
 External fields         : None
 Precision               : 1.00000e-03
 Localization            : Off
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

===========================================================================
                        Molecular Energy (initial)
---------------------------------------------------------------------------
 Kinetic energy          :            (au)                 73.848778155587
 E-N energy              :            (au)               -176.645876573153
 Coulomb energy          :            (au)                 36.592635366777
 Exchange energy         :            (au)                  0.000000000000
 X-C energy              :            (au)                 -8.363729869154
 N-N energy              :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Electronic energy       :            (au)                -74.568192919943
 Nuclear energy          :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Total energy            :            (au)             -7.456819291994e+01
                         :      (kcal/mol)             -4.679224752103e+04
                         :        (kJ/mol)             -1.957787636280e+05
                         :            (eV)             -2.029103899210e+03
===========================================================================

===========================================================================
                        Orbital Energies (initial)
---------------------------------------------------------------------------
    n  Occ Spin          :                                         Epsilon
---------------------------------------------------------------------------
    0    1    a          :            (au)                -18.783069858705
    1    1    a          :            (au)                 -0.879098549092
    2    1    a          :            (au)                 -0.353620289082
    3    1    a          :            (au)                 -0.353199550583
    4    1    a          :            (au)                 -0.272330898872
    5    1    b          :            (au)                -18.742457988742
    6    1    b          :            (au)                 -0.752205509374
    7    1    b          :            (au)                 -0.223907511331
---------------------------------------------------------------------------
 Sum occupied            :            (au)                -40.359890155780
===========================================================================

***************************************************************************
***                                                                     ***
***                 Computing Ground State Wavefunction                 ***
***                                                                     ***
***************************************************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Calculation             : Optimize ground state orbitals
 Method                  : DFT
 Relativity              : None
 Environment             : None
 External fields         : None
 Checkpointing           : Off
 Max iterations          : 100
 KAIN solver             : 5
 Localization            : Off
 Diagonalization         : First two iterations
 Start precision         : 1.00000e-04
 Final precision         : 1.00000e-04
 Helmholtz precision     : Dynamic
 Energy threshold        : Off
 Orbital threshold       : 1.00000e-03
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

===========================================================================
 Iter           MO residual             Total energy                Update
---------------------------------------------------------------------------
    0          2.828427e+00         -74.568192919943         -7.456819e+01
    1          2.381256e-01         -74.996492622877         -4.282997e-01
    2          5.025847e-02         -75.011883575907         -1.539095e-02
    3          2.522809e-02         -75.014215083012         -2.331507e-03
    4          7.086850e-03         -75.014676126326         -4.610433e-04
    5          5.868909e-03         -75.014884626906         -2.085006e-04
    6          5.860965e-04         -75.014887458715         -2.831809e-06
---------------------------------------------------------------------------
                      SCF converged in 6 iterations!
===========================================================================

***************************************************************************
***                                                                     ***
***                    Printing Molecular Properties                    ***
***                                                                     ***
***************************************************************************

===========================================================================
                                 Molecule
---------------------------------------------------------------------------
 Charge                  :                                               0
 Multiplicity            :                                               3
---------------------------------------------------------------------------
    N    Atom            :               x               y               z
---------------------------------------------------------------------------
    0       O            :        0.000000        0.000000        0.000000
---------------------------------------------------------------------------
 Center of mass          :        0.000000        0.000000        0.000000
===========================================================================

===========================================================================
                         Molecular Energy (final)
---------------------------------------------------------------------------
 Kinetic energy          :            (au)                 74.816517627631
 E-N energy              :            (au)               -177.985373989143
 Coulomb energy          :            (au)                 36.530963832145
 Exchange energy         :            (au)                  0.000000000000
 X-C energy              :            (au)                 -8.376994929348
 N-N energy              :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Electronic energy       :            (au)                -75.014887458715
 Nuclear energy          :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Total energy            :            (au)             -7.501488745872e+01
                         :      (kcal/mol)             -4.707255257612e+04
                         :        (kJ/mol)             -1.969515599785e+05
                         :            (eV)             -2.041259076838e+03
===========================================================================

===========================================================================
                         Orbital Energies (final)
---------------------------------------------------------------------------
    n  Occ Spin          :                                         Epsilon
---------------------------------------------------------------------------
    0    1    a          :            (au)                -18.899020429265
    1    1    a          :            (au)                 -0.924060598595
    2    1    a          :            (au)                 -0.400166736087
    3    1    a          :            (au)                 -0.400166092761
    4    1    a          :            (au)                 -0.323656251940
    5    1    b          :            (au)                -18.850946920595
    6    1    b          :            (au)                 -0.791823996365
    7    1    b          :            (au)                 -0.279390669385
---------------------------------------------------------------------------
 Sum occupied            :            (au)                -40.869231694993
===========================================================================

===========================================================================
                           Dipole Moment (dip-1)
---------------------------------------------------------------------------
 r_O                     :        0.000000        0.000000        0.000000
---------------------------------------------------------------------------
 Electronic vector       :        0.000000        0.000000        0.000000
 Magnitude               :            (au)                        0.000000
                         :         (Debye)                        0.000000
---------------------------------------------------------------------------
 Nuclear vector          :       -0.000000       -0.000000       -0.000000
 Magnitude               :            (au)                        0.000000
                         :         (Debye)                        0.000000
---------------------------------------------------------------------------
 Total vector            :       -0.000000       -0.000000       -0.000000
 Magnitude               :            (au)                        0.000000
                         :         (Debye)                        0.000000
===========================================================================

***************************************************************************
***                                                                     ***
***                            Exiting MRChem                           ***
***                                                                     ***
***                       Wall time :  0h  3m 10s                       ***
***                                                                     ***
***************************************************************************

moritzgubler commented 1 year ago

And here for the O_4_processes.out file:


***************************************************************************
***                                                                     ***
***                                                                     ***
***          __  __ ____   ____ _                                       ***
***         |  \/  |  _ \ / ___| |__   ___ _ __ ___                     ***
***         | |\/| | |_) | |   | '_ \ / _ \ '_ ` _ \                    ***
***         | |  | |  _ <| |___| | | |  __/ | | | | |                   ***
***         |_|  |_|_| \_\\____|_| |_|\___|_| |_| |_|                   ***
***                                                                     ***
***         VERSION            1.2.0-alpha                              ***
***                                                                     ***
***         Git branch         feature/relax                            ***
***         Git commit hash    f5aeb60041ef2a1c528a                     ***
***         Git commit author  moritzgubler                             ***
***         Git commit date    Fri Aug 18 14:44:30 2023 +0200           ***
***                                                                     ***
***         Contact: luca.frediani@uit.no                               ***
***                                                                     ***
***         Radovan Bast            Magnar Bjorgve                      ***
***         Roberto Di Remigio      Antoine Durdek                      ***
***         Luca Frediani           Gabriel Gerez                       ***
***         Stig Rune Jensen        Jonas Juselius                      ***
***         Rune Monstad            Peter Wind                          ***
***                                                                     ***
***************************************************************************

---------------------------------------------------------------------------

 MPI processes           :        (1 bank)                               4
 OpenMP threads          :                                               4
 Total cores             :                                              13

---------------------------------------------------------------------------

XCFun DFT library Copyright 2009-2020 Ulf Ekstrom and contributors.
See http://dftlibs.org/xcfun/ for more information.

This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. For details see the documentation.
Scientific users of this library should cite
U. Ekstrom, L. Visscher, R. Bast, A. J. Thorvaldsen and K. Ruud;
J.Chem.Theor.Comp. 2010, DOI: 10.1021/ct100117s

---------------------------------------------------------------------------

 MRCPP version         : 1.5.0-alpha
 Git branch            : HEAD
 Git commit hash       : a3618d1498410124ec47
 Git commit author     : gitpeterwind
 Git commit date       : Tue Feb 21 15:21:15 2023 +0100

 Linear algebra        : EIGEN v3.4.0
 Parallelization       : MPI/OpenMP (4 threads)

---------------------------------------------------------------------------

===========================================================================
                         MultiResolution Analysis
---------------------------------------------------------------------------
 polynomial order      : 6
 polynomial type       : Interpolating
---------------------------------------------------------------------------
 total boxes           : 8
 boxes                 : [          2           2           2 ]
 unit lengths          : [   16.00000    16.00000    16.00000 ]
 scaling factor        : [    1.00000     1.00000     1.00000 ]
 lower bounds          : [  -16.00000   -16.00000   -16.00000 ]
 upper bounds          : [   16.00000    16.00000    16.00000 ]
 total length          : [   32.00000    32.00000    32.00000 ]
===========================================================================

***************************************************************************
***                                                                     ***
***                        Initializing Molecule                        ***
***                                                                     ***
***************************************************************************

===========================================================================
                                 Molecule
---------------------------------------------------------------------------
 Charge                  :                                               0
 Multiplicity            :                                               3
---------------------------------------------------------------------------
    N    Atom            :               x               y               z
---------------------------------------------------------------------------
    0       O            :        0.000000        0.000000        0.000000
---------------------------------------------------------------------------
 Center of mass          :        0.000000        0.000000        0.000000
===========================================================================

***************************************************************************
***                                                                     ***
***                Computing Initial Guess Wavefunction                 ***
***                                                                     ***
***************************************************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Calculation             : Compute initial orbitals
 Method                  : Diagonalize SAD Hamiltonian
 Precision               : 1.00000e-03
 Screening               : 1.20000e+01 StdDev
 Restricted              : False
 Functional              : LDA (SVWN5)
 AO basis                : 3-21G
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

===========================================================================
                            Molecular Orbitals
---------------------------------------------------------------------------
 Alpha electrons         :                                               5
 Beta electrons          :                                               3
 Total electrons         :                                               8
---------------------------------------------------------------------------
    n  Occ Spin          :                                            Norm
---------------------------------------------------------------------------
    0    1    a          :                              9.999999960158e-01
    1    1    a          :                              9.999999997235e-01
    2    1    a          :                              9.999999996591e-01
    3    1    a          :                              9.999999996591e-01
    4    1    a          :                              9.999999996659e-01
    5    1    b          :                              9.999999960158e-01
    6    1    b          :                              9.999999997235e-01
    7    1    b          :                              9.999999996591e-01
===========================================================================

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Calculation             : Compute initial energy
 Method                  : DFT
 Relativity              : None
 Environment             : None
 External fields         : None
 Precision               : 1.00000e-03
 Localization            : Off
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

===========================================================================
                        Molecular Energy (initial)
---------------------------------------------------------------------------
 Kinetic energy          :            (au)                 34.936106724967
 E-N energy              :            (au)                -97.184269505373
 Coulomb energy          :            (au)                 12.924607442006
 Exchange energy         :            (au)                  0.000000000000
 X-C energy              :            (au)                 -4.777165783821
 N-N energy              :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Electronic energy       :            (au)                -54.100721122221
 Nuclear energy          :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Total energy            :            (au)             -5.410072112222e+01
                         :      (kcal/mol)             -3.394871505784e+04
                         :        (kJ/mol)             -1.420414238020e+05
                         :            (eV)             -1.472155618643e+03
===========================================================================

===========================================================================
                        Orbital Energies (initial)
---------------------------------------------------------------------------
    n  Occ Spin          :                                         Epsilon
---------------------------------------------------------------------------
    0    1    a          :            (au)                -25.701842661391
    1    1    a          :            (au)                 -3.521574580411
    2    1    a          :            (au)                 -2.947287129136
    3    1    a          :            (au)                 -2.947287101755
    4    1    a          :            (au)                 -2.936482027898
    5    1    b          :            (au)                  0.000000000000
    6    1    b          :            (au)                  0.000000000000
    7    1    b          :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Sum occupied            :            (au)                -38.054473500590
===========================================================================

***************************************************************************
***                                                                     ***
***                 Computing Ground State Wavefunction                 ***
***                                                                     ***
***************************************************************************

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Calculation             : Optimize ground state orbitals
 Method                  : DFT
 Relativity              : None
 Environment             : None
 External fields         : None
 Checkpointing           : Off
 Max iterations          : 100
 KAIN solver             : 5
 Localization            : Off
 Diagonalization         : First two iterations
 Start precision         : 1.00000e-04
 Final precision         : 1.00000e-04
 Helmholtz precision     : Dynamic
 Energy threshold        : Off
 Orbital threshold       : 1.00000e-03
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

===========================================================================
 Iter           MO residual             Total energy                Update
---------------------------------------------------------------------------
    0          2.828427e+00         -54.100721122221         -5.410072e+01
    1          1.874352e+00         -47.148898853966          6.951822e+00
    2          2.481909e+00          -4.500000000000          4.264890e+01
    3          1.732051e+00          -4.500000000000          0.000000e+00
---------------------------------------------------------------------------
                      SCF converged in 3 iterations!
===========================================================================

***************************************************************************
***                                                                     ***
***                    Printing Molecular Properties                    ***
***                                                                     ***
***************************************************************************

===========================================================================
                                 Molecule
---------------------------------------------------------------------------
 Charge                  :                                               0
 Multiplicity            :                                               3
---------------------------------------------------------------------------
    N    Atom            :               x               y               z
---------------------------------------------------------------------------
    0       O            :        0.000000        0.000000        0.000000
---------------------------------------------------------------------------
 Center of mass          :        0.000000        0.000000        0.000000
===========================================================================

===========================================================================
                         Molecular Energy (final)
---------------------------------------------------------------------------
 Kinetic energy          :            (au)                 -4.500000000000
 E-N energy              :            (au)                  0.000000000000
 Coulomb energy          :            (au)                  0.000000000000
 Exchange energy         :            (au)                  0.000000000000
 X-C energy              :            (au)                 -0.000000000000
 N-N energy              :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Electronic energy       :            (au)                 -4.500000000000
 Nuclear energy          :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Total energy            :            (au)             -4.500000000000e+00
                         :      (kcal/mol)             -2.823792633284e+03
                         :        (kJ/mol)             -1.181474837766e+04
                         :            (eV)             -1.224512381069e+02
===========================================================================

===========================================================================
                         Orbital Energies (final)
---------------------------------------------------------------------------
    n  Occ Spin          :                                         Epsilon
---------------------------------------------------------------------------
    0    1    a          :            (au)                  0.000000000000
    1    1    a          :            (au)                  0.000000000000
    2    1    a          :            (au)                  0.000000000000
    3    1    a          :            (au)                  0.000000000000
    4    1    a          :            (au)                  0.000000000000
    5    1    b          :            (au)                  0.000000000000
    6    1    b          :            (au)                  0.000000000000
    7    1    b          :            (au)                  0.000000000000
---------------------------------------------------------------------------
 Sum occupied            :            (au)                  0.000000000000
===========================================================================

===========================================================================
                           Dipole Moment (dip-1)
---------------------------------------------------------------------------
 r_O                     :        0.000000        0.000000        0.000000
---------------------------------------------------------------------------
 Electronic vector       :        0.000000        0.000000        0.000000
 Magnitude               :            (au)                        0.000000
                         :         (Debye)                        0.000000
---------------------------------------------------------------------------
 Nuclear vector          :       -0.000000       -0.000000       -0.000000
 Magnitude               :            (au)                        0.000000
                         :         (Debye)                        0.000000
---------------------------------------------------------------------------
 Total vector            :       -0.000000       -0.000000       -0.000000
 Magnitude               :            (au)                        0.000000
                         :         (Debye)                        0.000000
===========================================================================

***************************************************************************
***                                                                     ***
***                            Exiting MRChem                           ***
***                                                                     ***
***                       Wall time :  0h  0m 25s                       ***
***                                                                     ***
***************************************************************************

stigrj commented 1 year ago

Hmm, this looks like a corner case we have not considered, concerning the number of MPI processes versus the number of orbitals. I will try to reproduce the error.

stigrj commented 1 year ago

@gitpeterwind do you have time to look into this?

Something bad happens in this Phi.distribute() when running this oxygen example with mpirun --np 4. If I compute the overlap matrix of Phi before and after the distribute() on line 423 I get the following:

Before:
 1.000000e+00  4.857226e-16 -1.526557e-16  8.049117e-16 -4.054916e-17  0.000000e+00  0.000000e+00  0.000000e+00
 4.857226e-16  1.000000e+00  5.967449e-16  1.040834e-16  2.775558e-17  0.000000e+00  0.000000e+00  0.000000e+00
-1.526557e-16  5.967449e-16  1.000000e+00  3.261280e-16 -2.775558e-16  0.000000e+00  0.000000e+00  0.000000e+00
 8.049117e-16  1.040834e-16  3.261280e-16  1.000000e+00 -1.734723e-16  0.000000e+00  0.000000e+00  0.000000e+00
-4.054916e-17  2.775558e-17 -2.775558e-16 -1.734723e-16  1.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  1.000000e+00  4.857226e-16 -1.526557e-16
 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  4.857226e-16  1.000000e+00  5.967449e-16
 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00 -1.526557e-16  5.967449e-16  1.000000e+00
After:
 1.000000e+00  3.330669e-16 -1.249001e-16  8.049117e-16 -6.169110e-17  0.000000e+00  0.000000e+00  0.000000e+00
 3.330669e-16  1.000000e+00  5.273559e-16  1.040834e-16  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
-1.249001e-16  5.273559e-16  1.000000e+00  3.261280e-16 -4.440892e-16  0.000000e+00  0.000000e+00  0.000000e+00
 8.083811e-16 -2.081668e-17  1.734723e-16  1.000000e+00 -4.857226e-17  0.000000e+00  0.000000e+00  0.000000e+00
-6.169110e-17  0.000000e+00 -4.440892e-16 -1.734723e-16  1.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
 0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00

I don't understand how this distribution is supposed to work in the new MPI strategy. I don't think the Phi.distribute() in driver::scf::guess_energy should even be necessary, because we are already distributing the orbitals in line 368 in driver::scf::guess_orbitals, but removing the second distribute() leads to completely different (still wrong) results...
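The before/after comparison above can be turned into a small check. The following is a minimal sketch, not MRChem code: `lost_orbitals` is a hypothetical helper that flags orbitals whose self-overlap is no longer 1. For brevity it uses only the diagonal of the "After" matrix quoted above and sets the off-diagonal elements (all of order 1e-16 or zero) to zero.

```python
import math

def lost_orbitals(overlap, tol=1e-8):
    """Return indices i where the diagonal element <i|i> deviates from 1.

    Hypothetical helper for illustration; 'overlap' is a square list of lists.
    """
    return [
        i for i, row in enumerate(overlap)
        if not math.isclose(row[i], 1.0, abs_tol=tol)
    ]

# Diagonal of the "After" matrix from the comment above (5 alpha, 3 beta).
diag_after = [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

# Assumption: off-diagonal elements are treated as exactly zero here.
overlap_after = [
    [diag_after[i] if i == j else 0.0 for j in range(8)]
    for i in range(8)
]

print(lost_orbitals(overlap_after))  # [5, 6, 7]
```

The flagged indices 5 to 7 are the three beta orbitals, which is consistent with their zero orbital energies in the printed output.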

moritzgubler commented 1 year ago

If the above example is run for a carbon atom instead of an oxygen atom, the calculation also fails, but the error is much harder to spot. The energy is -37.798723708995 with two processes and -25.758687173364 with four processes.

The SCF cycle also looks fine for both cases. SCF with 2 processes:

 Iter           MO residual             Total energy                Update
---------------------------------------------------------------------------
    0          2.449490e+00         -37.576145015626         -3.757615e+01
    1          1.432558e-01         -37.791460311162         -2.153153e-01
    2          2.437659e-02         -37.797561425575         -6.101114e-03
    3          9.041404e-03         -37.798446662330         -8.852368e-04
    4          7.228034e-03         -37.798711153805         -2.644915e-04
    5          2.529357e-03         -37.798723004339         -1.185053e-05
    6          3.717123e-04         -37.798723708995         -7.046557e-07

SCF with 4 processes:

 Iter           MO residual             Total energy                Update
---------------------------------------------------------------------------
    0          2.449490e+00         -24.895561058080         -2.489556e+01
    1          3.043381e-01         -25.484156712186         -5.885957e-01
    2          1.766749e-01         -25.665428365470         -1.812717e-01
    3          9.820201e-02         -25.727467403210         -6.203904e-02
    4          1.143087e-01         -25.757526785509         -3.005938e-02
    5          1.726939e-02         -25.758370369613         -8.435841e-04
    6          1.213442e-02         -25.758680940065         -3.105705e-04
    7          1.397656e-03         -25.758687173364         -6.233299e-06

This is also the case for a single nitrogen atom and spin multiplicity 4.
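Because both carbon runs look converged, the discrepancy only shows up when the final energies are compared directly. A minimal sketch of such a comparison follows, assuming the final energies have already been extracted from the two outputs. `energies_agree` is a hypothetical helper, and using `world_prec` from the input file as the tolerance is an assumption made for illustration.

```python
def energies_agree(e_ref, e_test, tol):
    """Return True if two total energies (in au) agree within tol."""
    return abs(e_ref - e_test) <= tol

# Final total energies for the carbon atom, taken from the tables above.
carbon_np2 = -37.798723708995  # 2 MPI processes
carbon_np4 = -25.758687173364  # 4 MPI processes

# Assumption: world_prec from the input file is used as the tolerance.
world_prec = 1.0e-4

diff = abs(carbon_np2 - carbon_np4)
print(f"difference: {diff:.6f} au")
print("consistent:", energies_agree(carbon_np2, carbon_np4, world_prec))
```

The two results differ by about 12.04 au, far more than any reasonable tolerance, even though each SCF table on its own looks unremarkable.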

moritzgubler commented 1 year ago

What I also do not understand is that the MO residual is above orbital_thrs in every case where the calculation fails, yet the SCF cycle is reported as converged.

stigrj commented 1 year ago

Resolved with #461

Thanks for reporting @moritzgubler, this was quite a serious bug 🙂

moritzgubler commented 1 year ago

You're welcome, and thanks for fixing it so quickly. @stigrj There is actually a second thing in the oxygen example that I think should not happen: mrchem reports SCF convergence even though the SCF cycle obviously diverged. The MO residual is well above the convergence threshold when mrchem stops the SCF cycle and reports convergence.

What do you think of it?

stigrj commented 1 year ago

The convergence criterion for the MO residuals is that the residual of each individual orbital should be below the threshold, but what is reported in the output is the norm of the full residual vector. This means that convergence can be reached even if the printed value is above the threshold. In this particular case the beta orbitals were completely corrupted, with their residual norms left at the default value of -1, which is below the convergence threshold but results in a vector norm of sqrt[(-1)^2 + (-1)^2 + (-1)^2] = 1.732. Ideally, this should have been caught and reported as a failure, but there shouldn't be any problem going forward.
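The mismatch between the per-orbital test and the printed norm can be reproduced in a few lines. This is a minimal sketch of the logic described above, not the actual MRChem implementation: the function names are hypothetical, and the alpha residuals are set to 0.0 as a placeholder because their real values are not given in the thread.

```python
import math

def residual_report(errors, thrs):
    """Per-orbital convergence test plus the norm of the full residual vector."""
    converged = all(e < thrs for e in errors)
    norm = math.sqrt(sum(e * e for e in errors))
    return converged, norm

def residual_report_strict(errors, thrs):
    """Hypothetical stricter variant: a negative residual counts as invalid."""
    converged = all(0.0 <= e < thrs for e in errors)
    norm = math.sqrt(sum(e * e for e in errors))
    return converged, norm

orbital_thrs = 1.0e-3

# Assumption: 5 alpha residuals of 0.0 (placeholder), 3 beta residuals at
# the default value of -1 mentioned above.
errors = [0.0] * 5 + [-1.0] * 3

converged, norm = residual_report(errors, orbital_thrs)
print(converged, round(norm, 3))  # True 1.732

strict_converged, _ = residual_report_strict(errors, orbital_thrs)
print(strict_converged)  # False
```

Every entry of -1 passes the `e < thrs` test, so the run is declared converged while the printed norm is 1.732. Rejecting negative values, as in the stricter variant, is one way such a corrupted state could be reported as a failure.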

moritzgubler commented 1 year ago

Ok, perfect :)