atomic-solvers / featom

Finite Element Solvers for Atomic Structure Calculations
https://atomic-solvers.github.io/featom/
MIT License

Benchmark results #16

Open certik opened 1 year ago

certik commented 1 year ago

As of commit e7747e665baa92a90115c2719b91a96de92fac5f, on an Apple M1 Max with GFortran 11.3.0, build the tests with:

$ fpm test --profile=release --flag "-ffast-math -march=native" test_dft_schroed_fast --verbose
[...]
+ build/gfortran_565E65E7876A06C6/test/test_dft_schroed_fast
[...]
$ fpm test --profile=release --flag "-ffast-math -march=native" test_dft_dirac_fast --verbose
[...]
+ build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast 
[...]

And then benchmark using:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_schroed_fast
 SCF convergence error:  0.28258872238814092     
 SCF convergence error:   8.2495113254026364E-003
 SCF convergence error:   5.2116623764959513E-003
 SCF convergence error:   2.1089267579554871E-003
 SCF convergence error:   1.6365563510589709E-005
 SCF convergence error:   8.5098749877943192E-006
 SCF convergence error:   4.4540042836160865E-006
 SCF convergence error:   1.6731351859533561E-008
 SCF convergence error:   7.0714598621179903E-009
 SCF convergence error:   4.5291272954273154E-009
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -25658.41788786 -25658.41788885  9.92E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -3689.35513954  -3689.35513984  2.99E-07
   2   -639.77872802   -639.77872809  7.10E-08
   3   -619.10855022   -619.10855018  3.79E-08
   4   -161.11807323   -161.11807321  1.58E-08
   5   -150.97898021   -150.97898016  4.68E-08
   6   -131.97735833   -131.97735828  4.30E-08
   7    -40.52808426    -40.52808425  1.17E-08
   8    -35.85332086    -35.85332083  2.56E-08
   9    -27.12321233    -27.12321230  3.27E-08
  10    -15.02746011    -15.02746007  4.23E-08
  11     -8.82408941     -8.82408940  1.32E-08
  12     -7.01809223     -7.01809220  2.35E-08
  13     -3.86617516     -3.86617513  3.00E-08
  14     -0.36654337     -0.36654335  1.55E-08
  15     -1.32597631     -1.32597632  1.11E-08
  16     -0.82253797     -0.82253797  2.62E-09
  17     -0.14319019     -0.14319018  4.86E-09
  18     -0.13094786     -0.13094786  5.40E-10
build/gfortran_565E65E7876A06C6/test/test_dft_schroed_fast  0.03s user 0.00s system 89% cpu 0.037 total
$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220429767083260     
 SCF iteration:           8
 SCF convergence error:   5.9353570468374528E-003
 SCF iteration:           9
 SCF convergence error:   1.8809300891007297E-003
 SCF iteration:          10
 SCF convergence error:   1.0439264224260114E-004
 SCF iteration:          11
 SCF convergence error:   3.1819967261981219E-005
 SCF iteration:          12
 SCF convergence error:   1.1597509001148865E-005
 SCF iteration:          13
 SCF convergence error:   1.4384913811227307E-006
 SCF iteration:          14
 SCF convergence error:   1.2588679965119809E-006
 SCF iteration:          15
 SCF convergence error:   1.3697535905521363E-007
 SCF iteration:          16
 SCF convergence error:   1.8208083929494023E-008
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232560 -28001.13232549  1.17E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902054  -4223.41902046  8.12E-08
   2   -789.48978235   -789.48978233  2.00E-08
   3   -761.37447600   -761.37447597  2.56E-08
   4   -622.84809459   -622.84809456  2.05E-08
   5   -199.42980565   -199.42980564  5.06E-09
   6   -186.66371313   -186.66371312  7.93E-09
   7   -154.70102668   -154.70102667  4.47E-09
   8   -134.54118030   -134.54118029  8.25E-09
   9   -128.01665739   -128.01665738  7.53E-09
  10    -50.78894806    -50.78894806  4.06E-09
  11    -45.03717128    -45.03717129  3.42E-09
  12    -36.68861048    -36.68861049  4.16E-09
  13    -27.52930624    -27.52930624  3.80E-09
  14    -25.98542890    -25.98542891  3.84E-09
  15    -13.88951423    -13.88951423  4.44E-09
  16    -13.48546969    -13.48546969  4.49E-09
  17    -11.29558710    -11.29558710  1.76E-09
  18     -9.05796425     -9.05796425  1.16E-09
  19     -7.06929563     -7.06929563  4.20E-12
  20     -3.79741623     -3.79741623  1.40E-09
  21     -3.50121719     -3.50121718  1.86E-09
  22     -0.14678839     -0.14678838  5.78E-09
  23     -0.11604717     -0.11604717  5.88E-09
  24     -1.74803996     -1.74803995  7.41E-09
  25     -1.10111901     -1.10111900  7.85E-09
  26     -0.77578419     -0.77578418  7.87E-09
  27     -0.10304082     -0.10304082  5.31E-09
  28     -0.08480203     -0.08480202  4.84E-09
  29     -0.16094729     -0.16094728  3.27E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.78s user 0.04s system 100% cpu 0.806 total

Now apply the following patch:

$ git diff
diff --git a/src/dirac.f90 b/src/dirac.f90
index 0fc99c0..957aeb7 100644
--- a/src/dirac.f90
+++ b/src/dirac.f90
@@ -234,7 +234,7 @@ contains
     real(dp) :: E_dirac_shift
     integer :: idx
     logical :: accurate_eigensolver
-    accurate_eigensolver = .true.
+    accurate_eigensolver = .false.
     iter = iter + 1
     print *, "SCF iteration:", iter
     Vin = reshape(x, shape(Vin))

And rerun the Dirac benchmark:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220430676850810     
 SCF iteration:           8
 SCF convergence error:   5.9354385557526257E-003
 SCF iteration:           9
 SCF convergence error:   1.8807012675097212E-003
 SCF iteration:          10
 SCF convergence error:   1.0478643525857478E-004
 SCF iteration:          11
 SCF convergence error:   3.1437355573871173E-005
 SCF iteration:          12
 SCF convergence error:   1.1697793524945155E-005
 SCF iteration:          13
 SCF convergence error:   1.4564047887688503E-006
 SCF iteration:          14
 SCF convergence error:   1.1532956705195829E-006
 SCF iteration:          15
 SCF convergence error:   4.6789864427410066E-007
 SCF iteration:          16
 SCF convergence error:   4.9017762648873031E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232497 -28001.13232549  5.13E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902063  -4223.41902046  1.75E-07
   2   -789.48978203   -789.48978233  2.98E-07
   3   -761.37447602   -761.37447597  4.29E-08
   4   -622.84809451   -622.84809456  5.05E-08
   5   -199.42980565   -199.42980564  6.11E-09
   6   -186.66371313   -186.66371312  3.46E-09
   7   -154.70102667   -154.70102667  2.87E-09
   8   -134.54118030   -134.54118029  5.74E-09
   9   -128.01665739   -128.01665738  5.19E-09
  10    -50.78894806    -50.78894806  2.26E-10
  11    -45.03717128    -45.03717129  5.89E-09
  12    -36.68861048    -36.68861049  4.33E-09
  13    -27.52930624    -27.52930624  4.18E-09
  14    -25.98542890    -25.98542891  4.04E-09
  15    -13.88951423    -13.88951423  4.37E-09
  16    -13.48546969    -13.48546969  3.98E-09
  17    -11.29558710    -11.29558710  1.05E-09
  18     -9.05796425     -9.05796425  3.44E-09
  19     -7.06929563     -7.06929563  1.40E-09
  20     -3.79741623     -3.79741623  3.13E-10
  21     -3.50121719     -3.50121718  1.96E-09
  22     -0.14678839     -0.14678838  4.34E-09
  23     -0.11604717     -0.11604717  5.56E-09
  24     -1.74803996     -1.74803995  6.32E-09
  25     -1.10111901     -1.10111900  6.28E-09
  26     -0.77578419     -0.77578418  6.33E-09
  27     -0.10304082     -0.10304082  4.15E-09
  28     -0.08480203     -0.08480202  3.90E-09
  29     -0.16094728     -0.16094728  2.24E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.54s user 0.03s system 100% cpu 0.565 total
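
The effect of the patch can be read off the two `time` summary lines for test_dft_dirac_fast above (0.806 s total before, 0.565 s after). A minimal sketch of that arithmetic in Python, assuming the zsh `time` summary format exactly as printed in this thread; the helper name `parse_zsh_time` is hypothetical and not part of featom:

```python
import re

def parse_zsh_time(line):
    """Return (user, system, total) in seconds from a zsh `time` summary line."""
    m = re.search(r"([\d.]+)s user ([\d.]+)s system \d+% cpu ([\d.]+) total", line)
    if m is None:
        raise ValueError("not a zsh time summary line: %r" % line)
    return tuple(float(g) for g in m.groups())

# Summary lines copied from the two Dirac runs above.
before = ("build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  "
          "0.78s user 0.04s system 100% cpu 0.806 total")
after = ("build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  "
         "0.54s user 0.03s system 100% cpu 0.565 total")

t_before = parse_zsh_time(before)[2]
t_after = parse_zsh_time(after)[2]
speedup = t_before / t_after
print(f"{t_before:.3f} s -> {t_after:.3f} s, speedup {speedup:.2f}x")
```

These are single runs, so the ratio (roughly 1.4x) is only indicative. Note also that the patched run reports a larger total-energy error (5.13E-07 versus 1.17E-07).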

HaoZeke commented 1 year ago

Machine

image

Also:

Build type: native build
Project name: featom
Project version: 0.1.0
Fortran compiler for the host machine: gfortran (gcc 12.3.0 "GNU Fortran (conda-forge gcc 12.3.0-0) 12.3.0")
Fortran linker for the host machine: gfortran ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /home/rgoswami/micromamba/envs/fe/bin/pkg-config (0.29.2)
Run-time dependency lapack found: YES 3.9.0

meson build commands

FFLAGS='-ffast-math -march=native' meson setup bbdir --buildtype="release" -Dwith_tests=True
meson compile -C bbdir
time ./bbdir/testDftSchroedFast

DFT Schroedinger

./bbdir/testDftSchroedFast  0.02s user 0.00s system 98% cpu 0.021 total

DFT Dirac

These are based on #17 with the patch for the "accurate eigensolver".

# Lapack 3.9.0
./bbdir/testDftDiracFast  0.95s user 0.01s system 99% cpu 0.955 total
# mkl-dynamic-lp64-seq 2023.2
./bbdir/testDftDiracFast  0.67s user 0.01s system 99% cpu 0.679 total

Intel ifort

micromamba install -c hcc ifort_linux-64
FC=$(which ifort) FFLAGS="-O3 -xHost -ipo -no-prec-div -fp-model fast=2" meson setup bbdir -Dwith_tests=True --buildtype="release"

...

Fortran compiler for the host machine: /home/rgoswami/micromamba/envs/fe/bin/ifort (intel 2021.6.0 "ifort (IFORT) 2021.6.0 20220226")
Fortran linker for the host machine: /home/rgoswami/micromamba/envs/fe/bin/ifort ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /home/rgoswami/micromamba/envs/fe/bin/pkg-config (0.29.2)
Run-time dependency mkl-dynamic-lp64-seq found: YES 2023.2

./bbdir/testDftDiracFast  0.51s user 0.01s system 99% cpu 0.525 total

Which corresponds to: image

certik commented 1 year ago

To use the Accelerate framework on macOS, one can use:

fpm test --profile=release --flag "-ffast-math -march=native -framework Accelerate" test_dft_dirac_fast --verbose

But I am getting similar timing:

$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
 SCF iteration:           1
[...]
  28     -0.08480203     -0.08480202  3.90E-09
  29     -0.16094728     -0.16094728  2.24E-09
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.54s user 0.03s system 100% cpu 0.565 total

It seems that on macOS even just linking -lblas and -llapack links against Accelerate by default.

certik commented 1 year ago

The dimension is about 240x240 for Dirac, and we only need 7 eigenvalues. Let's use a LAPACK interface that can return just those 7, or use a custom eigensolver that can do it.
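
To illustrate why asking for only a few eigenvalues can be cheaper than a full solve, here is a small pure-Python sketch of Sturm-count bisection on a symmetric tridiagonal matrix. It is not featom's code and not LAPACK: the names `count_below` and `kth_eigenvalue` are hypothetical, and the test matrix (a 240x240 discrete Laplacian with known eigenvalues) is an assumption chosen only to match the size mentioned above.

```python
import math

def count_below(d, e, x, tiny=1e-30):
    """Sturm count: number of eigenvalues smaller than x of the symmetric
    tridiagonal matrix with diagonal d and off-diagonal e."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off2 = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off2 / q
        if abs(q) < tiny:
            q = -tiny  # treat a (near-)zero pivot as negative
        if q < 0.0:
            count += 1
    return count

def kth_eigenvalue(d, e, k, tol=1e-13):
    """k-th smallest eigenvalue (k = 0, 1, ...) found by bisection alone."""
    n = len(d)
    # Gershgorin discs bound the whole spectrum.
    radius = [(abs(e[i - 1]) if i > 0 else 0.0) +
              (abs(e[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(d, e, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Assumed test problem: 1D Laplacian, eigenvalues 2 - 2*cos(j*pi/(n+1)).
n = 240
d = [2.0] * n
e = [-1.0] * (n - 1)
lowest = [kth_eigenvalue(d, e, k) for k in range(7)]
exact = [2.0 - 2.0 * math.cos((k + 1) * math.pi / (n + 1)) for k in range(7)]
for k, (lam, ref) in enumerate(zip(lowest, exact)):
    print(f"{k + 1}  {lam:.10f}  {ref:.10f}  {abs(lam - ref):.2e}")
```

Each requested eigenvalue costs a few dozen O(n) sweeps, independent of the other n - 7. LAPACK exposes index-range selection through drivers such as DSYEVX (RANGE='I' with IL and IU), which works on the tridiagonal form after reduction; for a generalized problem a prior reduction to standard form is still needed.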

certik commented 1 year ago

With #18 I get:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220418857676123     
 SCF iteration:           8
 SCF convergence error:   5.9357534701121040E-003
 SCF iteration:           9
 SCF convergence error:   1.8806871048582252E-003
 SCF iteration:          10
 SCF convergence error:   1.0455984192958567E-004
 SCF iteration:          11
 SCF convergence error:   3.2101015676744282E-005
 SCF iteration:          12
 SCF convergence error:   1.1781998182414100E-005
 SCF iteration:          13
 SCF convergence error:   1.2883829185739160E-006
 SCF iteration:          14
 SCF convergence error:   8.7908847490325570E-007
 SCF iteration:          15
 SCF convergence error:   1.6529884305782616E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232635 -28001.13232549  8.65E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902075  -4223.41902046  2.94E-07
   2   -789.48978246   -789.48978233  1.35E-07
   3   -761.37447601   -761.37447597  3.29E-08
   4   -622.84809461   -622.84809456  4.34E-08
   5   -199.42980567   -199.42980564  2.36E-08
   6   -186.66371312   -186.66371312  6.40E-09
   7   -154.70102667   -154.70102667  2.26E-09
   8   -134.54118029   -134.54118029  2.11E-09
   9   -128.01665738   -128.01665738  8.76E-10
  10    -50.78894805    -50.78894806  1.06E-08
  11    -45.03717127    -45.03717129  1.59E-08
  12    -36.68861047    -36.68861049  1.22E-08
  13    -27.52930623    -27.52930624  1.41E-08
  14    -25.98542889    -25.98542891  1.51E-08
  15    -13.88951422    -13.88951423  1.64E-08
  16    -13.48546968    -13.48546969  1.61E-08
  17    -11.29558710    -11.29558710  4.23E-10
  18     -9.05796425     -9.05796425  7.23E-10
  19     -7.06929564     -7.06929563  4.66E-09
  20     -3.79741624     -3.79741623  8.66E-09
  21     -3.50121719     -3.50121718  7.14E-09
  22     -0.14678840     -0.14678838  1.89E-08
  23     -0.11604718     -0.11604717  1.83E-08
  24     -1.74803998     -1.74803995  2.22E-08
  25     -1.10111902     -1.10111900  2.19E-08
  26     -0.77578420     -0.77578418  2.18E-08
  27     -0.10304083     -0.10304082  1.50E-08
  28     -0.08480204     -0.08480202  1.37E-08
  29     -0.16094729     -0.16094728  9.98E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.43s user 0.01s system 99% cpu 0.437 total

HaoZeke commented 1 year ago

❯ vtune -report hotspots -r r000hs -group-by module
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Module            CPU Time  CPU Time:Effective Time  CPU Time:Spin Time  CPU Time:Overhead Time  Module Path
----------------  --------  -----------------------  ------------------  ----------------------  -----------------------------------------------------------------
libmkl_core.so.2    0.230s                   0.230s                  0s                      0s  /home/rgoswami/micromamba/envs/fe/lib/libmkl_core.so.2           
libfeatom.so        0.120s                   0.120s                  0s                      0s  /home/rgoswami/Git/Github/Fortran/featom/bbdir/src/libfeatom.so  
libc.so.6           0.010s                   0.010s                  0s                      0s  /usr/lib/libc.so.6                                               
libc++abi.so        0.010s                   0.010s                  0s                      0s  /opt/intel/oneapi/vtune/2023.2.0/lib64/pinruntime/libc++abi.so   
libc-dynamic.so     0.010s                   0.010s                  0s                      0s  /opt/intel/oneapi/vtune/2023.2.0/lib64/pinruntime/libc-dynamic.so
testDftDiracFast    0.010s                   0.010s                  0s                      0s  /home/rgoswami/Git/Github/Fortran/featom/bbdir/testDftDiracFast  
vtune: Executing actions 100 % done                                            
❯ vtune -report hotspots -r r000hs
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Function                  CPU Time  CPU Time:Effective Time  CPU Time:Spin Time  CPU Time:Overhead Time  Module            Function (Full)              Source File  Start Address
------------------------  --------  -----------------------  ------------------  ----------------------  ----------------  ---------------------------  -----------  -------------
[MKL LAPACK]@dsyevx         0.150s                   0.150s                  0s                      0s  libmkl_core.so.2  mkl_lapack_dsyevx            [Unknown]    0x9c6470     
assemble_radial_dirac_sh    0.060s                   0.060s                  0s                      0s  libfeatom.so      assemble_radial_dirac_sh     fe.f90       0x16a70      
[MKL LAPACK]@dsygst         0.050s                   0.050s                  0s                      0s  libmkl_core.so.2  mkl_lapack_dsygst            [Unknown]    0x9c8230     
phih                        0.040s                   0.040s                  0s                      0s  libfeatom.so      phih                         feutils.f90  0x47f93      
dphih                       0.020s                   0.020s                  0s                      0s  libfeatom.so      dphih                        feutils.f90  0x50278      
MKL_Load_Lib_Ex             0.020s                   0.020s                  0s                      0s  libmkl_core.so.2  MKL_Load_Lib_Ex              [Unknown]    0x21ca50     
free                        0.010s                   0.010s                  0s                      0s  libc.so.6         free                         [Unknown]    0x9d2e0      
[MKL LAPACK]@xdgetrf        0.010s                   0.010s                  0s                      0s  libmkl_core.so.2  mkl_lapack_xdgetrf           [Unknown]    0xd8c050     
memmove                     0.010s                   0.010s                  0s                      0s  libc-dynamic.so   memmove                      [Unknown]    0x69e30      
operator new                0.010s                   0.010s                  0s                      0s  libc++abi.so      operator new(unsigned long)  [Unknown]    0x25000      
__intel_avx_rep_memcpy      0.010s                   0.010s                  0s                      0s  testDftDiracFast  __intel_avx_rep_memcpy       [Unknown]    0x4ac280     
vtune: Executing actions 100 % done                                            
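
As a cross-check, the per-function times in the hotspots table above can be summed per module and compared with the module-level report (0.230s for libmkl_core.so.2, 0.120s for libfeatom.so). A short Python sketch, with the numbers copied by hand from the table and stored as integer milliseconds to avoid rounding noise; the variable names are only illustrative:

```python
from collections import defaultdict

# (function, module, CPU time in ms), transcribed from the hotspots table above.
hotspots = [
    ("[MKL LAPACK]@dsyevx", "libmkl_core.so.2", 150),
    ("assemble_radial_dirac_sh", "libfeatom.so", 60),
    ("[MKL LAPACK]@dsygst", "libmkl_core.so.2", 50),
    ("phih", "libfeatom.so", 40),
    ("dphih", "libfeatom.so", 20),
    ("MKL_Load_Lib_Ex", "libmkl_core.so.2", 20),
    ("free", "libc.so.6", 10),
    ("[MKL LAPACK]@xdgetrf", "libmkl_core.so.2", 10),
    ("memmove", "libc-dynamic.so", 10),
    ("operator new", "libc++abi.so", 10),
    ("__intel_avx_rep_memcpy", "testDftDiracFast", 10),
]

# Sum the function-level times per module.
per_module = defaultdict(int)
for _, module, ms in hotspots:
    per_module[module] += ms

total_ms = sum(per_module.values())
for module, ms in sorted(per_module.items(), key=lambda kv: -kv[1]):
    print(f"{module:18s} {ms / 1000:.3f}s  {100 * ms / total_ms:5.1f}%")

# Share of the sampled CPU time spent in the LAPACK routines themselves.
lapack_ms = sum(ms for name, _, ms in hotspots if name.startswith("[MKL LAPACK]"))
print(f"LAPACK routines: {lapack_ms / 1000:.3f}s of {total_ms / 1000:.3f}s "
      f"({100 * lapack_ms / total_ms:.1f}%)")
```

With a 10 ms sampling granularity the small entries are noisy, but the sums agree with the module report, and dsyevx alone accounts for a bit over a third of the sampled CPU time.
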
❯ vtune -R callstacks -r r000hs -group-by callstack
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Function/Function Stack        CPU Time  Module                  Function (Full)                Source File              Start Address
-----------------------------  --------  ----------------------  -----------------------------  -----------------------  -------------
[MKL LAPACK]@dsyevx              0.140s  libmkl_core.so.2        mkl_lapack_dsyevx              [Unknown]                0x9c6470     
dsyevx_                              0s  libmkl_intel_lp64.so.2  dsyevx_                        [Unknown]                0x705190     
solve_eig_irange                     0s  libfeatom.so            solve_eig_irange               solvers.f90              0x4a940      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

assemble_radial_dirac_sh         0.060s  libfeatom.so            assemble_radial_dirac_sh       fe.f90                   0x16a70      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

phih                             0.040s  libfeatom.so            phih                           feutils.f90              0x47f93      
fe2quad                              0s  libfeatom.so            fe2quad                        feutils.f90              0x47f00      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

[MKL LAPACK]@dsygst              0.030s  libmkl_core.so.2        mkl_lapack_dsygst              [Unknown]                0x9c8230     
DSYGST                               0s  libmkl_intel_lp64.so.2  DSYGST                         [Unknown]                0x706480     
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

dphih                            0.020s  libfeatom.so            dphih                          feutils.f90              0x50278      
assemble_poisson_gj                  0s  libfeatom.so            assemble_poisson_gj            hartree_screening.f90    0x4fe70      
hartree_potential_gj                 0s  libfeatom.so            hartree_potential_gj           hartree_screening.f90    0x4d130      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

MKL_Load_Lib_Ex                  0.020s  libmkl_core.so.2        MKL_Load_Lib_Ex                [Unknown]                0x21ca50     
__mkl_cpu_detect_and_load_dll        0s  libmkl_core.so.2        __mkl_cpu_detect_and_load_dll  [Unknown]                0x21be50     
[MKL LAPACK]@dsteqr                  0s  libmkl_core.so.2        mkl_lapack_dsteqr              [Unknown]                0x9ba900     
DSTEQR                               0s  libmkl_intel_lp64.so.2  DSTEQR                         [Unknown]                0x6ff3d0     
gauss_jacobi_gw                      0s  libgjp_gw.so            gauss_jacobi_gw                gjp_gw_single.f90        0x64e0       
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

[MKL LAPACK]@dsygst              0.020s  libmkl_core.so.2        mkl_lapack_dsygst              [Unknown]                0x9c8230     
DSYGST                               0s  libmkl_intel_lp64.so.2  DSYGST                         [Unknown]                0x706480     
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
[Unknown stack frame(s)]             0s  [Unknown]               [Unknown stack frame(s)]       [Unknown]                0            
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

free                             0.010s  libc.so.6               free                           [Unknown]                0x9d2e0      
for_dealloc_allocatable              0s  testDftDiracFast        for_dealloc_allocatable        [Unknown]                0x439650     
inv                                  0s  libfeatom.so            inv                            linalg.f90               0x1ea50      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

[MKL LAPACK]@xdgetrf             0.010s  libmkl_core.so.2        mkl_lapack_xdgetrf             [Unknown]                0xd8c050     
DGETRF                               0s  libmkl_intel_lp64.so.2  DGETRF                         [Unknown]                0x623590     
inv                                  0s  libfeatom.so            inv                            linalg.f90               0x1ea50      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

memmove                          0.010s  libc-dynamic.so         memmove                        [Unknown]                0x69e30      
memcpy                               0s  libc-dynamic.so         memcpy                         [Unknown]                0x69d30      
MKL_Load_Lib_Ex                      0s  libmkl_core.so.2        MKL_Load_Lib_Ex                [Unknown]                0x21ca50     
__mkl_cpu_detect_and_load_dll        0s  libmkl_core.so.2        __mkl_cpu_detect_and_load_dll  [Unknown]                0x21be50     
[MKL LAPACK]@dsteqr                  0s  libmkl_core.so.2        mkl_lapack_dsteqr              [Unknown]                0x9ba900     
DSTEQR                               0s  libmkl_intel_lp64.so.2  DSTEQR                         [Unknown]                0x6ff3d0     
gauss_jacobi_gw                      0s  libgjp_gw.so            gauss_jacobi_gw                gjp_gw_single.f90        0x64e0       
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

operator new                     0.010s  libc++abi.so            operator new(unsigned long)    [Unknown]                0x25000      
MKL_Load_Lib_Ex                      0s  libmkl_core.so.2        MKL_Load_Lib_Ex                [Unknown]                0x21ca50     
__mkl_cpu_detect_and_load_dll        0s  libmkl_core.so.2        __mkl_cpu_detect_and_load_dll  [Unknown]                0x21be50     
[MKL LAPACK]@dsteqr                  0s  libmkl_core.so.2        mkl_lapack_dsteqr              [Unknown]                0x9ba900     
DSTEQR                               0s  libmkl_intel_lp64.so.2  DSTEQR                         [Unknown]                0x6ff3d0     
gauss_jacobi_gw                      0s  libgjp_gw.so            gauss_jacobi_gw                gjp_gw_single.f90        0x64e0       
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

[MKL LAPACK]@dsyevx              0.010s  libmkl_core.so.2        mkl_lapack_dsyevx              [Unknown]                0x9c6470     
dsyevx_                              0s  libmkl_intel_lp64.so.2  dsyevx_                        [Unknown]                0x705190     
solve_eig_irange                     0s  libfeatom.so            solve_eig_irange               solvers.f90              0x4a940      
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
[Unknown stack frame(s)]             0s  [Unknown]               [Unknown stack frame(s)]       [Unknown]                0            
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

__intel_avx_rep_memcpy           0.010s  testDftDiracFast        __intel_avx_rep_memcpy         [Unknown]                0x4ac280     
solve_dirac_eigenproblem             0s  libfeatom.so            solve_dirac_eigenproblem       dirac.f90                0x433b0      
diracsolve_dirac_mp_ffunc_           0s  libfeatom.so            diracsolve_dirac_mp_ffunc_     dirac.f90                0xe5e0       
mixing_pulay                         0s  libfeatom.so            mixing_pulay                   mixings.f90              0x3ec40      
solve_dirac                          0s  libfeatom.so            solve_dirac                    dirac.f90                0x4930       
test_dft_dirac_fast                  0s  testDftDiracFast        test_dft_dirac_fast            test_dft_dirac_fast.f90  0x40b860     
main                                 0s  testDftDiracFast        main                           [Unknown]                0x40b810     
__libc_start_main                    0s  libc.so.6               __libc_start_main              [Unknown]                0x27d00      
_start                               0s  testDftDiracFast        _start                         [Unknown]                0x40b740     
[stack]                              0s  [stack]                 [stack]                        [Unknown]                0            

vtune: Executing actions 100 % done                                      

certik commented 12 months ago

With the latest commit I get:

$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220424374379945     
 SCF iteration:           8
 SCF convergence error:   5.9356682395446114E-003
 SCF iteration:           9
 SCF convergence error:   1.8809154171322007E-003
 SCF iteration:          10
 SCF convergence error:   1.0443680002936162E-004
 SCF iteration:          11
 SCF convergence error:   3.1862826290307567E-005
 SCF iteration:          12
 SCF convergence error:   1.1619773431448266E-005
 SCF iteration:          13
 SCF convergence error:   1.4270481187850237E-006
 SCF iteration:          14
 SCF convergence error:   1.3223652786109596E-006
 SCF iteration:          15
 SCF convergence error:   2.4143082555383444E-007
 SCF iteration:          16
 SCF convergence error:   5.5670170695520937E-008
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232562 -28001.13232549  1.35E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902055  -4223.41902046  9.11E-08
   2   -789.48978235   -789.48978233  2.46E-08
   3   -761.37447600   -761.37447597  3.10E-08
   4   -622.84809459   -622.84809456  2.44E-08
   5   -199.42980565   -199.42980564  6.82E-09
   6   -186.66371313   -186.66371312  9.53E-09
   7   -154.70102668   -154.70102667  5.38E-09
   8   -134.54118030   -134.54118029  9.35E-09
   9   -128.01665739   -128.01665738  8.53E-09
  10    -50.78894806    -50.78894806  3.60E-09
  11    -45.03717129    -45.03717129  1.41E-09
  12    -36.68861048    -36.68861049  3.75E-09
  13    -27.52930624    -27.52930624  2.23E-09
  14    -25.98542890    -25.98542891  2.56E-09
  15    -13.88951423    -13.88951423  3.29E-09
  16    -13.48546969    -13.48546969  2.25E-09
  17    -11.29558710    -11.29558710  6.12E-10
  18     -9.05796425     -9.05796425  2.79E-10
  19     -7.06929564     -7.06929563  8.73E-10
  20     -3.79741623     -3.79741623  2.07E-09
  21     -3.50121719     -3.50121718  2.69E-09
  22     -0.14678839     -0.14678838  6.58E-09
  23     -0.11604717     -0.11604717  6.89E-09
  24     -1.74803996     -1.74803995  8.23E-09
  25     -1.10111901     -1.10111900  8.43E-09
  26     -0.77578419     -0.77578418  9.08E-09
  27     -0.10304082     -0.10304082  6.28E-09
  28     -0.08480203     -0.08480202  6.48E-09
  29     -0.16094729     -0.16094728  4.73E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast  0.49s user 0.10s system 117% cpu 0.497 total

certik commented 12 months ago

featom using 310aeb863600e784dcf1a04ac9ec39b4419b97d2:

$ fpm test --profile=release --flag "-ffast-math -march=native -framework Accelerate " test_dft_dirac_fast --verbose
$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220418857676123     
 SCF iteration:           8
 SCF convergence error:   5.9357534701121040E-003
 SCF iteration:           9
 SCF convergence error:   1.8806871048582252E-003
 SCF iteration:          10
 SCF convergence error:   1.0455984192958567E-004
 SCF iteration:          11
 SCF convergence error:   3.2101015676744282E-005
 SCF iteration:          12
 SCF convergence error:   1.1781998182414100E-005
 SCF iteration:          13
 SCF convergence error:   1.2883829185739160E-006
 SCF iteration:          14
 SCF convergence error:   8.7908847490325570E-007
 SCF iteration:          15
 SCF convergence error:   1.6529884305782616E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232635 -28001.13232549  8.65E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902075  -4223.41902046  2.94E-07
   2   -789.48978246   -789.48978233  1.35E-07
   3   -761.37447601   -761.37447597  3.29E-08
   4   -622.84809461   -622.84809456  4.34E-08
   5   -199.42980567   -199.42980564  2.36E-08
   6   -186.66371312   -186.66371312  6.40E-09
   7   -154.70102667   -154.70102667  2.26E-09
   8   -134.54118029   -134.54118029  2.11E-09
   9   -128.01665738   -128.01665738  8.76E-10
  10    -50.78894805    -50.78894806  1.06E-08
  11    -45.03717127    -45.03717129  1.59E-08
  12    -36.68861047    -36.68861049  1.22E-08
  13    -27.52930623    -27.52930624  1.41E-08
  14    -25.98542889    -25.98542891  1.51E-08
  15    -13.88951422    -13.88951423  1.64E-08
  16    -13.48546968    -13.48546969  1.61E-08
  17    -11.29558710    -11.29558710  4.23E-10
  18     -9.05796425     -9.05796425  7.23E-10
  19     -7.06929564     -7.06929563  4.66E-09
  20     -3.79741624     -3.79741623  8.66E-09
  21     -3.50121719     -3.50121718  7.14E-09
  22     -0.14678840     -0.14678838  1.89E-08
  23     -0.11604718     -0.11604717  1.83E-08
  24     -1.74803998     -1.74803995  2.22E-08
  25     -1.10111902     -1.10111900  2.19E-08
  26     -0.77578420     -0.77578418  2.18E-08
  27     -0.10304083     -0.10304082  1.50E-08
  28     -0.08480204     -0.08480202  1.37E-08
  29     -0.16094729     -0.16094728  9.98E-09
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.41s user 0.01s system 99% cpu 0.420 total

dftatom:

$ time ./tests/atom_U/uraninum_rlda
 Test eps:   1.1999999999999999E-006
 Z=          92
 N=        5269
E_tot= -28001.13232639 E_tot_exact= -28001.13232549 error:  9.00E-07
 state    E            E_exact          error     occupancy
1s   -4223.41902044  -4223.41902046 -1.83E-08    2.000
2s    -789.48978232   -789.48978233 -1.16E-08    2.000
2p    -761.37447596   -761.37447597 -1.35E-08    2.000
2p    -622.84809453   -622.84809456 -3.60E-08    4.000
3s    -199.42980566   -199.42980564  1.07E-08    2.000
3p    -186.66371314   -186.66371312  1.15E-08    2.000
3p    -154.70102665   -154.70102667 -2.03E-08    4.000
3d    -134.54118027   -134.54118029 -1.93E-08    4.000
3d    -128.01665735   -128.01665738 -3.18E-08    6.000
4s     -50.78894808    -50.78894806  1.89E-08    2.000
4p     -45.03717131    -45.03717129  1.98E-08    2.000
4p     -36.68861048    -36.68861049 -4.93E-09    4.000
4d     -27.52930624    -27.52930624 -3.22E-09    4.000
4d     -25.98542889    -25.98542891 -1.85E-08    6.000
4f     -13.88951422    -13.88951423 -1.70E-08    6.000
4f     -13.48546967    -13.48546969 -2.00E-08    8.000
5s     -11.29558711    -11.29558710  1.37E-08    2.000
5p      -9.05796426     -9.05796425  1.32E-08    2.000
5p      -7.06929564     -7.06929563  7.97E-10    4.000
5d      -3.79741623     -3.79741623  1.17E-09    4.000
5d      -3.50121718     -3.50121718 -5.77E-09    6.000
5f      -0.14678838     -0.14678838 -2.39E-09    1.286
5f      -0.11604716     -0.11604717 -3.17E-09    1.714
6s      -1.74803996     -1.74803995  5.41E-09    2.000
6p      -1.10111900     -1.10111900  4.31E-09    2.000
6p      -0.77578418     -0.77578418  8.61E-10    4.000
6d      -0.10304082     -0.10304082  3.74E-10    0.400
6d      -0.08480202     -0.08480202 -2.54E-10    0.600
7s      -0.16094728     -0.16094728  1.06E-09    2.000
./tests/atom_U/uraninum_rlda  0.27s user 0.01s system 98% cpu 0.277 total
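
To put the two wall-clock totals above side by side, here is a minimal sketch that parses the `time` summary lines and computes the featom/dftatom ratio. It is not part of featom or dftatom: the helper names `parse_time_line` and `ratio_of_totals` are made up for illustration, and it assumes the shell prints `total` in plain seconds (longer runs may print `m:ss` instead, which this does not handle).

```python
import re

# Summary lines copied verbatim from the `time` output in this comment.
FEATOM_LINE = (
    "build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  "
    "0.41s user 0.01s system 99% cpu 0.420 total"
)
DFTATOM_LINE = "./tests/atom_U/uraninum_rlda  0.27s user 0.01s system 98% cpu 0.277 total"

# Assumed layout: "<cmd>  <u>s user <s>s system <p>% cpu <t> total".
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)s user (?P<system>[\d.]+)s system "
    r"(?P<cpu>\d+)% cpu (?P<total>[\d.]+) total"
)


def parse_time_line(line):
    """Return user/system/cpu/total from one summary line as floats."""
    match = TIME_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized time line: {line!r}")
    return {key: float(value) for key, value in match.groupdict().items()}


def ratio_of_totals(line, baseline_line):
    """Wall-clock total of `line` divided by that of `baseline_line`."""
    return parse_time_line(line)["total"] / parse_time_line(baseline_line)["total"]


if __name__ == "__main__":
    r = ratio_of_totals(FEATOM_LINE, DFTATOM_LINE)
    print(f"featom / dftatom wall-clock ratio: {r:.2f}")
```

With the numbers above, featom takes roughly 1.5 times as long as dftatom for this test. These are single runs, so the ratio is only indicative.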

certik commented 12 months ago

With cedfa6abe52d5dcbcd8dde244cf346f313483e6e:

$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
 SCF iteration:           1
 SCF iteration:           2
 SCF iteration:           3
 SCF iteration:           4
 SCF iteration:           5
 SCF iteration:           6
 SCF iteration:           7
 SCF convergence error:   2.4220418857676123     
 SCF iteration:           8
 SCF convergence error:   5.9357534701121040E-003
 SCF iteration:           9
 SCF convergence error:   1.8806871048582252E-003
 SCF iteration:          10
 SCF convergence error:   1.0455984192958567E-004
 SCF iteration:          11
 SCF convergence error:   3.2101015676744282E-005
 SCF iteration:          12
 SCF convergence error:   1.1781998182414100E-005
 SCF iteration:          13
 SCF convergence error:   1.2883829185739160E-006
 SCF iteration:          14
 SCF convergence error:   8.7908847490325570E-007
 Comparison of calculated and reference energies

 Total energy:
               E           E_ref     error
 -28001.13232613 -28001.13232549  6.45E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902078  -4223.41902046  3.21E-07
   2   -789.48978230   -789.48978233  3.07E-08
   3   -761.37447596   -761.37447597  1.65E-08
   4   -622.84809453   -622.84809456  3.38E-08
   5   -199.42980561   -199.42980564  3.01E-08
   6   -186.66371306   -186.66371312  6.59E-08
   7   -154.70102661   -154.70102667  6.81E-08
   8   -134.54118022   -134.54118029  6.72E-08
   9   -128.01665731   -128.01665738  7.02E-08
  10    -50.78894802    -50.78894806  4.95E-08
  11    -45.03717123    -45.03717129  5.97E-08
  12    -36.68861043    -36.68861049  5.96E-08
  13    -27.52930618    -27.52930624  5.95E-08
  14    -25.98542885    -25.98542891  5.75E-08
  15    -13.88951417    -13.88951423  6.03E-08
  16    -13.48546963    -13.48546969  6.33E-08
  17    -11.29558706    -11.29558710  4.17E-08
  18     -9.05796421     -9.05796425  4.43E-08
  19     -7.06929559     -7.06929563  4.12E-08
  20     -3.79741619     -3.79741623  3.41E-08
  21     -3.50121715     -3.50121718  3.41E-08
  22     -0.14678836     -0.14678838  2.99E-08
  23     -0.11604714     -0.11604717  2.92E-08
  24     -1.74803993     -1.74803995  2.87E-08
  25     -1.10111897     -1.10111900  3.12E-08
  26     -0.77578414     -0.77578418  3.45E-08
  27     -0.10304078     -0.10304082  3.04E-08
  28     -0.08480199     -0.08480202  3.05E-08
  29     -0.16094726     -0.16094728  2.65E-08
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast  0.40s user 0.01s system 99% cpu 0.404 total
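
To see which state dominates the error in a run like the one above, the eigenvalue table can be parsed with a short script. This is only a sketch: `parse_eigen_rows` and `worst_state` are hypothetical helper names, not featom code, and `SAMPLE` holds just a few rows copied from the output above rather than the full table.

```python
# A few rows copied verbatim from the eigenvalue table above (not the full table).
SAMPLE = """
 Total energy:
               E           E_ref     error
 -28001.13232613 -28001.13232549  6.45E-07

 Eigenvalues:
   n               E           E_ref     error
   1  -4223.41902078  -4223.41902046  3.21E-07
   2   -789.48978230   -789.48978233  3.07E-08
  10    -50.78894802    -50.78894806  4.95E-08
  29     -0.16094726     -0.16094728  2.65E-08
"""


def parse_eigen_rows(text):
    """Return (n, E, E_ref, error) tuples for lines shaped like table rows.

    Lines without exactly four whitespace-separated fields, or whose first
    field is not an integer (headers, the total-energy line), are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        n = int(fields[0])
        energy, reference, error = (float(x) for x in fields[1:])
        rows.append((n, energy, reference, error))
    return rows


def worst_state(rows):
    """Return the row with the largest reported error."""
    return max(rows, key=lambda row: row[3])


if __name__ == "__main__":
    rows = parse_eigen_rows(SAMPLE)
    n, energy, reference, error = worst_state(rows)
    print(f"largest error among parsed rows: n={n}, error={error:.2E}")
```

Note that recomputing `abs(E - E_ref)` from the printed values only reproduces the `error` column to about 1e-8, since the energies are printed with eight decimals.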