Open certik opened 1 year ago
Also:
Build type: native build
Project name: featom
Project version: 0.1.0
Fortran compiler for the host machine: gfortran (gcc 12.3.0 "GNU Fortran (conda-forge gcc 12.3.0-0) 12.3.0")
Fortran linker for the host machine: gfortran ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /home/rgoswami/micromamba/envs/fe/bin/pkg-config (0.29.2)
Run-time dependency lapack found: YES 3.9.0
meson
build commands:
FFLAGS='-ffast-math -march=native' meson setup bbdir --buildtype="release" -Dwith_tests=True
meson compile -C bbdir
time ./bbdir/testDftSchroedFast
./bbdir/testDftSchroedFast 0.02s user 0.00s system 98% cpu 0.021 total
These are based on #17 with the patch for the "accurate eigensolver".
# Lapack 3.9.0
./bbdir/testDftDiracFast 0.95s user 0.01s system 99% cpu 0.955 total
# mkl-dynamic-lp64-seq 2023.2
./bbdir/testDftDiracFast 0.67s user 0.01s system 99% cpu 0.679 total
ifort
micromamba install -c hcc ifort_linux-64
FC=$(which ifort) FFLAGS="-O3 -xHost -ipo -no-prec-div -fp-model fast=2" meson setup bbdir -Dwith_tests=True --buildtype="release"
...
Fortran compiler for the host machine: /home/rgoswami/micromamba/envs/fe/bin/ifort (intel 2021.6.0 "ifort (IFORT) 2021.6.0 20220226")
Fortran linker for the host machine: /home/rgoswami/micromamba/envs/fe/bin/ifort ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Found pkg-config: /home/rgoswami/micromamba/envs/fe/bin/pkg-config (0.29.2)
Run-time dependency mkl-dynamic-lp64-seq found: YES 2023.2
./bbdir/testDftDiracFast 0.51s user 0.01s system 99% cpu 0.525 total
Which corresponds to:
To use the Accelerate framework on macOS, one can use:
fpm test --profile=release --flag "-ffast-math -march=native -framework Accelerate" test_dft_dirac_fast --verbose
But I am getting similar timing:
$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
SCF iteration: 1
[...]
28 -0.08480203 -0.08480202 3.90E-09
29 -0.16094728 -0.16094728 2.24E-09
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast 0.54s user 0.03s system 100% cpu 0.565 total
It seems that on macOS even just linking -lblas and -llapack links against Accelerate by default.
The matrix dimension is about 240x240 for Dirac, and we only need 7 eigenvalues. Let's use a LAPACK interface that can return just those 7, or use some custom eigensolver that can do it.
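To make the "only compute the eigenvalues you need" idea concrete, here is a minimal sketch in plain Python. It is not featom's code and not a LAPACK binding. It assumes a symmetric tridiagonal standard eigenproblem, whereas the Dirac problem here is a dense generalized one, so it only illustrates the principle. It uses bisection with Sturm-sequence counts to find the k lowest eigenvalues without computing the rest. The names `count_below` and `lowest_eigenvalues` are made up for this example. On the LAPACK side, DSYEVX with RANGE='I' and the index bounds IL, IU offers the same kind of index-range selection for dense symmetric matrices.

```python
import math


def count_below(d, e, x):
    """Count eigenvalues strictly below x for the symmetric tridiagonal
    matrix with diagonal d and off-diagonal e (Sturm sequence count)."""
    count = 0
    q = 1.0
    for i in range(len(d)):
        off2 = e[i - 1] ** 2 if i > 0 else 0.0
        q = d[i] - x - off2 / q
        if q == 0.0:
            q = -1e-290  # nudge an exact zero pivot so the next division is safe
        if q < 0.0:
            count += 1
    return count


def lowest_eigenvalues(d, e, k, tol=1e-12):
    """Return the k lowest eigenvalues in ascending order, by bisection."""
    n = len(d)
    # Gershgorin discs give an interval that contains every eigenvalue.
    radius = [
        (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    lo = min(d[i] - radius[i] for i in range(n))
    hi = max(d[i] + radius[i] for i in range(n))
    result = []
    for j in range(k):
        a, b = lo, hi
        # Invariant: at most j eigenvalues lie below a, more than j below b.
        while b - a > tol * max(1.0, abs(a), abs(b)):
            mid = 0.5 * (a + b)
            if count_below(d, e, mid) > j:
                b = mid
            else:
                a = mid
        result.append(0.5 * (a + b))
    return result


if __name__ == "__main__":
    # Toy 240x240 matrix (1D Laplacian stencil), chosen only because its
    # eigenvalues are known in closed form: 2 - 2*cos(j*pi/(n+1)).
    n = 240
    d = [2.0] * n
    e = [-1.0] * (n - 1)
    for j, value in enumerate(lowest_eigenvalues(d, e, 7), start=1):
        exact = 2.0 - 2.0 * math.cos(j * math.pi / (n + 1))
        print(j, value, abs(value - exact))
```

Each Sturm count costs O(n), so the work scales with the number of requested eigenvalues rather than with the full spectrum, which is the motivation for asking for only 7 of roughly 240.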
With #18 I get:
$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
SCF iteration: 1
SCF iteration: 2
SCF iteration: 3
SCF iteration: 4
SCF iteration: 5
SCF iteration: 6
SCF iteration: 7
SCF convergence error: 2.4220418857676123
SCF iteration: 8
SCF convergence error: 5.9357534701121040E-003
SCF iteration: 9
SCF convergence error: 1.8806871048582252E-003
SCF iteration: 10
SCF convergence error: 1.0455984192958567E-004
SCF iteration: 11
SCF convergence error: 3.2101015676744282E-005
SCF iteration: 12
SCF convergence error: 1.1781998182414100E-005
SCF iteration: 13
SCF convergence error: 1.2883829185739160E-006
SCF iteration: 14
SCF convergence error: 8.7908847490325570E-007
SCF iteration: 15
SCF convergence error: 1.6529884305782616E-007
Comparison of calculated and reference energies
Total energy:
E E_ref error
-28001.13232635 -28001.13232549 8.65E-07
Eigenvalues:
n E E_ref error
1 -4223.41902075 -4223.41902046 2.94E-07
2 -789.48978246 -789.48978233 1.35E-07
3 -761.37447601 -761.37447597 3.29E-08
4 -622.84809461 -622.84809456 4.34E-08
5 -199.42980567 -199.42980564 2.36E-08
6 -186.66371312 -186.66371312 6.40E-09
7 -154.70102667 -154.70102667 2.26E-09
8 -134.54118029 -134.54118029 2.11E-09
9 -128.01665738 -128.01665738 8.76E-10
10 -50.78894805 -50.78894806 1.06E-08
11 -45.03717127 -45.03717129 1.59E-08
12 -36.68861047 -36.68861049 1.22E-08
13 -27.52930623 -27.52930624 1.41E-08
14 -25.98542889 -25.98542891 1.51E-08
15 -13.88951422 -13.88951423 1.64E-08
16 -13.48546968 -13.48546969 1.61E-08
17 -11.29558710 -11.29558710 4.23E-10
18 -9.05796425 -9.05796425 7.23E-10
19 -7.06929564 -7.06929563 4.66E-09
20 -3.79741624 -3.79741623 8.66E-09
21 -3.50121719 -3.50121718 7.14E-09
22 -0.14678840 -0.14678838 1.89E-08
23 -0.11604718 -0.11604717 1.83E-08
24 -1.74803998 -1.74803995 2.22E-08
25 -1.10111902 -1.10111900 2.19E-08
26 -0.77578420 -0.77578418 2.18E-08
27 -0.10304083 -0.10304082 1.50E-08
28 -0.08480204 -0.08480202 1.37E-08
29 -0.16094729 -0.16094728 9.98E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast 0.43s user 0.01s system 99% cpu 0.437 total
❯ vtune -report hotspots -r r000hs -group-by module
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Module            CPU Time  CPU Time:Effective Time  CPU Time:Spin Time  CPU Time:Overhead Time  Module Path
----------------  --------  -----------------------  ------------------  ----------------------  -----------------------------------------------------------------
libmkl_core.so.2  0.230s    0.230s                   0s                  0s                      /home/rgoswami/micromamba/envs/fe/lib/libmkl_core.so.2
libfeatom.so      0.120s    0.120s                   0s                  0s                      /home/rgoswami/Git/Github/Fortran/featom/bbdir/src/libfeatom.so
libc.so.6         0.010s    0.010s                   0s                  0s                      /usr/lib/libc.so.6
libc++abi.so      0.010s    0.010s                   0s                  0s                      /opt/intel/oneapi/vtune/2023.2.0/lib64/pinruntime/libc++abi.so
libc-dynamic.so   0.010s    0.010s                   0s                  0s                      /opt/intel/oneapi/vtune/2023.2.0/lib64/pinruntime/libc-dynamic.so
testDftDiracFast  0.010s    0.010s                   0s                  0s                      /home/rgoswami/Git/Github/Fortran/featom/bbdir/testDftDiracFast
vtune: Executing actions 100 % done
❯ vtune -report hotspots -r r000hs
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Function                  CPU Time  CPU Time:Effective Time  CPU Time:Spin Time  CPU Time:Overhead Time  Module            Function (Full)              Source File  Start Address
------------------------  --------  -----------------------  ------------------  ----------------------  ----------------  ---------------------------  -----------  -------------
[MKL LAPACK]@dsyevx       0.150s    0.150s                   0s                  0s                      libmkl_core.so.2  mkl_lapack_dsyevx            [Unknown]    0x9c6470
assemble_radial_dirac_sh  0.060s    0.060s                   0s                  0s                      libfeatom.so      assemble_radial_dirac_sh     fe.f90       0x16a70
[MKL LAPACK]@dsygst       0.050s    0.050s                   0s                  0s                      libmkl_core.so.2  mkl_lapack_dsygst            [Unknown]    0x9c8230
phih                      0.040s    0.040s                   0s                  0s                      libfeatom.so      phih                         feutils.f90  0x47f93
dphih                     0.020s    0.020s                   0s                  0s                      libfeatom.so      dphih                        feutils.f90  0x50278
MKL_Load_Lib_Ex           0.020s    0.020s                   0s                  0s                      libmkl_core.so.2  MKL_Load_Lib_Ex              [Unknown]    0x21ca50
free                      0.010s    0.010s                   0s                  0s                      libc.so.6         free                         [Unknown]    0x9d2e0
[MKL LAPACK]@xdgetrf      0.010s    0.010s                   0s                  0s                      libmkl_core.so.2  mkl_lapack_xdgetrf           [Unknown]    0xd8c050
memmove                   0.010s    0.010s                   0s                  0s                      libc-dynamic.so   memmove                      [Unknown]    0x69e30
operator new              0.010s    0.010s                   0s                  0s                      libc++abi.so      operator new(unsigned long)  [Unknown]    0x25000
__intel_avx_rep_memcpy    0.010s    0.010s                   0s                  0s                      testDftDiracFast  __intel_avx_rep_memcpy       [Unknown]    0x4ac280
vtune: Executing actions 100 % done
❯ vtune -R callstacks -r r000hs -group-by callstack
vtune: Using result path `/home/rgoswami/Git/Github/Fortran/featom/r000hs'
vtune: Executing actions 75 % Generating a report
Function/Function Stack CPU Time Module Function (Full) Source File Start Address
----------------------------- -------- ---------------------- ----------------------------- ----------------------- -------------
[MKL LAPACK]@dsyevx 0.140s libmkl_core.so.2 mkl_lapack_dsyevx [Unknown] 0x9c6470
dsyevx_ 0s libmkl_intel_lp64.so.2 dsyevx_ [Unknown] 0x705190
solve_eig_irange 0s libfeatom.so solve_eig_irange solvers.f90 0x4a940
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
diracsolve_dirac_mp_ffunc_ 0s libfeatom.so diracsolve_dirac_mp_ffunc_ dirac.f90 0xe5e0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
assemble_radial_dirac_sh 0.060s libfeatom.so assemble_radial_dirac_sh fe.f90 0x16a70
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
diracsolve_dirac_mp_ffunc_ 0s libfeatom.so diracsolve_dirac_mp_ffunc_ dirac.f90 0xe5e0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
phih 0.040s libfeatom.so phih feutils.f90 0x47f93
fe2quad 0s libfeatom.so fe2quad feutils.f90 0x47f00
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
diracsolve_dirac_mp_ffunc_ 0s libfeatom.so diracsolve_dirac_mp_ffunc_ dirac.f90 0xe5e0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
[MKL LAPACK]@dsygst 0.030s libmkl_core.so.2 mkl_lapack_dsygst [Unknown] 0x9c8230
DSYGST 0s libmkl_intel_lp64.so.2 DSYGST [Unknown] 0x706480
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
diracsolve_dirac_mp_ffunc_ 0s libfeatom.so diracsolve_dirac_mp_ffunc_ dirac.f90 0xe5e0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
dphih 0.020s libfeatom.so dphih feutils.f90 0x50278
assemble_poisson_gj 0s libfeatom.so assemble_poisson_gj hartree_screening.f90 0x4fe70
hartree_potential_gj 0s libfeatom.so hartree_potential_gj hartree_screening.f90 0x4d130
diracsolve_dirac_mp_ffunc_ 0s libfeatom.so diracsolve_dirac_mp_ffunc_ dirac.f90 0xe5e0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
MKL_Load_Lib_Ex 0.020s libmkl_core.so.2 MKL_Load_Lib_Ex [Unknown] 0x21ca50
__mkl_cpu_detect_and_load_dll 0s libmkl_core.so.2 __mkl_cpu_detect_and_load_dll [Unknown] 0x21be50
[MKL LAPACK]@dsteqr 0s libmkl_core.so.2 mkl_lapack_dsteqr [Unknown] 0x9ba900
DSTEQR 0s libmkl_intel_lp64.so.2 DSTEQR [Unknown] 0x6ff3d0
gauss_jacobi_gw 0s libgjp_gw.so gauss_jacobi_gw gjp_gw_single.f90 0x64e0
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
[MKL LAPACK]@dsygst 0.020s libmkl_core.so.2 mkl_lapack_dsygst [Unknown] 0x9c8230
DSYGST 0s libmkl_intel_lp64.so.2 DSYGST [Unknown] 0x706480
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
[Unknown stack frame(s)] 0s [Unknown] [Unknown stack frame(s)] [Unknown] 0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
free 0.010s libc.so.6 free [Unknown] 0x9d2e0
for_dealloc_allocatable 0s testDftDiracFast for_dealloc_allocatable [Unknown] 0x439650
inv 0s libfeatom.so inv linalg.f90 0x1ea50
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
[MKL LAPACK]@xdgetrf 0.010s libmkl_core.so.2 mkl_lapack_xdgetrf [Unknown] 0xd8c050
DGETRF 0s libmkl_intel_lp64.so.2 DGETRF [Unknown] 0x623590
inv 0s libfeatom.so inv linalg.f90 0x1ea50
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
memmove 0.010s libc-dynamic.so memmove [Unknown] 0x69e30
memcpy 0s libc-dynamic.so memcpy [Unknown] 0x69d30
MKL_Load_Lib_Ex 0s libmkl_core.so.2 MKL_Load_Lib_Ex [Unknown] 0x21ca50
__mkl_cpu_detect_and_load_dll 0s libmkl_core.so.2 __mkl_cpu_detect_and_load_dll [Unknown] 0x21be50
[MKL LAPACK]@dsteqr 0s libmkl_core.so.2 mkl_lapack_dsteqr [Unknown] 0x9ba900
DSTEQR 0s libmkl_intel_lp64.so.2 DSTEQR [Unknown] 0x6ff3d0
gauss_jacobi_gw 0s libgjp_gw.so gauss_jacobi_gw gjp_gw_single.f90 0x64e0
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
operator new 0.010s libc++abi.so operator new(unsigned long) [Unknown] 0x25000
MKL_Load_Lib_Ex 0s libmkl_core.so.2 MKL_Load_Lib_Ex [Unknown] 0x21ca50
__mkl_cpu_detect_and_load_dll 0s libmkl_core.so.2 __mkl_cpu_detect_and_load_dll [Unknown] 0x21be50
[MKL LAPACK]@dsteqr 0s libmkl_core.so.2 mkl_lapack_dsteqr [Unknown] 0x9ba900
DSTEQR 0s libmkl_intel_lp64.so.2 DSTEQR [Unknown] 0x6ff3d0
gauss_jacobi_gw 0s libgjp_gw.so gauss_jacobi_gw gjp_gw_single.f90 0x64e0
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
[MKL LAPACK]@dsyevx 0.010s libmkl_core.so.2 mkl_lapack_dsyevx [Unknown] 0x9c6470
dsyevx_ 0s libmkl_intel_lp64.so.2 dsyevx_ [Unknown] 0x705190
solve_eig_irange 0s libfeatom.so solve_eig_irange solvers.f90 0x4a940
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
[Unknown stack frame(s)] 0s [Unknown] [Unknown stack frame(s)] [Unknown] 0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
__intel_avx_rep_memcpy 0.010s testDftDiracFast __intel_avx_rep_memcpy [Unknown] 0x4ac280
solve_dirac_eigenproblem 0s libfeatom.so solve_dirac_eigenproblem dirac.f90 0x433b0
diracsolve_dirac_mp_ffunc_ 0s libfeatom.so diracsolve_dirac_mp_ffunc_ dirac.f90 0xe5e0
mixing_pulay 0s libfeatom.so mixing_pulay mixings.f90 0x3ec40
solve_dirac 0s libfeatom.so solve_dirac dirac.f90 0x4930
test_dft_dirac_fast 0s testDftDiracFast test_dft_dirac_fast test_dft_dirac_fast.f90 0x40b860
main 0s testDftDiracFast main [Unknown] 0x40b810
__libc_start_main 0s libc.so.6 __libc_start_main [Unknown] 0x27d00
_start 0s testDftDiracFast _start [Unknown] 0x40b740
[stack] 0s [stack] [stack] [Unknown] 0
vtune: Executing actions 100 % done
With the latest commit I get:
$ time build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast
SCF iteration: 1
SCF iteration: 2
SCF iteration: 3
SCF iteration: 4
SCF iteration: 5
SCF iteration: 6
SCF iteration: 7
SCF convergence error: 2.4220424374379945
SCF iteration: 8
SCF convergence error: 5.9356682395446114E-003
SCF iteration: 9
SCF convergence error: 1.8809154171322007E-003
SCF iteration: 10
SCF convergence error: 1.0443680002936162E-004
SCF iteration: 11
SCF convergence error: 3.1862826290307567E-005
SCF iteration: 12
SCF convergence error: 1.1619773431448266E-005
SCF iteration: 13
SCF convergence error: 1.4270481187850237E-006
SCF iteration: 14
SCF convergence error: 1.3223652786109596E-006
SCF iteration: 15
SCF convergence error: 2.4143082555383444E-007
SCF iteration: 16
SCF convergence error: 5.5670170695520937E-008
Comparison of calculated and reference energies
Total energy:
E E_ref error
-28001.13232562 -28001.13232549 1.35E-07
Eigenvalues:
n E E_ref error
1 -4223.41902055 -4223.41902046 9.11E-08
2 -789.48978235 -789.48978233 2.46E-08
3 -761.37447600 -761.37447597 3.10E-08
4 -622.84809459 -622.84809456 2.44E-08
5 -199.42980565 -199.42980564 6.82E-09
6 -186.66371313 -186.66371312 9.53E-09
7 -154.70102668 -154.70102667 5.38E-09
8 -134.54118030 -134.54118029 9.35E-09
9 -128.01665739 -128.01665738 8.53E-09
10 -50.78894806 -50.78894806 3.60E-09
11 -45.03717129 -45.03717129 1.41E-09
12 -36.68861048 -36.68861049 3.75E-09
13 -27.52930624 -27.52930624 2.23E-09
14 -25.98542890 -25.98542891 2.56E-09
15 -13.88951423 -13.88951423 3.29E-09
16 -13.48546969 -13.48546969 2.25E-09
17 -11.29558710 -11.29558710 6.12E-10
18 -9.05796425 -9.05796425 2.79E-10
19 -7.06929564 -7.06929563 8.73E-10
20 -3.79741623 -3.79741623 2.07E-09
21 -3.50121719 -3.50121718 2.69E-09
22 -0.14678839 -0.14678838 6.58E-09
23 -0.11604717 -0.11604717 6.89E-09
24 -1.74803996 -1.74803995 8.23E-09
25 -1.10111901 -1.10111900 8.43E-09
26 -0.77578419 -0.77578418 9.08E-09
27 -0.10304082 -0.10304082 6.28E-09
28 -0.08480203 -0.08480202 6.48E-09
29 -0.16094729 -0.16094728 4.73E-09
build/gfortran_565E65E7876A06C6/test/test_dft_dirac_fast 0.49s user 0.10s system 117% cpu 0.497 total
featom using 310aeb863600e784dcf1a04ac9ec39b4419b97d2:
$ fpm test --profile=release --flag "-ffast-math -march=native -framework Accelerate " test_dft_dirac_fast --verbose
$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
SCF iteration: 1
SCF iteration: 2
SCF iteration: 3
SCF iteration: 4
SCF iteration: 5
SCF iteration: 6
SCF iteration: 7
SCF convergence error: 2.4220418857676123
SCF iteration: 8
SCF convergence error: 5.9357534701121040E-003
SCF iteration: 9
SCF convergence error: 1.8806871048582252E-003
SCF iteration: 10
SCF convergence error: 1.0455984192958567E-004
SCF iteration: 11
SCF convergence error: 3.2101015676744282E-005
SCF iteration: 12
SCF convergence error: 1.1781998182414100E-005
SCF iteration: 13
SCF convergence error: 1.2883829185739160E-006
SCF iteration: 14
SCF convergence error: 8.7908847490325570E-007
SCF iteration: 15
SCF convergence error: 1.6529884305782616E-007
Comparison of calculated and reference energies
Total energy:
E E_ref error
-28001.13232635 -28001.13232549 8.65E-07
Eigenvalues:
n E E_ref error
1 -4223.41902075 -4223.41902046 2.94E-07
2 -789.48978246 -789.48978233 1.35E-07
3 -761.37447601 -761.37447597 3.29E-08
4 -622.84809461 -622.84809456 4.34E-08
5 -199.42980567 -199.42980564 2.36E-08
6 -186.66371312 -186.66371312 6.40E-09
7 -154.70102667 -154.70102667 2.26E-09
8 -134.54118029 -134.54118029 2.11E-09
9 -128.01665738 -128.01665738 8.76E-10
10 -50.78894805 -50.78894806 1.06E-08
11 -45.03717127 -45.03717129 1.59E-08
12 -36.68861047 -36.68861049 1.22E-08
13 -27.52930623 -27.52930624 1.41E-08
14 -25.98542889 -25.98542891 1.51E-08
15 -13.88951422 -13.88951423 1.64E-08
16 -13.48546968 -13.48546969 1.61E-08
17 -11.29558710 -11.29558710 4.23E-10
18 -9.05796425 -9.05796425 7.23E-10
19 -7.06929564 -7.06929563 4.66E-09
20 -3.79741624 -3.79741623 8.66E-09
21 -3.50121719 -3.50121718 7.14E-09
22 -0.14678840 -0.14678838 1.89E-08
23 -0.11604718 -0.11604717 1.83E-08
24 -1.74803998 -1.74803995 2.22E-08
25 -1.10111902 -1.10111900 2.19E-08
26 -0.77578420 -0.77578418 2.18E-08
27 -0.10304083 -0.10304082 1.50E-08
28 -0.08480204 -0.08480202 1.37E-08
29 -0.16094729 -0.16094728 9.98E-09
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast 0.41s user 0.01s system 99% cpu 0.420 total
dftatom:
$ time ./tests/atom_U/uraninum_rlda
Test eps: 1.1999999999999999E-006
Z= 92
N= 5269
E_tot= -28001.13232639 E_tot_exact= -28001.13232549 error: 9.00E-07
state E E_exact error occupancy
1s -4223.41902044 -4223.41902046 -1.83E-08 2.000
2s -789.48978232 -789.48978233 -1.16E-08 2.000
2p -761.37447596 -761.37447597 -1.35E-08 2.000
2p -622.84809453 -622.84809456 -3.60E-08 4.000
3s -199.42980566 -199.42980564 1.07E-08 2.000
3p -186.66371314 -186.66371312 1.15E-08 2.000
3p -154.70102665 -154.70102667 -2.03E-08 4.000
3d -134.54118027 -134.54118029 -1.93E-08 4.000
3d -128.01665735 -128.01665738 -3.18E-08 6.000
4s -50.78894808 -50.78894806 1.89E-08 2.000
4p -45.03717131 -45.03717129 1.98E-08 2.000
4p -36.68861048 -36.68861049 -4.93E-09 4.000
4d -27.52930624 -27.52930624 -3.22E-09 4.000
4d -25.98542889 -25.98542891 -1.85E-08 6.000
4f -13.88951422 -13.88951423 -1.70E-08 6.000
4f -13.48546967 -13.48546969 -2.00E-08 8.000
5s -11.29558711 -11.29558710 1.37E-08 2.000
5p -9.05796426 -9.05796425 1.32E-08 2.000
5p -7.06929564 -7.06929563 7.97E-10 4.000
5d -3.79741623 -3.79741623 1.17E-09 4.000
5d -3.50121718 -3.50121718 -5.77E-09 6.000
5f -0.14678838 -0.14678838 -2.39E-09 1.286
5f -0.11604716 -0.11604717 -3.17E-09 1.714
6s -1.74803996 -1.74803995 5.41E-09 2.000
6p -1.10111900 -1.10111900 4.31E-09 2.000
6p -0.77578418 -0.77578418 8.61E-10 4.000
6d -0.10304082 -0.10304082 3.74E-10 0.400
6d -0.08480202 -0.08480202 -2.54E-10 0.600
7s -0.16094728 -0.16094728 1.06E-09 2.000
./tests/atom_U/uraninum_rlda 0.27s user 0.01s system 98% cpu 0.277 total
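For comparing the two runs above without doing the division by hand, a small sketch that parses zsh-style `time` summary lines and returns the ratio of wall-clock totals. The helper names `parse_time_line` and `wall_time_ratio` are made up for this example, and the two sample lines are copied verbatim from the output above. These are single runs, so the ratio (roughly 1.5x here) is only a rough indication.

```python
import re

# Matches lines like:
#   ./prog 0.27s user 0.01s system 98% cpu 0.277 total
TIME_LINE = re.compile(
    r"^(?P<cmd>\S+)\s+(?P<user>[\d.]+)s user\s+(?P<system>[\d.]+)s system\s+"
    r"(?P<cpu>\d+)% cpu\s+(?P<total>[\d.]+) total$"
)


def parse_time_line(line):
    """Parse one zsh `time` summary line into a dict of numbers."""
    m = TIME_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a zsh time summary line: {line!r}")
    return {
        "cmd": m["cmd"],
        "user": float(m["user"]),
        "system": float(m["system"]),
        "cpu": int(m["cpu"]),
        "total": float(m["total"]),
    }


def wall_time_ratio(line_a, line_b):
    """Wall-clock total of run a divided by wall-clock total of run b."""
    return parse_time_line(line_a)["total"] / parse_time_line(line_b)["total"]


FEATOM = ("build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast "
          "0.41s user 0.01s system 99% cpu 0.420 total")
DFTATOM = "./tests/atom_U/uraninum_rlda 0.27s user 0.01s system 98% cpu 0.277 total"

if __name__ == "__main__":
    print(f"featom / dftatom wall time: {wall_time_ratio(FEATOM, DFTATOM):.2f}")
```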
With cedfa6abe52d5dcbcd8dde244cf346f313483e6e
$ time build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast
SCF iteration: 1
SCF iteration: 2
SCF iteration: 3
SCF iteration: 4
SCF iteration: 5
SCF iteration: 6
SCF iteration: 7
SCF convergence error: 2.4220418857676123
SCF iteration: 8
SCF convergence error: 5.9357534701121040E-003
SCF iteration: 9
SCF convergence error: 1.8806871048582252E-003
SCF iteration: 10
SCF convergence error: 1.0455984192958567E-004
SCF iteration: 11
SCF convergence error: 3.2101015676744282E-005
SCF iteration: 12
SCF convergence error: 1.1781998182414100E-005
SCF iteration: 13
SCF convergence error: 1.2883829185739160E-006
SCF iteration: 14
SCF convergence error: 8.7908847490325570E-007
Comparison of calculated and reference energies
Total energy:
E E_ref error
-28001.13232613 -28001.13232549 6.45E-07
Eigenvalues:
n E E_ref error
1 -4223.41902078 -4223.41902046 3.21E-07
2 -789.48978230 -789.48978233 3.07E-08
3 -761.37447596 -761.37447597 1.65E-08
4 -622.84809453 -622.84809456 3.38E-08
5 -199.42980561 -199.42980564 3.01E-08
6 -186.66371306 -186.66371312 6.59E-08
7 -154.70102661 -154.70102667 6.81E-08
8 -134.54118022 -134.54118029 6.72E-08
9 -128.01665731 -128.01665738 7.02E-08
10 -50.78894802 -50.78894806 4.95E-08
11 -45.03717123 -45.03717129 5.97E-08
12 -36.68861043 -36.68861049 5.96E-08
13 -27.52930618 -27.52930624 5.95E-08
14 -25.98542885 -25.98542891 5.75E-08
15 -13.88951417 -13.88951423 6.03E-08
16 -13.48546963 -13.48546969 6.33E-08
17 -11.29558706 -11.29558710 4.17E-08
18 -9.05796421 -9.05796425 4.43E-08
19 -7.06929559 -7.06929563 4.12E-08
20 -3.79741619 -3.79741623 3.41E-08
21 -3.50121715 -3.50121718 3.41E-08
22 -0.14678836 -0.14678838 2.99E-08
23 -0.11604714 -0.11604717 2.92E-08
24 -1.74803993 -1.74803995 2.87E-08
25 -1.10111897 -1.10111900 3.12E-08
26 -0.77578414 -0.77578418 3.45E-08
27 -0.10304078 -0.10304082 3.04E-08
28 -0.08480199 -0.08480202 3.05E-08
29 -0.16094726 -0.16094728 2.65E-08
build/gfortran_BDCD69B59C14BD7C/test/test_dft_dirac_fast 0.40s user 0.01s system 99% cpu 0.404 total
As of e7747e665baa92a90115c2719b91a96de92fac5f on Apple M1 Max and GFortran 11.3.0:
And then benchmark using:
Now apply the following patch:
And