Green-Phys / green-mbpt

Many-Body Perturbation solvers for Green project
MIT License

A big difference in the found chemical potential with Debug and Release versions #13

Open pavel-po opened 5 months ago

pavel-po commented 5 months ago

Describe the bug: Builds configured with -DCMAKE_BUILD_TYPE=Release and -DCMAKE_BUILD_TYPE=Debug give very different chemical potentials and energies.

Expected behavior: mu should be the same in both runs. The difference seems too large to be explained by numerical noise, even though the number of electrons barely differs. The computed total and correlation energies are also very different, which is concerning.

Reproduction steps: Both versions are the latest ones cloned from GitHub and were configured as follows:

cmake ../green-mbpt-default -DCMAKE_BUILD_TYPE=Release \
         -DCMAKE_INSTALL_PREFIX="~/dev/green/default/green-mbpt-default-cpu-installed" 

and

cmake ../green-mbpt-default -DCMAKE_BUILD_TYPE=Debug \
         -DCMAKE_INSTALL_PREFIX="~/dev/green/default/green-mbpt-default-debug-cpu-installed" 

Both builds pass all the tests.

A quick test demonstrating the problem:

export OMP_NUM_THREADS=1
export HDF5_USE_FILE_LOCKING=FALSE
export HDF5_DISABLE_VERSION_CHECK=1

export GREEN_INSTALL=/global/homes/p/pokhilko/dev/green/default/green-mbpt-default-debug-cpu-installed
export GREEN_GRID=$GREEN_INSTALL/share/ir
export INTS=/global/cfs/projectdirs/m3357/pokhilko/ions/Ce3+/x2c_svp/sol1_ints
export INPUT=$INTS

srun -n 64 $GREEN_INSTALL/bin/mbpt.exe --scf_type=GW --BETA 300       \
  --grid_file $GREEN_GRID/1e6.h5 --itermax 1 --results_file Ce.h5 \
  --input_file $INTS/input_green.h5 \
  --jobs SC   \
  --restart false \
  --verbose 4 \
  --tol 1e-14 \
  --E_thr 1e-16 \
  --mixing_type SIGMA_DAMPING --damping 0.5  \
  --dfintegral_hf_file="$INTS/df_hf_int"  \
  --dfintegral_file="$INTS/df_hf_int" \
  --kernel CPU  > Ce_cpu_debug.txt

rm Ce.h5

export GREEN_INSTALL=/global/homes/p/pokhilko/dev/green/default/green-mbpt-default-cpu-installed

srun -n 64 $GREEN_INSTALL/bin/mbpt.exe --scf_type=GW --BETA 300       \
  --grid_file $GREEN_GRID/1e6.h5 --itermax 1 --results_file Ce.h5 \
  --input_file $INTS/input_green.h5 \
  --jobs SC   \
  --restart false \
  --verbose 4 \
  --tol 1e-14 \
  --E_thr 1e-16 \
  --mixing_type SIGMA_DAMPING --damping 0.5  \
  --dfintegral_hf_file="$INTS/df_hf_int"  \
  --dfintegral_file="$INTS/df_hf_int" \
  --kernel CPU  > Ce_cpu.txt
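
The two invocations above are meant to differ only in the install prefix and the log file name. As a minimal sketch (not part of green-mbpt), the Python snippet below builds both command lines from a single template so the flags cannot drift apart between the Debug and Release runs. `build_command` is a hypothetical helper introduced here for illustration; the flags and paths are copied from the commands above, and the snippet only constructs the argument lists without launching anything.

```python
import os


def build_command(install_prefix, grid_dir, ints_dir, ranks=64):
    """Return the srun argument list for one build (hypothetical helper)."""
    return [
        "srun", "-n", str(ranks),
        os.path.join(install_prefix, "bin", "mbpt.exe"),
        "--scf_type=GW", "--BETA", "300",
        "--grid_file", os.path.join(grid_dir, "1e6.h5"),
        "--itermax", "1", "--results_file", "Ce.h5",
        "--input_file", os.path.join(ints_dir, "input_green.h5"),
        "--jobs", "SC", "--restart", "false", "--verbose", "4",
        "--tol", "1e-14", "--E_thr", "1e-16",
        "--mixing_type", "SIGMA_DAMPING", "--damping", "0.5",
        "--dfintegral_hf_file=" + os.path.join(ints_dir, "df_hf_int"),
        "--dfintegral_file=" + os.path.join(ints_dir, "df_hf_int"),
        "--kernel", "CPU",
    ]


base = "/global/homes/p/pokhilko/dev/green/default"
ints = "/global/cfs/projectdirs/m3357/pokhilko/ions/Ce3+/x2c_svp/sol1_ints"

# Log file name -> install prefix, as in the shell commands above.
builds = {
    "Ce_cpu_debug.txt": base + "/green-mbpt-default-debug-cpu-installed",
    "Ce_cpu.txt": base + "/green-mbpt-default-cpu-installed",
}

# In the shell commands, GREEN_GRID is exported once, before GREEN_INSTALL is
# changed, so both runs read the grid file from the Debug install.
grid = builds["Ce_cpu_debug.txt"] + "/share/ir"

commands = {log: build_command(prefix, grid, ints) for log, prefix in builds.items()}

# Positions at which the two argument lists differ (only the executable path).
debug_cmd, release_cmd = commands.values()
differing = [i for i, (a, b) in enumerate(zip(debug_cmd, release_cmd)) if a != b]
print(differing)
```

To actually launch a run, each list could be passed to `subprocess.run` with `stdout` redirected to the corresponding log file, after exporting the same environment variables as in the shell version.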

This is the sdiff output of the two runs for the first chemical potential search:

Inter-node communicator has 1 cores. Intra-node communicator    Inter-node communicator has 1 cores. Intra-node communicator 
nel:1.011669021091587e+02 mu: 0.000000000000000e+00 target ne | nel:1.011669021077246e+02 mu: 0.000000000000000e+00 target ne
nel:7.913057587264623e+01 mu: -5.000000000000000e-01          | nel:7.913057587204560e+01 mu: -5.000000000000000e-01
nel:5.500000000018537e+01 mu: -1.000000000000000e+00          | nel:5.499999999966038e+01 mu: -1.000000000000000e+00
nel:5.400000000040376e+01 mu: -1.500000000000000e+00          | nel:5.500000000148358e+01 mu: -7.500000000000000e-01
nel:5.500000000018537e+01 mu: -1.000000000000000e+00          | nel:5.499999999973080e+01 mu: -1.125000000000000e+00
nel:5.400000000040376e+01 mu: -1.500000000000000e+00          | nel:5.499999999964228e+01 mu: -9.375000000000000e-01
nel:5.500000000030583e+01 mu: -1.250000000000000e+00          | nel:5.499999999968739e+01 mu: -8.437500000000000e-01
nel:5.499999555795911e+01 mu: -1.375000000000000e+00          | nel:5.499999999979431e+01 mu: -7.968750000000000e-01
nel:5.500000000034528e+01 mu: -1.312500000000000e+00          | nel:5.499999999989017e+01 mu: -7.734375000000000e-01
nel:5.499999999999254e+01 mu: -1.343750000000000e+00          | nel:5.499999999999187e+01 mu: -7.617187500000000e-01
nel:5.500000000035316e+01 mu: -1.328125000000000e+00          | nel:5.500000000023394e+01 mu: -7.558593750000000e-01
nel:5.500000000032677e+01 mu: -1.335937500000000e+00          | nel:5.500000000006933e+01 mu: -7.587890625000000e-01
nel:5.500000000024929e+01 mu: -1.339843750000000e+00          | nel:5.500000000002365e+01 mu: -7.602539062500000e-01
nel:5.500000000015798e+01 mu: -1.341796875000000e+00          | nel:5.500000000000638e+01 mu: -7.609863281250000e-01
nel:5.500000000008728e+01 mu: -1.342773437500000e+00          | nel:5.499999999999881e+01 mu: -7.613525390625000e-01
nel:5.500000000004346e+01 mu: -1.343261718750000e+00          | nel:5.500000000000235e+01 mu: -7.611694335937500e-01
nel:5.500000000001898e+01 mu: -1.343505859375000e+00          | nel:5.500000000000078e+01 mu: -7.612609863281250e-01
nel:5.500000000000583e+01 mu: -1.343627929687500e+00          | nel:5.499999999999984e+01 mu: -7.613067626953125e-01
nel:5.499999999999929e+01 mu: -1.343688964843750e+00          |        New chemical potential, μ = -7.613067626953125e-01
nel:5.500000000000252e+01 mu: -1.343658447265625e+00          | Chemical potential difference Δμ =  7.613067626953125e-01
nel:5.500000000000108e+01 mu: -1.343673706054688e+00          | Leakage of Dyson G: 2.54454e-14
nel:5.500000000000034e+01 mu: -1.343681335449219e+00          <
       New chemical potential, μ = -1.343681335449219e+00     <
Chemical potential difference Δμ =  1.343681335449219e+00     <
Leakage of Dyson G: 2.11814e-14                   <
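
To make the point of divergence easier to see, here is a small Python sketch that computes `nel - target` for a few rows copied from the output above. The target of 55 electrons is an assumption: the header is cut off at "target ne", and 55 is inferred from the value both columns converge to. The sketch only restates numbers from the log and does not identify the cause of the discrepancy.

```python
TARGET_NEL = 55.0  # assumption: inferred from the converged nel values in the log

# (mu, nel) pairs copied from the sdiff output above.
left = [
    (-1.0, 5.500000000018537e+01),
    (-1.5, 5.400000000040376e+01),
    (-1.25, 5.500000000030583e+01),
]
right = [
    (-1.0, 5.499999999966038e+01),
    (-0.75, 5.500000000148358e+01),
    (-1.125, 5.499999999973080e+01),
]


def residuals(samples, target=TARGET_NEL):
    """Map each mu to nel(mu) - target."""
    return {mu: nel - target for mu, nel in samples}


res_left = residuals(left)
res_right = residuals(right)

for name, res in (("left", res_left), ("right", res_right)):
    for mu, r in sorted(res.items()):
        print(f"{name:5s} mu={mu:+.4f}  nel-target={r:+.3e}")

# True if the two columns sit on opposite sides of the target at mu = -1.0.
opposite_sign_at_minus_one = res_left[-1.0] * res_right[-1.0] < 0
print(opposite_sign_at_minus_one)
```

Under that assumption, the two columns agree to roughly 1e-9 in nel at mu = -1.0 but fall on opposite sides of the target, which is where the searches start stepping in different directions. In the sampled rows, nel also stays within about 1.5e-9 of 55 for mu between -1.25 and -0.75.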

The energies are different too:

Leakage of Dyson G: 2.49562e-14                   | Leakage of Dyson G: 4.61354e-14
                  One-body Energy: -1.196550724102716e+04     |                   One-body Energy: -1.202176729442367e+04
                        HF Energy: -8.823004445052420e+03     |                         HF Energy: -8.851813128243626e+03
               Correlation Energy: -1.405415625467145e+00     |                Correlation Energy: -1.357257979899527e+00
                     Total Energy: -8.824409860677888e+03     |                      Total Energy: -8.853170386223526e+03
   |ΔE_1b| + |ΔE_HF| + |ΔE_corr| =  2.078991710170505e+04     |    |ΔE_1b| + |ΔE_HF| + |ΔE_corr| =  2.087493768064719e+04
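
For reference, the size of the discrepancy can be read off directly from the numbers above. The Python sketch below, using only values copied from the sdiff output, prints the left-minus-right difference for each energy (in the units of the log) and checks that the total equals HF plus correlation energy in each column.

```python
# Energies copied from the sdiff output above: (left column, right column).
energies = {
    "One-body": (-1.196550724102716e+04, -1.202176729442367e+04),
    "HF": (-8.823004445052420e+03, -8.851813128243626e+03),
    "Correlation": (-1.405415625467145e+00, -1.357257979899527e+00),
    "Total": (-8.824409860677888e+03, -8.853170386223526e+03),
}

differences = {name: left - right for name, (left, right) in energies.items()}
for name, diff in differences.items():
    print(f"{name:12s} left - right = {diff:+.6f}")

# Consistency check: Total = HF + Correlation in each column of the log.
for col in (0, 1):
    total = energies["HF"][col] + energies["Correlation"][col]
    assert abs(total - energies["Total"][col]) < 1e-9
```

The one-body energies differ by about 56, the HF and total energies by about 29, and the correlation energies by about 0.05.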

Platform: Perlmutter

pavel-po commented 5 months ago

Serial runs fully reproduce the issue, so it does not seem to be an MPI bug.

egull commented 5 months ago

@pavel-po @iskakoff What's the current status of this?