eljost / pysisyphus

Python suite for optimization of stationary points on ground- and excited states PES and determination of reaction paths.
GNU General Public License v3.0

Unexpected multithreading handling for xtb in NEB #297

Closed O2-AC closed 6 months ago

O2-AC commented 6 months ago

Describe the bug: Increasing pal leads to unexpectedly slower calculation cycles.

To Reproduce: Running a NEB calculation with pal: 1 gives shorter s/cycle times than running the same calculation with pal: 8. In each case, the environment variables OMP_NUM_THREADS and MKL_NUM_THREADS are both set to the same value as pal (1 and 8, respectively).

I also tried it with your Diels-Alder example: with both ENV vars set to 8 and pal: 8, the calculation is very slow. Changing pal to 1, without touching the ENV variables, speeds up the calculation by nearly an order of magnitude.
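
To separate the effect of pal from that of the environment variables, the bare xtb call can be timed outside pysisyphus under different thread counts. The following is only a minimal sketch: it assumes xtb is on the PATH, the helper names (`thread_env`, `xtb_command`, `time_xtb`) are hypothetical, and the flags are copied from the xtb call reported further down in this thread.

```python
import os
import subprocess
import time


def thread_env(n_threads, base_env=None):
    """Return a copy of the environment with both thread variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(n_threads)
    env["MKL_NUM_THREADS"] = str(n_threads)
    return env


def xtb_command(xyz_fn):
    """Build the xtb gradient call; flags copied from the call in this thread."""
    return ["xtb", xyz_fn, "--chrg", "0", "--uhf", "0",
            "--acc", "1.0", "--gfn", "2", "--grad"]


def time_xtb(xyz_fn, n_threads):
    """Wall time in seconds of one xtb run (assumes xtb is on the PATH)."""
    start = time.perf_counter()
    subprocess.run(xtb_command(xyz_fn), env=thread_env(n_threads),
                   check=True, capture_output=True)
    return time.perf_counter() - start
```

Calling `time_xtb("diels_alder_educt.xyz", 1)` and `time_xtb("diels_alder_educt.xyz", 8)` in a scratch directory would then give two directly comparable wall times.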

Expected behavior: The calculation cycles should get faster when increasing either pal or the environment variables.

OS and Python:

Pysisyphus version: Current dev branch, installation from source.

eljost commented 6 months ago

Dear Ole,

How big is the system that you calculate (number of atoms per image and number of images)? Are you running calculations in parallel with cluster: true? Did you do a diff of the xtb log files from, let's say, the first image in the first calculation to see if there is a difference besides the different number of threads?
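
Such a comparison could be sketched with Python's standard `difflib` module instead of the command-line diff. The file labels `pal1/xtb.out` and `pal8/xtb.out` are hypothetical, and the two short strings only imitate the setup block of an xtb log; real logs would be read from disk.

```python
import difflib


def diff_logs(log_a, log_b, name_a="pal1/xtb.out", name_b="pal8/xtb.out"):
    """Return only the changed lines between two log texts as a unified diff."""
    return list(difflib.unified_diff(
        log_a.splitlines(), log_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm="", n=0,
    ))


# Stand-ins for two log files that differ only in the thread count.
log_1 = ("coordinate file            : diels_alder_educt.xyz\n"
         "omp threads                :                     1")
log_8 = ("coordinate file            : diels_alder_educt.xyz\n"
         "omp threads                :                     8")

for line in diff_logs(log_1, log_8):
    print(line)
```

With `n=0` no context lines are emitted, so anything beyond the thread count that differs between the two runs stands out immediately.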

O2-AC commented 6 months ago

The problem can be reproduced with your Diels-Alder example. System size: 12 images, 16 atoms. I have not set cluster: true. Upon closer inspection, this might be a problem that originates from xTB.

Below are the shortened outputs of the command xtb diels_alder_educt.xyz --chrg 0 --uhf 0 --acc 1.0 --gfn 2 --grad, with diels_alder_educt.xyz:

16
xyz file from https://github.com/ZimmermanGroup/pyGSM, MIT License
  C     -1.06001665     -1.51714564      0.05288674
  C     -1.82955412     -0.59408623     -0.53968755
  C     -2.01260392      0.79370866     -0.08977969
  C     -1.09740592      1.54095108      0.54110413
  H     -2.40063347     -0.88235561     -1.42321617
  H     -0.51365172     -1.30383154      0.96828855
  H     -0.96688964     -2.52122005     -0.35117707
  H     -1.32088987      2.55628492      0.85667305
  H     -0.09454533      1.17390089      0.74481476
  H     -2.98142355      1.23828062     -0.32070817
  C      3.01841440     -0.33274049      0.53420511
  C      2.48267950      0.16990394     -0.57660955
  H      3.89171154      0.11254122      1.00536203
  H      2.60849064     -1.21591676      1.01902222
  H      1.60806045     -0.27640639     -1.04373753
  H      2.89526366      1.05096138     -1.06301386

1 thread:

           -------------------------------------------------
          |                Calculation Setup                |
           -------------------------------------------------

          program call               : xtb diels_alder_educt.xyz --chrg 0 --uhf 0 --acc 1.0 --gfn 2 --grad
          hostname                   : [...]
          coordinate file            : diels_alder_educt.xyz
          omp threads                :                     1
<...>
           -------------------------------------------------
          | TOTAL ENERGY              -17.821111237146 Eh   |
          | GRADIENT NORM               0.032546379825 Eh/α |
          | HOMO-LUMO GAP               3.849049936451 eV   |
           -------------------------------------------------

------------------------------------------------------------------------
 * finished run on 2024/03/30 at 10:19:09.727
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min,  0.028 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.019 sec
 * ratio c/w:     0.691 speedup
 SCF:
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
 * wall-time:     0 d,  0 h,  0 min,  0.006 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.006 sec
 * ratio c/w:     0.994 speedup

8 threads:

<...>
 * started run on 2024/03/30 at 10:23:13.686

           -------------------------------------------------
          |                Calculation Setup                |
           -------------------------------------------------

          program call               : xtb diels_alder_educt.xyz --chrg 0 --uhf 0 --acc 1.0 --gfn 2 --grad
          hostname                   : [...]
          coordinate file            : diels_alder_educt.xyz
          omp threads                :                     8
<...>
           -------------------------------------------------
          | TOTAL ENERGY              -17.821111237146 Eh   |
          | GRADIENT NORM               0.032546379512 Eh/α |
          | HOMO-LUMO GAP               3.849049942210 eV   |
           -------------------------------------------------
normal termination of xtb
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL

------------------------------------------------------------------------
 * finished run on 2024/03/30 at 10:23:14.425
------------------------------------------------------------------------
 total:
 * wall-time:     0 d,  0 h,  0 min,  0.740 sec
 *  cpu-time:     0 d,  0 h,  0 min,  5.093 sec
 * ratio c/w:     6.886 speedup
 SCF:
 * wall-time:     0 d,  0 h,  0 min,  0.625 sec
 *  cpu-time:     0 d,  0 h,  0 min,  4.328 sec
 * ratio c/w:     6.921 speedup
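
The two timing blocks can be compared programmatically. The sketch below parses the wall-time and cpu-time lines in the format shown above and computes the wall-time ratio between the runs; the regular expression and the function name `parse_times` are illustrative assumptions, not part of xtb or pysisyphus.

```python
import re

# Matches lines such as " * wall-time:     0 d,  0 h,  0 min,  0.740 sec".
TIME_RE = re.compile(
    r"\*\s+(wall|cpu)-time:\s+(\d+)\s+d,\s+(\d+)\s+h,\s+(\d+)\s+min,\s+([\d.]+)\s+sec"
)


def parse_times(log_text):
    """Return the first wall and cpu time (in seconds) found in the log text."""
    times = {}
    for kind, d, h, m, s in TIME_RE.findall(log_text):
        seconds = int(d) * 86400 + int(h) * 3600 + int(m) * 60 + float(s)
        times.setdefault(kind, seconds)  # keep the first match (the "total" block)
    return times


# "total" blocks copied from the two runs above.
one_thread = """ total:
 * wall-time:     0 d,  0 h,  0 min,  0.028 sec
 *  cpu-time:     0 d,  0 h,  0 min,  0.019 sec"""
eight_threads = """ total:
 * wall-time:     0 d,  0 h,  0 min,  0.740 sec
 *  cpu-time:     0 d,  0 h,  0 min,  5.093 sec"""

t1 = parse_times(one_thread)
t8 = parse_times(eight_threads)
slowdown = t8["wall"] / t1["wall"]
print(f"wall-time ratio, 8 threads vs. 1 thread: {slowdown:.1f}")
```

For the numbers above, the 8-thread run needs roughly 26 times the wall time of the single-thread run while consuming about 5 s of cpu time, so the extra threads are busy but not doing useful work.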

O2-AC commented 6 months ago

I am closing this myself. The underlying issue is caused by incorrectly compiled versions of xtb on our HPC cluster. Using the precompiled xtb versions provided by the Grimme group shows the expected behavior.